Surfacing the Academic Long Tail — Announcing new work with activity data

We’re pleased to announce that JISC has funded us to work on the SALT (Surfacing the Academic Long Tail) Project, which we’re undertaking with the University of Manchester, John Rylands University Library.

Over the next six months the SALT project will be building a recommender prototype for Copac and the JRUL OPAC interface, which will be tested by the user communities of those services. Following on from the invaluable work undertaken at the University of Huddersfield, we’ll be working with more than ten years of aggregated and anonymised circulation data amassed by JRUL. Our approach will be to develop an API onto that data, which in turn we’ll use to develop the recommender functionality in both services. We’re indebted to the knowledge acquired at Huddersfield, and the SALT project will work closely with colleagues there (Dave Pattern and Graham Stone) to see what happens when we apply this concept in the research library and national library service contexts.
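To give a flavour of what we have in mind, here is a minimal sketch of how a catalogue interface might consume such an API. Everything in it (the endpoint, the parameters, the JSON fields) is hypothetical, since the actual API design is precisely what the project will be working out.

```python
import requests

# Hypothetical endpoint: the real SALT API is yet to be designed.
SALT_API = 'https://api.example.org/salt/recommend'

def also_borrowed(isbn, limit=5):
    """Ask the (hypothetical) SALT API for items co-borrowed with `isbn`."""
    resp = requests.get(SALT_API, params={'isbn': isbn, 'limit': limit})
    resp.raise_for_status()
    # Assumed response shape: a JSON list of
    # {'isbn': ..., 'title': ..., 'score': ...} objects.
    return resp.json()

for rec in also_borrowed('9780415289955'):
    print('%.2f  %s' % (rec['score'], rec['title']))
```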

Our overall aim is that by working collaboratively with other institutions and Research Libraries UK, the SALT project will advance our knowledge and understanding of how best to support research in the 21st century. Libraries are a rich source of valuable information, but sometimes the sheer volume of materials they hold can be overwhelming even to the most experienced researcher — and we know that researchers’ expectations of how to discover content are shifting in an increasingly personalised digital world. We know that library users — particularly those researching niche or specialist subjects — are often seeking content based on a recommendation from a contemporary, a peer, a colleague or an academic tutor. The SALT Project aims to give libraries the ability to offer users just that. Similar to Amazon’s ‘customers who bought this item also bought…’, the recommendations on this system will appear on a local library catalogue and on Copac, and will be based on circulation data gathered over the past ten years at The University of Manchester’s internationally renowned research library.

How effective will this model prove to be for users — particularly humanities researchers?

Here’s what we want to find out:

  • Will researchers in the field of humanities benefit from receiving book recommendations, and if so, in what ways?
  • Will the users go beyond the reading list and be exposed to rare and niche collections — will new paths of discovery be opened up?
  • Will collections in the library, previously undervalued and underused, find a new appreciative audience — will the Long Tail be exposed and exploited for research?
  • Will researchers see new links in their studies, possibly in other disciplines?

We also want to consider whether there are other potential beneficiaries. By highlighting rarer collections, valuing niche items and bringing to the surface less popular but nevertheless worthy materials, libraries will have the leverage they need to ensure the preservation of these rich materials. Can such data or services assist in decision-making around collections management? We will be consulting with Leeds University Library and the White Rose Consortium, as well as UKRR, in this area.

And finally, as part of our sustainability planning, we want to look at how scalable this approach might be for developing a shared aggregation service of circulation data for UK University Libraries. We’re working with potential data contributors such as Cambridge University Library, the University of Sussex Library, and the M25 consortium, as well as RLUK, to trial and provide feedback on the project outputs, with specific attention to the sustainability of an API service as a national shared service for HE/FE that supports academic excellence and drives institutional efficiencies.

Logging in to Copac: some tips

Now that you have the option to log in to Copac to use the personalisation features, here are some tips to make logging in as easy as possible.

TypeKey/TypePad: if you have a TypeKey or TypePad account, and were wondering where your login option was, worry no longer! From the drop-down list of organisations on the login page, you need to choose ‘JISC project: SDSS (TypeKey Bridge)’. It’s not immediately obvious, but it is the correct login option for any TypeKey users.

Navigating the list:  the list of organisations is very long, and weighted heavily towards ‘U’.  To navigate it more easily, you can jump straight to any letter by typing it on your keyboard.  You may find it even easier to enter a keyword search in the search box.  This will work for partial words as well – entering ‘bris’ will give you the options of the City of Bristol College and the University of Bristol.

Remembering your selection:  once you have found your organisation, there are options to have your selection remembered, either for that session (the default) or for a week.  You can also choose ‘do not remember’, which is especially useful if you are on a public computer.

Please contact us if you experience any problems with logging in to Copac.

New Copac interface

It’s finally here!  After months of very hard work from the Copac team, and lots of really useful input from users on the Beta trials, the new Copac interface is now live.

We have streamlined the Copac interface, and you can still search and export records without logging in to Copac. This is ideal if you want to do a quick search, and don’t need any of the additional functionality. Users who choose not to log in will still be able to use the new functionality of exporting records directly to EndNote and Zotero, and will see book and journal tables-of-contents, where available.

You now also have the option to log in to Copac. This is not compulsory, and you only need to log in if you want to take advantage of the full range of new personalisation features. These have been developed to help you to get the most out of Copac, and to assist in your workflows.

‘Search History’ records all of your searches, and includes a date/time stamp.  This allows you to keep track of your searches, and to easily re-run any search with a single click.

‘My References’ allows you to manage your marked records, and create an annotated online bibliography.

You can annotate and tag all of your searches and references.  There is no limit to how you can use this functionality:  see my post from March for some suggestions about how you might use tags and annotations.  We would love to hear how you are using them – please get in touch if you would like to share your experiences and ideas.

Users from some institutions will now have the option to see their local catalogue results appearing alongside the Copac results.  We are harvesting information from the institutions’ Z39.50 servers, and using this to create a merged results set.  If you are interested in your institution being a part of this, please get in touch.
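For the curious, a cross-search of this kind can be done with standard Z39.50 tooling. Here is a minimal Python sketch using the venerable PyZ3950 library; the host, port and database name are placeholders for a real institutional target, and this is not our production code.

```python
from PyZ3950 import zoom

# Placeholder connection details for an institutional Z39.50 target.
conn = zoom.Connection('z3950.example.ac.uk', 210)
conn.databaseName = 'INNOPAC'
conn.preferredRecordSyntax = 'USMARC'

# A CCL title search, the sort of query we would fan out to local targets.
query = zoom.Query('CCL', 'ti="information retrieval"')
results = conn.search(query)
for i in range(min(5, len(results))):
    print(results[i])  # raw MARC records, to be merged with the Copac set
conn.close()
```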

Some people have expressed concern that the need to log in means that Copac is going to be restricted to members of UK academic institutions only. This is not the case. We are committed to keeping Copac freely accessible to all. Login is required for the new features to function: we need to be able to uniquely identify you in order to record your search history and references; and we need to know which (if any) institution you are from to show you local results. We have tried to make logging in as easy as possible. For members of UK academic institutions, this means that you can use your institution’s central username/password, or your Athens details. For our users who aren’t members of a UK academic institution, you can create a login from one of two identity providers: ProtectNetwork or TypePad. These providers enable you to create a secure identity, which you can use to manage access to many internet sites.

We are very grateful to everyone who has taken the time to give us feedback on the recent Beta trials. But we can never get enough feedback! We’d love to hear what you think about the new Copac interface: you can email us; speak to us on Twitter; or leave comments here.

Notes on (Re)Modelling the Library Domain (JISC Workshop).

A couple of weeks ago, I attended JISC’s Modelling the Library Domain Workshop. I was asked to facilitate some sessions at the workshop, which was an interesting but slightly (let’s say) ‘hectic’ experience. Despite this, I found the day very positive. We were dealing with potentially contentious issues, but I noted real consensus around some key points. The ‘death of the OPAC’ was declared and no blood was shed as a result. Instead I largely heard murmured assent. As a community, we might have finally reached a critical juncture, and there were certainly lessons to be learned in terms of considering the future of services such as Copac, which, as a web search service, would count in the Library Domain Model as a national JISC service ‘Channel’.

In the morning, we were asked to interrogate what has been characterised as the three ‘realms’ of the Library Domain: Corporation, Channels, and Clients. (For more explanation of this model, see the TILE project report on the Library Domain Model). My groups were responsible for picking apart the ‘Channel’ realm definition:

The Channel: a means of delivering knowledge assets to Clients, not necessarily restricted to the holdings or the client base of any particular Corporation. Channels within this model range from local OPACs to national JISC services and ‘webscale’ services such as Amazon and Google Scholar. Operators of channel services will typically require corporate processes (e.g. a library managing its collection, an online book store managing its stock). However, there may be an increasing tendency towards separation, channels relying on the corporate services of others and vice versa (e.g. a library exposing its records to channels such as Google or Liblime, a bookshop outsourcing some of its channel services to the Amazon marketplace).

In subsequent discussion, we came up with the following key points:

  • This definition of ‘channel’ was too library-centric. We need to work on ‘decentring’ our perspective in this regard.
  • We will see an increasing uncoupling of channels from content. We won’t be pointing users to content/data; rather, data/content will be pushed to users via a plethora of alternative channels.
  • Users will increasingly expect this type of content delivery. Some of these channels we can predict (VLEs, Google, etc) and others we cannot. We need to learn to live with that uncertainty (for now, at least).
  • There will be an increasing number of ‘mashed’ channels – a recombining of data from different channels into new bespoke/2.0 interfaces.
  • The lines between the realms are already blurring, with users becoming corporations and channels, and so on.
  • We need more fundamental rethinking of the OPAC as the primary delivery channel for library data. It is simply one channel, serving specific use cases and business processes within the library domain.
  • Control. This was a big one. In this environment libraries increasingly devolve control of the channels their ‘clients’ use to access the data. What are the risks and opportunities to be explored around this decreasing level of control? What related business cases already exist, and what new business models need to evolve?
  • How are our current ‘traditional’ channels actually being used? How many times are librarians re-inventing the wheel when it comes to creating the channels of e-resource or subject specialist resource pages? We need to understand this in broad scale.
  • Do we understand the ways in which the channels libraries currently control and create might add value in expected and unexpected ways? There was a general sense that we know very little in this regard.

There’s a lot more to say about the day’s proceedings, but the above points give a pretty good glimpse into the general tenor of the day. I’m now interested to see what use JISC intends to make of these outputs. The ‘what next?’ question now hangs rather heavily.

Catalogues as Communities? (Some thoughts on Libraries of the Future)

At last week’s Libraries of the Future debate, Ken Chad challenged the presenters (and the audience) over the failure of libraries to aggregate and share their data.  I am very familiar with this battle-cry from Ken.  In the year and more that I’ve been managing Copac, he’s (good-naturedly) put me on the spot several times on this very issue.  Why isn’t Copac (or the UK HE/FE library community) learning from Amazon, and responding to users’ new expectations for personalisation and adaptive systems?

Of course, this is a critically important question, and one that is at the heart of the JISC TILE project, which Ken co-directs (I actually sit on the Reference Group). Ken’s related argument is that the public sector business model (or lack thereof) is perhaps fatally flawed, and that we are probably doomed in this regard; the private sector is already winning on the personalisation front, so rather than pouring public money into resource discovery ‘services’ we should, perhaps, let the market decide.  I am not going to address the issue of business models here — although this is a weighty issue requiring debate — but I want to come back to this issue of personalisation, 2.0, and the OPAC as a potential ‘architecture for participation.’

I fundamentally agree with the TILE project premise (borrowed from Lorcan Dempsey) that the library domain needs to be redefined as a set of processes required for people to interact with ‘stuff’.  We need to ask ourselves if the OPAC itself is a relic, an outmoded understanding of ‘public access’ or (social) interaction with digital content. As we do this, we’re creating heady visions where catalogue items or works can be enhanced with user-generated content, becoming ‘social objects’ that bring knowledge communities together.  ‘Access’ becomes less important than facilitating ‘use’ (or reuse) and the Discovery to Delivery paradigm is turned on its head.

It’s the ‘context’ of the OPAC as a site for participation that I am interested in questioning.  Can we simply ‘borrow’ from the successful models of Amazon or LibraryThing? Is the OPAC the ‘place’ or context that can best facilitate participative communities?

This might depend on how we’re defining participation, and as Owen Stephens has suggested (via Twitter chats) what the value of that participation is to the user.  In terms of Copac’s ‘My References’ live beta, we’ve implemented ‘tagging with a twist,’ where tagging is based on user search terms and saved under ‘Search History’.  The value here is fairly self-evident – this is a way for users to organise their own ‘stuff’. The tagging facility, too, can be used to self-organise, and as Tim Spalding suggested way back in 2007, this is also why tagging works for LibraryThing (and why it doesn’t work for Amazon): ‘Tagging works well when people tag “their” stuff, but it fails when they’re asked to do it to “someone else’s” stuff. You can’t get your customers to organize your products, unless you give them a very good incentive.’

But does this count as ‘community’ participation?  Right now we don’t provide the option for tags to be shared, though this is being seriously considered along the lines of a recommender function (‘users who saved this item also saved…’), which seems a logical next step, and potentially complementary to Dave’s recommender work. However, I’m much less convinced about whether HE/FE library users would want to explicitly share items through identity profiles, as at LibraryThing.  Would the LibraryThing community model translate to the semantically dense and complex communities for learning, teaching and research that university and college libraries want to support?

One of the challenges for a participatory OPAC 2.0 (or any cross-domain information discovery tool) will be the tackling of user context, and specifically the semantic context(s) in which that user is operating.  Semantic harvesting and text mining projects such as the Intute Repository Search have pinpointed the challenge of ‘ontological drift’ between disciplines and levels (terms and concepts having shifted meanings across disciplinary boundaries).  As we move into this new terrain of Library 2.0 this drift will likely become all the more evident.  Is the OPAC context too broad to facilitate the type of semantic precision needed to enable meaningful contribution and community-building?

Perhaps attention data, that ‘user DNA,’ will provide us with new ways to tackle the challenge.  There is risk involved, but some potential ‘quick wins’ that are of clear benefit.  Dave’s blog posts over the last week suggest that the value here might be in discovering people ‘like me’ who share the same research interests and keep borrowing books like the ones I borrow (although, if I am an academic researcher, that person might also be ‘The Competition’ — so there are degrees of risk to account for here — and this is just the tip of the iceberg in terms of considering the cultural politics of academia and education).  Certainly the immediate value or ‘impact of serendipity’ is that it gives users new routes into content, new paths of discovery based on patterns of usage.

But what many of us find so compelling about the circulation data work is that it surfaces latent networks not just of books, but of people.  These are potential knowledge communities or what Wenger might call Communities of Practice (CoP).  Whether the OPAC can help nurture and strengthen those CoPs is another matter. Crowds, even wise ones, are not necessarily Communities of Practice.

Reimagining the library means reimagining (or discarding) the concept of the catalogue.  This might also mean rethinking the OPAC as a context for community interaction.

—————–

[Related ‘watch this space’ footnote: We’ve already garnered some great feedback on the ‘My References’ beta we currently have up — over 80 user-surveys completed (and a good proportion of those from non-librarian users).  This feedback has been invaluable.  Of course, before we embark on too many more 2.0 developments, Copac needs to be fit-for-purpose.  In the next year we are re-engineering Copac, moving to new hardware, restructuring the database,  improving the speed and search precision, and developing additional (much-needed) de-duplication algorithms.  We’re also going to be undertaking a complete  overhaul of the interface (and I’m pleased to say that Dave Pattern is going to be assisting us in this aspect). In addition, as Mimas is collaborating on the TILE project through Copac, we’re going to look at how we can exploit what Dave’s done with the Huddersfield circulation data (and hopefully help bring other libraries on board).]

Supporting researchers

I have recently attended two of the NoWAL/SCONUL Working Group on Information Literacy workshops: one on writing for publication, a workshop for library staff supporting researchers, led by Moira Bent and Pat Gannon-Leary; and one on developments in scholarly communication, led by Bill Hubbard.

At first glance, these workshops may not seem to have much to do with Copac:  after all, the first one even specifies that it is a ‘workshop for library staff’ and, as you may know, Copac isn’t based in a library.  (We’re based in a lovely office, with the Archives Hub team, and daffodils outside the window to distract us.)  However, Copac is all about supporting researchers, with our roots as the OPAC for the Consortium of University Research Libraries (CURL), now Research Libraries UK (RLUK).  One of RLUK’s values, as set out on their website, is to ‘work with the research community to promote excellence in support of current research and anticipate future needs’.  This is what we aim to do at Copac, and I got some good ideas for how to do it from these workshops.

Moira and Pat led an interesting discussion about ‘what is research?’, before introducing us to their model of the ‘seven ages of research’ (see slides 8-11).  This was particularly interesting for me, as we’ve recently been conducting some stakeholder analysis, and while we ended up with five divisions of librarians with different needs/priorities, we only had one for researchers.  If we are to fully consider and meet the needs of all our users, and ensure that we are communicating with them effectively, then we need to consider the differences highlighted by this model.

Bill Hubbard’s workshop on ‘developments in scholarly communication’ concentrated mainly (and unsurprisingly, given Bill’s role as manager of SHERPA) on Open Access and repositories.  A very timely workshop, following the publication of the much-talked-about Houghton report, and one that you might think would be better attended by one of my colleagues from Jorum or Intute Repository Search.  But it is important that Copac interacts with the OA landscape as well.  Bill returned to the theme of differences between researchers.  This time, it was differences of research methodologies between disciplines:  to crudely condense Bill’s example, economists love pre-prints and working papers, while biomedical scientists won’t touch them with the proverbial bargepole.  This, of course, has implications for the types of material that will be appearing in repositories.  It also has implications for how Copac can best serve the needs of these researchers.

So, from our stakeholder analysis which had undergrads, postgrads, and academic researchers all in one nice little box, it now appears that we have to look at not only the career stage of the researcher, but their discipline as well.  Can we do this?  Well, we’re getting closer…  The new Copac Beta (open to members of UK Access Management Federation institutions) is our first step towards a personalised Copac – and the more personalisation we enable, the better able we are to meet the needs of a wide range of users.  It’s still early days, but we’re asking for feedback to find out what you think of the new features, and suggestions for further developments or improvements.

New features of the Copac Beta Interface

With the new Copac interface, we wanted to make the Search History and Marked List (now re-named My References) more useful. Previously, these features were session based — that is, if you re-started your web browser, your search history and saved records were lost. For us to be able to retain that data over multiple sessions, we need to know who our users are. Hence, for Copac Beta we are requiring you to log in.

The advantage of logging in is that you can use Copac Beta from multiple machines at different times and still have access to the searches and references you saved yesterday or last week – or even last year.  Unfortunately, login is currently restricted to members of UK Access Management Federation institutions (most UK HE and FE institutions, and some companies), but don’t worry – there will always be a free version of Copac open to everyone, and we will be widening the login scope in the future.

You can tag your searches and references and use a tag cloud to see those items tagged with a particular tag. We are automatically tagging your saved searches and references with your search terms, and you can remove these automatic tags, and add your own.  These tags are then added to your tag cloud, so that you can easily navigate your saved records through tags which are meaningful to you.  Why would you want to delete the automatically generated tags?  Well, records are tagged with all of your search terms so, if you limit your search to ‘journals and other periodicals’, the tags for records from that set will include ‘journals’, ‘other’ and ‘periodicals’.  If you find these confusing, you can just delete them, and have only tags that have meaning for you.
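For the curious, here is a toy sketch of what ‘tagging with your search terms’ amounts to. It is illustrative only (not Copac’s actual implementation), but it reproduces the ‘journals and other periodicals’ example above:

```python
import re

BOOLEAN_OPS = {'and', 'or', 'not'}

def auto_tags(search_terms):
    """Derive tags from the search that found a record: one tag per word,
    with boolean operators dropped. Illustrative sketch only."""
    words = re.findall(r'[a-z0-9]+', search_terms.lower())
    return [w for w in words if w not in BOOLEAN_OPS]

print(auto_tags('journals and other periodicals'))
# ['journals', 'other', 'periodicals'] -- hence the option to delete
# automatic tags that have no meaning for you
```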

You can also add notes to any of your references – perhaps to remind yourself that you have ordered the item through inter-library loan, and when you should go and pick it up, or perhaps to make comments about how useful you found the item.  This ‘My References’ section was developed as part of the JISC-funded project Discovery to Delivery at Edina and Mimas (D2D@E&M), as a Reusable Marked List work package.

You can also edit the bibliographic details of the item.  These edited details are only visible to you, so you don’t need to worry about making any changes.  You could use this to correct a typo or misspelling in the record, or to add details that are not visible in the short record display, such as information about illustrations or pagination.

The search history feature allows you to re-run any previous search with a single click, from any screen.  This could be especially useful for anyone who is doing demos, as not only do you know that the search will return results, but it saves you from the jelly fingers that haunt even the most proficient of typists in front of an audience.  The date and time of previous searches are recorded, so that you can see what you have searched for and when.  This could be useful for tracking the progress of a project over time, or showing at a glance what effect refining a search has on the number of results.

Many journal records now contain the latest table-of-contents.  Clicking on an article title will take you through to the Zetoc record for that article, and from there you can use an OpenURL resolver to link directly to full text (if your institution has access), or order the article through your institution or directly from the British Library.  The table-of-contents allows you to get an idea of the scope of the journal, and whether it will be of interest to you, without going to another website. This makes it easier to avoid wasted travel or unnecessary inter-library loan requests.
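For the technically minded: an OpenURL is essentially a structured query string passed to your institution’s link resolver. Here is an illustrative sketch of building an OpenURL 1.0 (KEV) link for a journal article; the resolver address and the sample metadata are placeholders:

```python
from urllib.parse import urlencode

# The resolver base URL is institution-specific; this one is a placeholder.
RESOLVER = 'https://resolver.example.ac.uk/openurl'

def article_openurl(jtitle, atitle, issn, volume, spage):
    """Build an OpenURL 1.0 (KEV) link for a journal article."""
    params = {
        'url_ver': 'Z39.88-2004',
        'rft_val_fmt': 'info:ofi/fmt:kev:mtx:journal',
        'rft.genre': 'article',
        'rft.jtitle': jtitle,
        'rft.atitle': atitle,
        'rft.issn': issn,
        'rft.volume': volume,
        'rft.spage': spage,
    }
    return RESOLVER + '?' + urlencode(params)

# Sample metadata invented for illustration.
print(article_openurl('Journal of Documentation', 'An example article',
                      '0022-0418', '65', '5'))
```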

We’d love to know what you think of these new features – and any suggestions you might have for new ones!  Once you’ve used the new features, please fill in our questionnaire, to help us learn what we’re doing right, and what you’d like to see changed.  As thanks for your feedback, there’s a £35 Amazon voucher up for grabs for one lucky respondent.  The survey has 9 questions, and shouldn’t take more than 10 minutes of your time.  Of course, you can always give us additional feedback through the comments on this blog, by emailing copac@mimas.ac.uk, by phone or post, or on Twitter.  But we’d really like you to do the survey as well :)

Perspectives on Goldmining.

Last Friday, Shirley and I headed down to London for the TiLE workshop: ‘“Sitting on a gold mine” — Improving Provision and Services for Learners by Aggregating and Using Learner Behaviour Data.’ The aim of the workshop was to take a ‘blue skies’ (but also practical) view of how usage data can be aggregated to improve resource discovery services on a local and national (and potentially global) level. Chris Keene from the University of Sussex Library has written a really useful and comprehensive post about the proceedings (I had no idea he was feverishly live-blogging across the table from me — but thanks, Chris!)

I was invited to present a ‘Sector Perspective’ on the issue, and specifically the ‘Pain Points’ identified around ‘Creating Context’ and ‘Enabling Contribution.’ The TiLE project suggests a lofty vision where, with a sufficient amount of context data about a user (derived from goldmines such as attention data pools and profile data stored within VLEs, library service databases, institutional profiles — you know, simple enough;-) services could become much more Amazon-like.  OPACs could suggest to users, ‘First Year History students who used this textbook also highly rated this textbook…’ and such. The OPAC is thus transformed from relic of the past to a dynamic online space enabling robust ‘architectures of participation.’

This view is very appealing, and certainly at Copac we’re doing our part to really interrogate how we can support *effective* adaptive personalisation. Nonetheless, as a former researcher and teacher, I’ve always had my doubts as to whether the library catalogue, per se, is the right ‘place’ for this type of activity.

We might be able to ‘enable contribution’ technically, but will it make a difference? An area that perhaps most urgently needs attention is research on the social component and drivers for contributing user-generated content.  As the TiLE project has identified, the ‘goldmine’ here to galvanise such usage is ‘context’ or usage data. But is it enough, especially in the context of specialised research?

As an example of the potential ‘cultural issues’ that might emerge, the TiLE project suggests the case of the questionably nefarious tag ‘wkd bk m8’ which is submitted as a tag for a record. They ask, “Is this a low-quality contribution, or does it signal something useful to other users, particularly to users who are similar to the contributor?”

I’d tend to agree with the latter, but would also say that this is just the tip of the iceberg when it comes to rhetorical context. For example, consider the user-generated content that might arise around contentious works on the ‘State of Israel.’ The fact that Wikipedia has multiple differing and ‘sparring’ entries around this is a good indicator of the complexity that emerges. I would say that this is incredibly rich complexity, but on a practical level potentially very difficult for users to negotiate. Which UGC-derived ‘context’ is relevant for differing users? Will our user model be granular or precise enough to adjust accordingly?

One of the challenges of accommodating a system-wide model is the tackling of semantic context. Right now, for instance, Mimas and EDINA have been tasked to come up with a demonstrator for a tag recommender that could be implemented across JISC services. This seems like a relatively simple proposition, but as soon as we start thinking about semantic context, we are immediately confronted with the question of which concept models or ontologies to draw from.
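To make the problem concrete, here is the sort of naive co-occurrence approach a cross-service tag recommender might start from. It is a toy sketch with made-up data, and it works precisely until the same tag means different things in different disciplines:

```python
from collections import Counter
from itertools import combinations

# Toy data: tag sets applied to three catalogue items.
taggings = [
    {'history', 'medieval', 'england'},
    {'history', 'medieval', 'france'},
    {'history', 'early-modern', 'england'},
]

# Count how often each pair of tags co-occurs on the same item.
cooc = Counter()
for tags in taggings:
    for a, b in combinations(sorted(tags), 2):
        cooc[(a, b)] += 1
        cooc[(b, a)] += 1

def recommend(tag, n=3):
    """Suggest the tags most often used alongside `tag`."""
    scores = Counter({b: c for (a, b), c in cooc.items() if a == tag})
    return [t for t, _ in scores.most_common(n)]

print(recommend('medieval'))  # ['history', ...]; 'england' and 'france' tie
```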

Semantic harvesting and text mining projects such as the Intute Repository Search have pinpointed the challenge of ‘ontological drift’ between disciplines and levels. As we move into this new terrain of Library 2.0 this drift will likely become all the more evident.

Is the OPAC too generic to facilitate the type of semantic precision to enable meaningful contribution? I have a hunch it is, as did other participants when we broke out into discussion sessions.

But perhaps the goldmine of context data, that ‘user DNA,’ will provide us with new ways to tackle the challenge, and there was also a general sense that we needed to forge forward on this issue — try things out and experiment with attention data.  A service that aggregates both user-generated and attention/context data would be of tremendous benefit, and Copac (and other like services) could potentially move to a model where adaptive personalisation is supported.  Indeed, Copac as a system-wide service has great potential as an aggregator in this regard.

There is risk involved around these issues, but there are some potential ‘quick wins’ that are of clear immediate benefit. Another speaker on Friday was Dave Pattern, who within a few minutes of ‘beaming to us live via video from Huddersfield’ had released the University of Huddersfield’s book usage data (check it out).

This is one goldmine we’re only too happy to dig into, and we’re looking forward to collaborating with Dave in the next year to find ways to exploit and further his work in a national context.  We want to implement recommender functions in Copac, but also (more importantly) to work at Mimas on developing a system for storing and sharing usage data from multiple UK libraries (any early volunteers?!)  The idea is that this data can also be reused to improve services on a local level.   We’re just at the proposal stage in this whole process, but we feel very motivated, and the energy of the TiLE project workshop has only spurred us on.

Of Circulation Data and Goldmines…

If you’d told me a bit more than a year ago that I’d be getting all excited about the radical potential of library circulation data, well…

This afternoon we had an interesting chat with Dave Pattern from the University of Huddersfield (he of Opac 2.0 and ‘users who borrowed this also borrowed…’ fame).  We’re hoping to collaborate with Dave to see how his important work can be taken forward on a national level.  Dave is about to release the Huddersfield circulation data (anonymised and aggregated) to the community and he’s hoping it will trigger some debate and ideas for developments.   This certainly is a real opportunity for people in our field.  On our end, we’d like to figure out how we could develop a similar feature for Copac, but also look at how to bring more libraries into the mix — contributing more data so those ‘recommendations’ are more effective.
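The computation at the heart of ‘users who borrowed this also borrowed…’ is, at its simplest, co-occurrence counting over anonymised loan records. Here is a toy sketch with made-up data (emphatically not Dave’s actual code):

```python
from collections import defaultdict, Counter

# Anonymised circulation data as (borrower_id, item_id) loan events,
# of the kind Huddersfield is releasing. All IDs here are made up.
loans = [
    ('u1', 'book-a'), ('u1', 'book-b'), ('u1', 'book-c'),
    ('u2', 'book-a'), ('u2', 'book-b'),
    ('u3', 'book-b'), ('u3', 'book-d'),
]

# Group the items each borrower took out...
by_borrower = defaultdict(set)
for borrower, item in loans:
    by_borrower[borrower].add(item)

# ...then count, for each item, what else its borrowers borrowed.
also_borrowed = defaultdict(Counter)
for items in by_borrower.values():
    for item in items:
        for other in items - {item}:
            also_borrowed[item][other] += 1

# 'Users who borrowed book-a also borrowed...'
print(also_borrowed['book-a'].most_common(2))  # [('book-b', 2), ('book-c', 1)]
```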

Dave and I both sit on the TILE reference group, and there has been some important work going on in that project about the potential ‘goldmine’ of attention data we’re all sitting on at institutions and data centres.  TILE recommendations suggest the development of an attention-data store service.  Frankly, the sheer scale of that kind of all-encompassing undertaking gives me headspin, but a service for the storage and open sharing of circulation data less so.  In fact, JISC has also recently tasked Mimas and EDINA to propose work around ‘Personalised Search and Recommendation Engines,’ so there’s real scope to think carefully about what such a service might look like.

Goldmine indeed — I’m speaking (from my ‘sector perspective’) at the TILE meeting next week.  The focus of the meeting is to look at how we can improve services for learners by aggregating and using learning behaviour data.  For our part, I am keen to see where this work with circulation and attention data can take us, and I’m looking forward to putting some thoughts together on this score for the meeting.

Spooky Personalisation (should we be afraid?)

Last Thursday members from the D2D team met up with people from the DPIE 1 and DPIE 2 projects, as well as Paul Walk (representing the IE demonstrator project). The aim was to talk ‘personalisation’ developments for the JISC IE. It’s impossible to cover the entire scope of discussion here (we were at it most of the day). As you might predict, it was a day of heated but engaging debate around a topic that is technically and socially complex. As we think about the strategic future of services and cross-service development, there are serious question marks over which direction we’re headed in terms of personalisation (and, of course, whether it’s even possible to talk of ‘a’ direction).

The key practical aim of the meeting was to share the personalisation aspects of D2D project work, and also to discuss the recommendations of the two DPIE reports. The D2D work includes some development of personalisation components for Copac, components we are referring to cautiously as ‘lightweight’ for now. One way in which we plan to ‘personalise’ the service for users is by offering a ‘My Local Library’ cross-search, achieved (we hope — we’re very much in early phases here) via a combination of authentication and IP recognition to identify users’ geographical location, followed by a cross-search of local institutional library holdings data via Z39.50 targets.

In addition, by the middle of next year, Copac users will be able to save marked lists of records and export them into other 2.0 environments via an Atom feed (I’ll let Ashley write the more technical post on that development). Further down the line (i.e. beyond the next six months) we are interested in providing tools for users to annotate, bookmark and tag these records, but we also want to make sure that any such developments are not made in isolation and are not ‘Copac’-centric — there’s a lot to explore here, obviously.
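By way of illustration, here is a minimal sketch of what rendering a marked list as an Atom feed might look like. The element choices and record fields are my own assumptions, ahead of Ashley’s more technical post:

```python
from xml.etree.ElementTree import Element, SubElement, tostring

ATOM_NS = 'http://www.w3.org/2005/Atom'

def marked_list_feed(records):
    """Render saved records as a (minimal, illustrative) Atom feed."""
    feed = Element('feed', xmlns=ATOM_NS)
    SubElement(feed, 'title').text = 'My References'
    for rec in records:
        entry = SubElement(feed, 'entry')
        SubElement(entry, 'title').text = rec['title']
        SubElement(entry, 'id').text = rec['id']
        SubElement(entry, 'updated').text = rec['saved']
    return tostring(feed, encoding='unicode')

print(marked_list_feed([{'title': 'An example record',
                         'id': 'urn:example:record-1',
                         'saved': '2008-12-01T12:00:00Z'}]))
```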

In and of themselves, these developments are not especially complex — the latter is an example of personalisation via ‘customisation’ (to use JISC definitions), where users explicitly customise content according to their own preferences. What I am especially interested in, however, is how saved lists (‘My Bibliography’?) could be used to support adaptive personalisation (this is what Max Hammond, co-author of the DPIE 2 report, wryly referred to at the meeting as ‘spooky’).

Dave Pattern’s experiment with using circulation data to ‘recommend’ items to University of Huddersfield library users is well known, and, I hope, the first step towards some potentially very interesting UK developments. On this end, we’re interested in knowing if there is anything similar to be gleaned from saved personal lists — ‘users with this item in their saved lists also have…’ (or something along those lines). This terrain is very much untested, and one of the critical issues, of course, is uptake. Amazon’s recommender function is effective due to the sheer number of users (effective *some* of the time, that is — we all have ‘off-base’ Amazon recommendation stories to tell, I admit). And this is just one small example of how adaptive personalisation of a service like Copac (or other JISC IE services) might work — there are also opportunities around capturing attention data, for instance.

The DPIE 2 report urges extreme caution in this regard. It raises some very pointed questions about how JISC and its services should approach adaptive personalisation. Too often, the authors warn, ‘personalisation’ is established as a specific goal, with the assumption that ‘personalisation’ is intrinsically valuable. In this context, change is technology- rather than user-driven, which is fine for experimental and demonstrator work, but high-risk for established services, with a strong likelihood of failure. They question how helpful the definitions of personalisation put forth by JISC (Customisation; Adaptive Personalisation based on Data held elsewhere (APOD); Adaptive Personalisation based on User Activity (APUA)) are in carrying forward a development agenda. This definition “provides a mix of concepts from data capture to functionality, rather than setting out the logical link between a source of data about a user, a model of that user, and an approach to providing the user with a personal service” (17). Also missing is a robust benefits mapping process — “there is little analysis of the benefits of personalisation, beyond an assertion that it improves the user experience” (20). The report concludes:

Complex developments of “personalisation services” and similar should not be a current priority for JISC. It seems unlikely that an external personalisation service will be able to generate a user model which is detailed enough to be of genuine use in personalising content; user preferences are probably not broadly applicable beyond the specific resource for which they are set, and user behaviour is difficult to understand without deep understanding of the resource being used. Attempting to develop user models which are sufficiently generic to be of use to several services, but sufficiently detailed to facilitate useful functionality is likely to be a challenging undertaking, with a high risk of failure. (26)

These are somewhat sobering thoughts, especially in a climate of personalisation and 2.0 fervour, but overall the report is useful in considering how to tread the next steps in development activity. Key for us is this issue of the user model — can we (Copac? SUNCAT? JISC?) develop one that is likely to be of use to several services? My hunch right now is ‘no.’ We know very little concrete about researchers’ behaviour and how they might benefit from such tools (interestingly, both DPIE reports focused on benefits for undergraduate students, when most of the services in question are largely used by researchers). About humanities researchers, we know even less (much of the interesting work around online ‘collaboratories’ centres on the STEM disciplines). Apparently JISC is about to commission some investigative work with researcher-users, and here at Mimas a team is about to undertake some focus group work with humanities researchers to determine how personalisation tools for services like Copac, Intute and the Archives Hub could (or could not) deliver specific benefits to their work. I’m sure this research will prove very useful.

We’re urged to ‘proceed with caution,’ but we proceed nonetheless. At Copac we’re taking a long hard look at what a personalised service might look like, and we’ve accepted that some risk-taking is likely in our future. I’m very interested to know others’ opinions on a possible recommender function for Copac — at what level could such a tool prove useful, and when might it possibly be obstructive? Personally, I have used the ‘People who bought this also bought’ feature in Amazon quite extensively as a useful search tool. I am less likely to take up the direct recommendations that Amazon pitches at me through my ‘personalised’ Amazon home page, however. (This comes, in part, from making purchases for a six-year-old boy. If only I could toggle between ‘mummy’ and ‘professional’ profiles…. now there’s a radical thought.)