User Feedback Results – Super 8

In an effort to find the magic number, the SALT team opened its testing labs again this week. Another six University of Manchester postgraduate students spent the afternoon interrogating the Copac and John Rylands library catalogues to evaluate the recommendations thrown back by the SALT API.

With searches ranging from ‘The Archaeology of Islam in Sub-Saharan Africa’ to ‘Volunteering and Society: Principles and Practice’, it felt as though no corner of the Arts and Humanities was left unexplored. We tried to find students with diverse interests within Arts and Humanities to test the recommendations from as many angles as possible. Using the same format as the previous groups (documented in our earlier blog post ‘What do users think of the SALT recommender?’), the library users were asked to complete an evaluation of the recommendations they were given. Previously the users tested SALT when the threshold was set at 3 (that is, 3 people had borrowed the book, which made it eligible to be returned as a recommendation), but we felt that the results could be improved: although 77.5% found at least one recommendation useful, too many recommendations were rated as ‘not that useful’ (see the charts in ‘What do users think of the SALT recommender?’).

This time, we set the threshold at 15 in the John Rylands library catalogue and 8 in Copac. Like the LIDP team at Huddersfield (http://library.hud.ac.uk/blogs/projects/lidp/2011/08/30/focus-group-analysis/), we have a lot of data to work with now, and we’d like to spend more time interrogating the results to find out whether clear patterns emerge. Our initial analysis has raised some further questions, but it has also revealed some interesting and encouraging results. Here are the highlights of what we found out.
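To make the effect of the threshold concrete, here is a minimal, purely illustrative sketch (in Python, and not the SALT code itself) of the filtering step: every candidate recommendation carries a count of borrowers, and anything below the cut-off is suppressed before it can be shown to the user. The titles and counts below are invented.

```python
from collections import Counter

# Invented borrower counts for candidate recommendations linked to one seed search.
candidates = Counter({
    "Candidate monograph A": 21,
    "Candidate monograph B": 9,
    "Candidate monograph C (published 1957)": 2,
})

def filter_recommendations(counts, threshold, limit=10):
    """Keep only items whose borrower count meets the threshold, most-borrowed first."""
    eligible = [(title, n) for title, n in counts.items() if n >= threshold]
    eligible.sort(key=lambda pair: pair[1], reverse=True)
    return eligible[:limit]

print(filter_recommendations(candidates, threshold=15))  # the JRUL setting in this round
print(filter_recommendations(candidates, threshold=8))   # the Copac setting in this round
```

With the old threshold of 3 all three invented items would have come back; at 8 only two survive, and at 15 only one, which is the kind of pruning we were aiming for when we raised the cut-off.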

The Results

On initial inspection, JRUL with its threshold of 15 improved on previous results:

Do any of the recommendations look useful?

92.3% of the searches returned at least one item the user thought was useful; however, when users were asked if they would borrow at least one item, only 56.2% answered that they would.

When asked, many of the users stated that they already knew the book and so wouldn’t need to borrow it again, or that, although the book was useful, their area of research was so niche that it wasn’t specifically useful to them, though they would deem it ‘useful’ to others in their field.

One of the key factors that came up in the discussions with users was the year the book had been published. The majority of researchers need up-to-date material, many preferring journals to monographs, and this was taken into account when deciding whether a book was worth borrowing. Many users wouldn’t borrow anything more than 10 years old:

‘Three of the recommendations are ‘out of date’ 1957, 1961, 1964 as such I would immediately discount them from my search’ 30/08/11 University of Manchester, Postgraduate, Arts and Humanities, SALT testing group.

So a book could be a key text, and ‘useful’, but it wouldn’t necessarily be borrowed. Quite often, one user explained, rather than reading a key text she would search for journal articles about the key text, to get up-to-date discussion and analysis of it. This has an impact on our hypothesis, which is about discovering the long tail: quite often the long tail that is discovered includes older texts, which some users discount.

Copac, with a threshold of 8, was also tested. Results here were encouraging:

Do any of the recommendations look useful?

Admittedly, further tests would need to be done on both thresholds, as the number of searches conducted (25) does not give enough results to draw concrete conclusions from, but it does seem as if the results are vastly improved by increasing the threshold.

No concerns about privacy

The issue of privacy was raised again. Many of the postgraduate students are studying niche areas and seemed to understand how this could affect them should the recommendations be attributed back to them. However, as much as they were concerned about their research being followed, they were also keen to use the tool themselves and so their concerns were outweighed by the perceived benefits. As a group they agreed that a borrowing rate of 5 would offer them enough protection whilst still returning interesting results. The group had no concerns about the way in which the data was being used and indeed trusted the libraries to collect this data and use it in such a productive way.

‘It’s not as if it is being used for commercial gain, then what is the issue?’ 30/08/11 University of Manchester, Postgraduate, Arts and Humanities, SALT testing group.

Unanimous support for the recommender

The most encouraging outcome from the group was the unanimous support for the book recommender. Every person in the group agreed that the principle of the book recommender was a good one, and they gave their resolute approval for their data being collected and used in this way.

All of them would use the book recommender if it was available. Indeed one researcher asked, ‘can we have it now?’

Janine Rigby and Lisa Charnock 31/08/11

Final blog post

In this final post I’m going to sum up what this project has produced, potential next steps, key lessons learned, and what we’d pass on to others working in this area.

In the last five months, the SALT project has produced a number of outputs:

  1. Data extraction recipe: http://salt11.wordpress.com/recipe-data-extraction-from-talis/
  2. Details on how the algorithm can support recommendations (courtesy of Dave Pattern): http://www.daveyp.com/blog/archives/1453
  3. Technical processes documentation for processing the data and supporting the recommender API (though the API itself is not yet published; a purely hypothetical sketch of a client call appears after this list): http://salt11.wordpress.com/technical-processes/
  4. An open licensing statement from JRUL which means the data can be made available for reuse (we’ve yet to determine how to make this happen, given the size of the dataset, and we also need to explore whether CC-BY is the most appropriate license going forward): http://salt11.wordpress.com/2011/07/26/agreeing-licensing-of-data/
  5. Trial recommender functionality in the live Copac prototype: http://salt11.files.wordpress.com/2011/07/copac_recommender.jpg
  6. A recommender function in the JRUL library search interface prototype: http://salt11.files.wordpress.com/2011/08/salt_jrul.jpg
  7. User testing instruments: the SALT Postgraduate User Discussion Guide and the SALT user response sheet and results
  8. Feedback from collections managers and potential data contributors helping us consider weaknesses and opportunities, as well as possible sustainable next steps.
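Since the recommender API itself has not yet been published (output 3 above), the following is a purely hypothetical sketch of how a catalogue interface might call a SALT-style service. The endpoint URL, parameter names and response fields are all invented for illustration; the real interface is described in the technical processes documentation linked above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint -- the real SALT API is not yet published, so the URL,
# parameters and response shape below are illustrative only.
API_BASE = "https://example.org/salt/recommend"

def get_recommendations(isbn, threshold=15, limit=10):
    """Fetch co-borrowing recommendations for a seed ISBN from a (hypothetical) SALT-style API."""
    query = urlencode({"isbn": isbn, "threshold": threshold, "limit": limit})
    with urlopen(f"{API_BASE}?{query}") as response:
        payload = json.load(response)
    # Assumed response shape: {"recommendations": [{"isbn": ..., "title": ..., "borrowers": ...}]}
    return payload.get("recommendations", [])

if __name__ == "__main__":
    for rec in get_recommendations("9780521652704"):
        print(rec["title"], rec["borrowers"])
```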


Next steps:

There are a number of steps that can be taken as a result of this project – some imminent ‘quick wins’ which we plan to take on after the official end, and then others that are ‘bigger’ than this project.

What we plan to do next anyway:

  • Adjust the threshold to a higher level (using the ‘usefulness’ benchmark given to us by users as a basis) so as to suppress some of the more off-base recommendations our users were bemused by.
  • Implement the recommender in the JRUL library search interface.
  • Once the threshold has been reset, consider implementing the recommender as an optional feature in the new Copac interface. We’d really like to, but we’d need to assess whether the results are too JRUL-centric.
  • Work with JRUL to determine most appropriate mechanisms for hosting the data and supporting the API in the longer term (decisions here are dependent on how, if at all, we continue with this work from a Shared Services perspective)
  • Work with JRUL to assess the impact of this in the longer term (on user satisfaction, and on borrowing behaviour)

The Big Picture (what else we’d like to see happen):

1. Aggregate more data. Combine the normalised data from JRUL with processed data from additional libraries that represent a wider range of institutions, including learning and teaching. Our hunch is that only a few more would make the critical difference in ironing out some of the skewed results we get from focusing on one dataset (i.e. results skewed to JRUL course listings).

2. Assess longer-term impact. Longer-term analysis of the impact of the recommender functionality on JRUL user satisfaction and borrowing behaviour. Is there, as with Huddersfield, more borrowing from ‘across the shelf’? Is our original hypothesis borne out?

3. Requirements and costs gathering for a shared service. Establish the requirements and potential costs for a shared service to support processing, aggregation, and sharing of activity data via an API. Based on this project, we have a fair idea of what those requirements might be, but our experience with JRUL indicates that such provision needs to adequately support the handling and processing of large quantities of data. How much FTE, processing power, and storage would we need if we scaled to handling more libraries? Part of this requirements-gathering exercise would involve identifying additional contributing libraries, and the size of their data.

4. Experiment with different UI designs and algorithm thresholds to support different use cases. For example, undergraduate users vs ‘advanced’ researcher users might benefit from the thresholds being set differently; in addition, there are users who want to see items held elsewhere and how to get them vs those who don’t. Some libraries will be keen to manage user expectations if they are ‘finding’ stock that’s not held at the home institution.

5. Establish more recipes to simplify data extraction from the more common LMSs beyond Talis (Horizon, ExLibris Voyager, and Innovative).

6. Investigate how local activity data can help collections managers identify collection strengths and recognise items that should be retained because of association with valued collections. We thought about this as a form of “stock management by association”. Librarians might treat some long-tail items (e.g. items with limited borrowing) with caution if they were aware of links/associations to other collections (although there is also the caveat that this wouldn’t be possible with local activity data reports in isolation).

7. More ambitiously, investigate how nationally aggregated activity data could support activities such as stock weeding by revealing collection strengths or gaps and allowing librarians to cross-check against other collections nationally. This could also inform the number of copies a library should buy, and which books from reading lists are required in multiple copies.

8. Learning and teaching support. Explore the relationship between recommended lists and reading lists, and how it can be used as a tool to support academic teaching staff.

9. Communicate the benefits to decision-makers. If work were to continue along these lines, then a recommendation that has come out strongly from our collaborators is the need to accompany any development activity with a targeted communications plan, which continually articulates to decision-makers within libraries the benefits of using activity data to support search. While a significant amount of momentum is building in this area within our community, our meetings with librarians indicate that the ‘why should I care?’ and, more to the point, ‘why should I make this a priority?’ questions are not adequately answered. In a nutshell, ‘leveraging activity data’ can easily fall down or off the priority lists of most library managers. It would be particularly useful to tie these benefits to the strategic aims and objectives of University libraries as a means of getting such work embedded in annual operational planning.

What can other institutions do to benefit from our work?

  1. For those using the Talis LMS (and with a few years of data stored), institutions can extract data and create their own API to pull in as a recommender function using these recipes.
  2. Institutions can benefit from the work we did with users to understand their perceptions of the function, and can be assured that students (undergraduates and postgraduates) can see the immediate benefit (as long as we get rid of some of the odd stuff by setting the threshold higher).
  3. Institutions can use the findings of this project to support a business case for this work to their colleagues.

How can they go about this?

  1. Assess the quality and quantity of the data stored in your LMS to determine if there’s potential there. For this project (and for the simple recommender based on ‘people who borrowed’) you only need data that ties unique individuals to borrowed items (see more from Andy Land on the data extraction process and how anonymisation is handled here: http://salt11.wordpress.com/recipe-data-extraction-from-talis/).
  2. To understand how the recommender algorithm works, see this post Dave Pattern wrote for us: http://www.daveyp.com/blog/archives/1453
  3. To follow our steps in terms of data format, loading, processing, and setting up an API, see Dave Chaplin’s explanation (a short, purely illustrative sketch of this kind of processing appears after this list): http://salt11.wordpress.com/technical-processes/
  4. To conduct user testing and focus groups to assess the recommender, feel free to draw from our SALT Postgraduate User Discussion Guide and SALT user response sheet.
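To make steps 1–3 slightly more concrete, here is a minimal sketch of the kind of processing involved: pseudonymising borrower identifiers so that loans stay linkable but not attributable, then counting ‘people who borrowed this also borrowed that’ pairs and applying a threshold. This is not the SALT, Talis or Huddersfield code (follow the links above for those); the input format, column names and function names are assumptions made purely for illustration.

```python
import csv
import hashlib
from collections import defaultdict
from itertools import combinations

def pseudonymise(user_id, secret="replace-with-a-local-secret"):
    """One-way hash a borrower identifier so loans remain linkable but not attributable."""
    return hashlib.sha256((secret + user_id).encode("utf-8")).hexdigest()[:16]

def build_coborrowing_counts(loans_csv):
    """Count, for every pair of items, how many distinct borrowers borrowed both.

    Assumes a CSV with a header row containing 'user_id' and 'item_id' columns.
    A naive pairwise count like this is fine for a sketch; ten years of JRUL data
    would need something far more careful about memory and run time.
    """
    items_per_user = defaultdict(set)
    with open(loans_csv, newline="") as f:
        for row in csv.DictReader(f):
            items_per_user[pseudonymise(row["user_id"])].add(row["item_id"])

    pair_counts = defaultdict(int)
    for items in items_per_user.values():
        for a, b in combinations(sorted(items), 2):
            pair_counts[(a, b)] += 1
    return pair_counts

def recommendations_for(item_id, pair_counts, threshold=15):
    """Items co-borrowed with item_id by at least `threshold` distinct borrowers."""
    recs = []
    for (a, b), n in pair_counts.items():
        if n < threshold:
            continue
        if a == item_id:
            recs.append((b, n))
        elif b == item_id:
            recs.append((a, n))
    return sorted(recs, key=lambda pair: pair[1], reverse=True)
```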

Our most significant lessons:

  1. A lower threshold may throw up ‘long tail’ items, but they are unlikely to be deemed relevant or useful by users (although they might be seen as ‘interesting’ and something they might look into further). Set a threshold of ten or so, as the University of Huddersfield has, and the quality of recommendations is relatively sound.
  2. Concerns over anonymisation and data privacy are not remotely shared by the users we spoke to.  While we might question this response as potentially naive, this does indicate that users trust libraries to handle their data in a way that protects them and also benefits them.
  3. You don’t necessarily need a significant backlog of data to make this work locally. Yes, we had ten years’ worth from JRUL, which turned out to be a vast amount of data to crunch. But interestingly, in our testing phases, when we worked with only five weeks of data, the recommendations were remarkably good. Of course, whether this is true elsewhere depends on the nature and size of the institution. But it’s certainly worth investigating.
  4. If the API is to work on the shared service level, then we need more (but potentially not many more) representative libraries to aggregate data from in order to ensure that recommendations aren’t skewed to represent one institution’s holdings, course listings or niche research interests, and can support different use cases (i.e. learning and teaching).

Lessons learned from the user evaluation perspective (or can we define the ‘long tail’?)

The key lesson we’ve learned during this project is that the assumptions behind its hypothesis need to be reconsidered, as in this context the ‘long tail’ is complex and difficult to measure. Firstly, how do we evaluate what is ‘long tail’ from a user perspective? We may draw a line in the sand in terms of the number of times an item has been borrowed, but this doesn’t necessarily translate into individual or community contexts. Most of this project was taken up with processing the data and creating the API and UI; if we’d had a bit more time we could have spent more resource dealing with these questions as they arose during testing.

The focus groups highlighted how diverse and unique each researcher, and what they are researching, is. We chose humanities postgraduates, at PhD and master’s level, but in this group alone we had a huge range of topic areas, from the incredibly niche to the rather more popular. As a result, some respondents found the niche searches fruitful while others found nothing, because their research area is so niche that hardly any material exists that they don’t already know about. In addition, when the long tail is revealed, some researchers find it outdated or irrelevant; this is why it isn’t borrowed that often. So is there any merit in bringing it to the attention of the research community?

Further, more in-depth testing in this area needs to be done in order to find answers to some of these problems. The testing for this project asked the respondents to rate their searches and pick out some of the more interesting texts, but we need to sit with fewer researchers and broaden the discussions. What is relevant? How do you gauge whether something is relevant? Some of the respondents said the books were not relevant but more said they would borrow them, so where does this discrepancy come from? Perhaps ‘relevant’ is not the correct term: can the long tail of discovery produce new perspectives, interesting associations perhaps previously not thought of? Only one-to-one in-depth testing can give the right data, which will then indicate the level at which the threshold should be set.

After all, is there any point in having a recommender which only gives you recommendations you expect or know about already? However, some participants wanted this from a recommender, or expected it, and were disappointed when they got results they could not predict. I know that if I search for a CD on Amazon that I’m familiar with, I sometimes get recommendations I know about or already own. So the recommender means different things to different people. There is a group who are satisfied that they know all the recommended texts and can sleep soundly knowing they have completely saturated their research topic, and there is a group who need new material.

The long tail hypothesis is a difficult one to prove in a short-term project of six months. As its name suggests, the long tail needs to be explored over a long time. Monitoring borrowing patterns in the library, click-through, and feedback from the user community and librarians will help to refine the recommender tool for ultimate effectiveness.

But is this what the users want?

I’ll admit it, I’m prepared to out myself: I’ve just finished a postgraduate research degree and more than once I have used the Amazon book recommender. In fact, when I say more than once, over the course of my studies we’re possibly getting into double figures. I’m not ashamed (I may be about using Wikipedia, but let’s not go there), because I did it and so did many of my peers. There may be more traditional methods of conducting academic research, but sometimes, with a deadline looming and very little time for a physical trip to the library to speak to a librarian, finding resources in one or two clicks is just too attractive. My hunch is that many other scholars also use this method to conduct research. Recently, on another Copac project, we facilitated some focus groups. The participants were postgraduate researchers, a mix of humanities and STEM; some had used Copac before, others had not. Although the focus groups were addressing another hypothesis, I couldn’t resist asking the gathered group whether they would find merit in a book recommender on Copac based on 10 years of library circulation data from a world-class research library. It’s not often you see a group of students become visibly excited at the thought of a new research tool, but they did that night. A book recommender would make a positive impact on their research practices and was greeted with enthusiasm by the group. I thought it was worth mentioning this incident because, when the going gets tough and we are drowning under data, it might be worth remembering that users really want this to happen.


Working through the SALT hypothesis

I’m currently project managing SALT, but my own area of interest is evaluation and user behaviour, so I’m going to be taking an active role in putting what we develop in front of the right users (we’re thinking academics here at the University) to see what their reactions might be. As I think this over, a number of questions and issues come to mind. Are we more likely to look on things favourably if they are recommended by a friend? If we think about what music we listen to, films we go and see, TV we watch and books we read, are we far more likely to do any of those things if we receive a recommendation from someone we trust, or someone we know likes the same things that we like? If you think the answer is yes, then is there any reason we wouldn’t do the same thing should a colleague or peer recommend a book that would help us in our research? In fact, more so? Going to see a film that a friend recommends which turns out to be, well, average has far less lasting consequences than completing a dissertation that fails to acknowledge some key texts. As a researcher, would you value a service which could suggest other books related to the books you’ve just searched for in your library?

We know library users very rarely take out just one book. Researchers borrowing library books tend to search for them centrifugally: one book leads to another as they dig deeper into the subject area, finding rarer items and more niche materials. So if those materials have been of use to them, could they not also be of use to other people researching the same area? The University of Manchester’s library is stocked with rare and niche collections, but are they turning up within traditional searching, or are they hidden down at the long end of the tail? By recommending books to humanities researchers that other humanities researchers have borrowed from the library, I’m really hoping we can help improve the quality of research – we know that solid research means going beyond the prescribed reading list and discussing new or different works. Maybe a recommender function can support this (even if it potentially undermines the authority of the supervisor-prescribed list – as one academic recently suggested to us: “isn’t this the role of the supervisor?”).

Here’s how I’m thinking we’ll run our evaluation: once the recommender tool is ready, we’ll ask a number of subject librarians to be the first to test it, to see if it recommends what they would expect to see linked to their original search. They will be asked to search the library catalogue for something they know well; when the catalogue returns their search, does the recommender tool suggest further reading which seems like a good choice to them? As they choose more unusual books, does the recommender then start suggesting things which are logically linked, but also more underused materials? Does it start to suggest collections which are rarely used, but nevertheless just as valuable? Or does it just recommend randomly unrelated items? And can some of the randomness support serendipity?

We’ll then run the same test with humanities researchers (it’ll be interesting to see if librarians and academics have similar responses). As testing facilitators, we’ll also be gauging people’s reactions to the way in which their activity data is used. The question is, do users see this as an invasion of their privacy, or a good way to use the data? Do the benefits of the recommender tool outweigh the concerns over privacy?

The testing of the hypothesis will be a crucial indicator of the legitimacy of the project. Positive results from the user testing will (hopefully) take this project on to the next level and help us move towards some kind of shared service. But we really need to gauge whether this segment of more ‘advanced’ users can see the benefit; if they believe that the tool has the ability to make a positive impact on their research, then we hope to extend the project and encourage further libraries to participate. With more support from other libraries, researchers will hopefully be one step closer to receiving a library book recommender.

Surfacing the Academic Long Tail — Announcing new work with activity data

We’re pleased to announce that JISC has funded us to work on the SALT (Surfacing the Academic Long Tail) Project, which we’re undertaking with the University of Manchester, John Rylands University Library.

Over the next six months the SALT project will build a recommender prototype for Copac and the JRUL OPAC interface, which will be tested by the communities of users of those services. Following on from the invaluable work undertaken at the University of Huddersfield, we’ll be working with more than ten years of aggregated and anonymised circulation data amassed by JRUL. Our approach will be to develop an API onto that data, which in turn we’ll use to develop the recommender functionality in both services. Obviously, we’re indebted to the previous knowledge acquired by a similar project at the University of Huddersfield, and the SALT project will work closely with colleagues there (Dave Pattern and Graham Stone) to see what happens when we apply this concept in the research library and national library service contexts.

Our overall aim is that, by working collaboratively with other institutions and Research Libraries UK, the SALT project will advance our knowledge and understanding of how best to support research in the 21st century. Libraries are a rich source of valuable information, but sometimes the sheer volume of materials they hold can be overwhelming even to the most experienced researcher — and we know that researchers’ expectations of how to discover content are shifting in an increasingly personalised digital world. We know that library users — particularly those researching niche or specialist subjects — are often seeking content based on a recommendation from a contemporary, a peer, colleagues or academic tutors. The SALT project aims to provide libraries with the ability to give users that information. Similar to Amazon’s ‘customers who bought this item also bought…’, the recommendations in this system will appear on a local library catalogue and on Copac, and will be based on circulation data gathered over the past 10 years at The University of Manchester’s internationally renowned research library.

How effective will this model prove to be for users — particularly humanities researchers?

Here’s what we want to find out:

  • Will researchers in the field of humanities benefit from receiving book recommendations, and if so, in what ways?
  • Will the users go beyond the reading list and be exposed to rare and niche collections — will new paths of discovery be opened up?
  • Will collections in the library, previously undervalued and underused find a new appreciative audience — will the Long Tail be exposed and exploited for research?
  • Will researchers see new links in their studies, possibly in other disciplines?

We also want to consider whether there are other potential beneficiaries. By highlighting rarer collections, valuing niche items and bringing to the surface less popular but nevertheless worthy materials, libraries will have the leverage they need to ensure the preservation of these rich materials. Can such data or services assist in decision-making around collections management? We will be consulting with Leeds University Library and the White Rose Consortium, as well as UKRR, in this area.

And finally, as part of our sustainability planning, we want to look at how scalable this approach might be for developing a shared aggregation service of circulation data for UK university libraries. We’re working with potential data contributors such as Cambridge University Library, the University of Sussex Library, and the M25 Consortium, as well as RLUK, to trial and provide feedback on the project outputs, with specific attention to the sustainability of an API service as a national shared service for HE/FE that supports academic excellence and drives institutional efficiencies.