Persistent identifiers for Copac records

If you know the record number of a Copac record, there is now a simple URL that will return the record in MODS XML format. The URLs take the following form: http://copac.ac.uk/crn/<record-number>. For instance, the work “China tide : the revealing story of the Hong Kong exodus to Canada” has a Copac Record Number of 72008715609 and can be linked to with the URL http://copac.ac.uk/crn/72008715609.
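
As a rough illustration, the URL pattern above could be built in a script along the following lines. This is only a sketch: the function names copac_record_url and fetch_mods are made up for this example, and the fetch step simply assumes the service answers a plain HTTP GET with the MODS XML.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base of the persistent URL pattern quoted in the post.
COPAC_CRN_BASE = "http://copac.ac.uk/crn/"


def copac_record_url(record_number):
    """Build the persistent URL for a Copac Record Number.

    The number is treated as an opaque string and percent-escaped,
    so no assumption is made about its exact format.
    """
    number = str(record_number).strip()
    if not number:
        raise ValueError("record number must not be empty")
    return COPAC_CRN_BASE + quote(number, safe="")


def fetch_mods(record_number):
    """Fetch the MODS XML for a record (hypothetical helper, not called here).

    Assumes the URL responds to a plain GET with XML bytes.
    """
    with urlopen(copac_record_url(record_number)) as response:
        return response.read()


print(copac_record_url(72008715609))
```

The example prints the same link given above for “China tide”.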

Over the next few weeks we’ll be looking at adding these links to the Copac Full record pages and also introducing links to bookmarking web sites such as del.icio.us.

Chetham’s Library catalogue loaded

Chetham’s Library was founded in 1653 and is the oldest public library in the English-speaking world. It holds more than 100,000 volumes of printed books, of which 60,000 were published before 1851. These include rich collections of sixteenth- and seventeenth-century printed works, periodicals, broadsides and other ephemera, with strong collections in theology, history, law, medicine, and science.

The Library defines its core collection as those works which were acquired from the 1650s up to 1851, but has actually acquired a lot of material since then, including the collection of John Byrom, shorthand writer and linguist; 3,100 broadsides and single-sheet publications; a collection of early maps; a variety of scrapbooks containing rare ephemera; a significant collection of shorthand material; and c. 7,000 tracts and pamphlets of various dates.

The catalogue has been added as part of the Copac Challenge Fund.

Institute of Education reload

Last week we started re-loading the Institute of Education Library records. Due to the number of records involved, it will take a little while to complete the operation; as of today, approximately half of the records are visible in the Copac interfaces. The rest of the records should be available by this time next week.

The re-load was required to enable better access to live circulation information from the Institute’s Library Management System.

Search Solutions 2008

On Tuesday last I attended “Search Solutions 2008”, organised by the BCS-IRSG. To quote from the event programme, “Search Solutions is a special one-day event dedicated to the latest innovations in information search and retrieval.” The format of the day was a series of short talks, 11 in all, each about 20 minutes in length, with the chance for questions from the audience after each talk.

One of the themes through the day was the linguistic analysis of texts such as blog posts and web pages; in other words, deducing the correct meaning of a word like Georgia. Is it referring to someone called Georgia, the country that used to be part of the USSR, or the US state? As all the speakers were from commercial companies, no-one was giving their secrets away, but the approaches mentioned ranged from Bayesian analysis to a team of 50 linguistic experts.

Another theme was how social networking can help users find what they’re looking for. User recommendations and tagging were both cited frequently in this regard. Elias Pampalk from last.fm gave a very interesting talk on how tagging is being used on last.fm. They have made it very easy for users to tag. Adding a tag usually involves no typing — just a couple of mouse clicks to select either a tag you’ve used before or a tag someone else has used for that item. There is also incentive for people to tag at last.fm as it can help you discover new music and connect you to people with similar tastes. They seem to have gotten it right as they are collecting over 2.5 million tags per month.

At the end of his talk, Elias mentioned that last.fm had an open API, which I had never realised before. This got me wondering if we could provide links from Copac to last.fm. This perhaps isn’t as strange an idea as it may first seem. Copac doesn’t hold records for just books; we have many records in the database for CDs and sheet music. It might be kind of neat to provide a link from those records to last.fm’s page about the artist or album and perhaps pull in images as well. Something to think about when we can find a bit of spare time.

Overall it was a very interesting day with many thought-provoking talks, and I’d happily attend a similar day next year.

Handling XML errors

I’ve just installed some updated software that should increase the reliability of the web service. Unfortunately, while I was installing the software, people using the service will have seen error messages in place of our records. The disruption should only have lasted a minute or two and everything should be working now.

The update allows us to better cope with errors in the records. In the past an XML error in one record in a page of results was causing users to see a “500 Internal Server Error” page rather than their records. Things are now better, though not perfect. We still cannot display the record with the errors, but the rest of the records are displayed and there should be no more Internal Server Error pages because of bad XML. Records with errors will now show as follows in the brief display:

An undisplayable record in the Brief display.

As I mentioned in a previous post, our database software does not natively support XML and it is occasionally inserting line breaks where it shouldn’t, such as in the middle of an XML entity! Our next task is to modify our line-breaking algorithm (so that the database doesn’t need to do it itself) and correct the affected records.
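
To illustrate the general idea of the fix (this is not our actual code), here is a minimal Python sketch of parsing each record on its own, so that one malformed record produces a placeholder instead of failing the whole results page. The element names record and title, the function names and the placeholder wording are all invented for the example; the second sample record has a line break in the middle of an entity, as described above.

```python
import xml.etree.ElementTree as ET

# Hypothetical placeholder text for a record that cannot be parsed.
PLACEHOLDER = "[This record cannot be displayed]"


def brief_title(record_xml):
    """Parse one record and return its title for the brief display."""
    record = ET.fromstring(record_xml)  # raises ET.ParseError on bad XML
    title = record.findtext("title")
    return title if title is not None else "[untitled]"


def render_brief_display(records):
    """Render every record independently of the others.

    A parse error in one record is caught and replaced by a placeholder,
    so the remaining records are still shown.
    """
    lines = []
    for record_xml in records:
        try:
            lines.append(brief_title(record_xml))
        except ET.ParseError:
            lines.append(PLACEHOLDER)
    return lines


records = [
    "<record><title>China tide</title></record>",
    # A line break inserted in the middle of the entity "&amp;".
    "<record><title>Fish &am\np; chips</title></record>",
    "<record><title>Another title</title></record>",
]

for line in render_brief_display(records):
    print(line)
```

The key design choice is where the try/except sits: around each record rather than around the whole page, which is what turns a page-wide failure into a single undisplayable entry.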

SRU Developments

I am a member of the OASIS Technical Committee that is attempting to formally standardize SRU. Some of the enhancements we are proposing to make to SRU as part of the standardization process are listed below:

  1. Allow Non-XML Record Representations
  2. Enhancements to Proximity searches in CQL
  3. Faceted Searching
  4. Ability for a server to be vague about the Result Set size
  5. Multiple Query Types
  6. Eliminate the Version and Operation Parameters 
  7. Alternative Response Formats

Some of the above are fairly trivial, such as the ability of the server to return an approximate number of records found by the query. It may not be immediately obvious why a server may not want to give an exact number of records found, but it enables very useful performance optimizations to be made on the server. For example, when you do a search on your favourite Internet search engine it will probably say something like “Results 1 – 10 of about 1,050” on the results page.
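
As one hypothetical example of the kind of optimization this allows (it is not taken from the SRU proposal itself), a server could stop counting hits once it passes a cap and report only a lower bound. The function names and the cap value in this Python sketch are invented for illustration.

```python
from itertools import islice


def count_hits(matches, exact_limit=1000):
    """Count matching records, but give up once exact_limit is exceeded.

    Returns (count, is_exact). When is_exact is False, count is only a
    lower bound: the server stopped looking after exact_limit + 1 hits.
    """
    counted = sum(1 for _ in islice(matches, exact_limit + 1))
    if counted > exact_limit:
        return exact_limit, False
    return counted, True


def describe(count, is_exact):
    """Format the result set size for display."""
    return f"{count} records" if is_exact else f"more than {count} records"


# A small result set is counted exactly.
print(describe(*count_hits(iter(range(42)), exact_limit=1000)))

# A huge result set is abandoned early instead of being walked in full.
print(describe(*count_hits(iter(range(10**9)), exact_limit=1000)))
```

The second call returns after examining about a thousand matches rather than a billion, which is where the saving comes from.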

We are also being asked to enhance proximity searching so that it will support structured records, i.e. the sort of data you might find in a complex XML document. Some such queries might be as follows:

  • author = smith and date = 2006, but both must be found within the same containing XML element.
  • dc.creator is in the second grandchild of the grandfather of a node with dc.date = 2006 
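
To make the first of these queries concrete, here is a small Python sketch of what “within the same containing XML element” means. It says nothing about how the query would be written in CQL; the sample document, the element names and the function same_container_matches are all invented for the example.

```python
import xml.etree.ElementTree as ET

# Invented sample data: "smith" and "2006" both occur in the document,
# but only the first record contains them together.
DOC = """
<collection>
  <record>
    <author>smith</author>
    <date>2006</date>
  </record>
  <record>
    <author>smith</author>
    <date>1999</date>
  </record>
  <record>
    <author>jones</author>
    <date>2006</date>
  </record>
</collection>
"""


def same_container_matches(root, container_tag, conditions):
    """Return the containers in which every condition is satisfied.

    conditions maps an element tag to the text it must have; all of
    them must be met inside the same container element.
    """
    hits = []
    for container in root.iter(container_tag):
        if all(
            any(node.text == value for node in container.iter(tag))
            for tag, value in conditions.items()
        ):
            hits.append(container)
    return hits


root = ET.fromstring(DOC)
hits = same_container_matches(root, "record", {"author": "smith", "date": "2006"})
print(len(hits))
```

A plain boolean AND over the whole document would be satisfied by the second and third records taken together; requiring the same container leaves only the first record.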

Some of the enhancements, such as multiple query types and response formats, are quite controversial within the community. One objection is that by giving implementors choices, you will fragment the community and remove any chance of interoperability.

If you have an opinion about any of the above you are encouraged to join in the discussions by joining the OASIS Search Web Services Technical Committee.

Copac 2.0 (as I prep for the ILI conference)

The blog’s been silent this week. Ashley’s getting on with the practical business of development, and I’ve been spending time writing up content for my presentation at the Internet Librarian International conference, as well as a presentation to the Mimas Board of Directors on D2D and the future of Copac (more on that next week).

I’m speaking in a panel on “The OPAC and the Library of the Future” at ILI, and writing up my thoughts has been a great opportunity to hash out some of the tensions and challenges surrounding the whole Copac 2.0 thing. I now think ILI “owns” what I’ve submitted to them (and it’s not available online yet). So to avoid any handslapping (and to echo Austin Powers) “allow myself to quote myself”:

As Copac and its stakeholders think strategically about an approach to Web 2.0 and specifically customisation and adaptive personalisation, as I will discuss, multiple issues and tensions emerge. Central among them is striking that delicate balance between ‘openness’ and ‘control.’ We want to promote an ethos where Copac data is opened up (via APIs for instance) and made available for the community in as useful a form as possible, but we also recognise that this means devolving control over what happens to that data. At the same time, as a collective resource comprising over 50 UK libraries, we are also considering how Copac is uniquely positioned as an aggregator and can, to use Lorcan Dempsey’s phrase, “reinforce the value of network effects,” and so increase “gravitational pull” towards a concentrated service that supports UK-focussed research. [1] By gathering ‘personalisation’ or ‘intentional’ data, Copac (and other JISC bibliographic services) can potentially move to a model where adaptive personalisation is supported, including those desirable Amazon-like recommender functions. We can potentially help to begin to yield that ‘long tail’ of under-used or little-known UK library resources, for example those unique and rare items now incorporated into Copac via the Kew Botanical Gardens, the Natural History Museum, or Royal College of Surgeons.

This, as yet, is a tentative and unformed vision for Copac, a vision we are now attempting to refine and focus. This future for the service is by no means set in stone, and our ideas are very much exploratory at this stage. Copac occupies a single position in a complicated terrain which encompasses the UK resource discovery and library landscape and, of course, the much larger global terrain occupied by WorldCat, Google, Google Books, and Amazon.com. Our strategic planning is necessarily complex. Nonetheless, we are eager to open up conversations about the future of a service like Copac, and at the close of this talk would like to invite feedback and insight over the strategic directions Copac might take, and new questions that emerge.

So that gives you the gist of what I’m planning to talk about, and I am hoping it will be a good opportunity to get some feedback. In the meantime, any thoughts or comments here are very welcome. Happy Friday :-)

Society of Antiquaries of London catalogue loaded

The Society’s Library is the major archaeological research library in the UK. The Library’s present holdings number more than 100,000 books and around 800 currently received periodical titles.

It holds British county histories, a collection of eighteenth- and nineteenth-century books on the antiquities of Britain and other countries, and a wide-ranging collection of periodical titles (British and foreign) with runs dating back to the early to mid-nineteenth century.

A wide range of subjects is covered. The focus is on:
• British and European archaeology
• British record publications
• Architectural history
• Decorative arts
• Numismatics, heraldry and genealogy.

The catalogue has been added as part of the Copac Challenge Fund.