Copac Beta Interface

We’ve just released the beta test version of a new Copac interface and I thought I’d write a few notes about it and how we’ve created it.

Some of the more significant changes to the search result page (or “brief display” as we call it) are:

  • There are now links to the library holdings information pages directly from the brief display. You no longer have to go via the “full record” page to get to the holdings information.
  • You can see a more complete view of a record by clicking on the magnifying glass icon at the end of the title. This enables you to quickly view a more detailed record without having to leave the brief display.
  • You can quickly edit your query terms using the search forms at the top of the page.
  • To further refine your search you can add keywords to the query by typing them into the “Search within results” box.
  • You can change the number of records displayed in the result page.

The pages have been designed using Responsive Web Design techniques — which is jargon that means that the HTML5 and CSS have been designed in such a way that the web page rearranges itself depending on the size of your screen. The new interface should work whether you are using a desktop with a cinema display, a tablet computer or a mobile phone. Users of those three display types will see a different arrangement of screen elements and some may be missing altogether on the smaller displays. If you use a tablet computer or smartphone, then please give beta a try on them and let us know what you think.

The CGI script that creates the web pages is a C++ application which outputs some fairly simple, custom XML. The XML is fed through an XSLT stylesheet to produce the HTML (and also the various record export formats). Opinion on the web seems divided on whether or not this is a good idea; the most valid complaint seems to be that it is slow. It seems fast enough to us, and the beta way of doing things is actually an improvement: there is now just one XSLT stylesheet used in creating the display, whereas our old way of doing things ran multiple XSLT stylesheets multiple times for each web page. This probably just goes to show that the most significant eater of time is the searching of the database rather than the creation of the HTML.

Middle Temple Library Rare Books and Manuscripts Collection loaded

We’re pleased to announce that the holdings of Middle Temple Library Rare Books and Manuscripts collection are now available to search through Copac.

Founded in 1641 by Robert Ashley, Middle Temple Library has a very large collection of early printed books (i.e. printed between 1450 and 1800). As Middle Temple is primarily a legal library geared towards practitioners, Copac will be adding the early printed books collection, which covers a wide range of subject matter, from astrology to zoology. Middle Temple Library holds a number of unique books printed on the Continent, as well as many rare items. In addition, it has the largest holdings from John Donne’s personal library.

To browse, or limit your search to, the holdings of Middle Temple Library, go to the main search tab on copac.ac.uk/search and choose ‘Middle Temple Library’ from the drop-down list of libraries.

Copac deduplication

Over 60 institutions contribute records to the Copac database. We try to de-duplicate those contributions so that records from multiple contributors for the same item are “consolidated” together into a single Copac record. Our de-duplication efforts have reduced over 75 million records down to 40 million.

Our contributors send us updates on a regular basis which results in a large amount of database “churn.” Approximately one million records a month are altered as part of the updating process.

Updating a consolidated record

Updating a database like Copac is not as immediately intuitive as you may think. A contributor sending us a new record may result in us deleting a Copac record. A contributor who deletes a record may result in a Copac record being created. A diagram may help explain this.

A Copac consolidated record created from 5 contributed records. Lines show how contributed records match with one another.

The above graph represents a single Copac record consolidated from five contributed records: a1, a2, a3, b1 & b2. A line between two records indicates that our record matching algorithm thinks the records are for the same bibliographic item. Hence, records a1, a2 & a3 match with one another; b1 & b2 match with each other; and a1 matches with b1.

Should record b1 be deleted from the database, then as b2 does not match with any of a1, a2 or a3 we are left with two clumps of records. Records a1, a2 & a3 would form one consolidated record and b2 would constitute a Copac record in its own right as it matches with no other record. Hence the deletion of a contributed record turns one Copac record into two Copac records.

I hope it is clear that the inverse can happen — that a new contributed record can bring together multiple Copac records into a single Copac record.
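The ideal behaviour described here can be sketched by treating matches as edges in a graph and each Copac record as a connected group of contributed records. This is only an illustration using the five records from the diagram; the function name `consolidate` and the data layout are assumptions made for the sketch, not how the Copac software is written.

```python
from collections import defaultdict

def consolidate(records, matches):
    """Group contributed records into consolidated sets.

    Hypothetical helper: each returned group is a set of records linked,
    directly or indirectly, by a match between records still present.
    """
    neighbours = defaultdict(set)
    for a, b in matches:
        # Ignore matches that involve a record no longer in the database.
        if a in records and b in records:
            neighbours[a].add(b)
            neighbours[b].add(a)

    seen = set()
    groups = []
    for start in sorted(records):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            rec = stack.pop()
            if rec in group:
                continue
            group.add(rec)
            stack.extend(neighbours[rec] - group)
        seen |= group
        groups.append(sorted(group))
    return groups

# The matches drawn in the diagram above.
MATCHES = [("a1", "a2"), ("a1", "a3"), ("a2", "a3"), ("b1", "b2"), ("a1", "b1")]
ALL_RECORDS = {"a1", "a2", "a3", "b1", "b2"}

print(consolidate(ALL_RECORDS, MATCHES))           # one group of five records
print(consolidate(ALL_RECORDS - {"b1"}, MATCHES))  # [['a1', 'a2', 'a3'], ['b2']]
```

Running the same function before and after adding b1 back shows the inverse case: two groups become one.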

The above is what would happen in an ideal world. Unfortunately the current Copac database does not save a log of the record matches it has made and neither does it attempt to re-match the remaining records of a consolidated set when a record is deleted. The result is that when record b1 is deleted, record b2 will stay attached to records a1, a2 & a3. Coupled with the high amount of database churn this can sometimes result in seemingly mis-consolidated records.

Smarter updates

As part of our forthcoming improvements to Copac we are keeping a log of records that match. This makes it easier for the Copac update procedures to correctly disentangle a consolidated record and should result in fewer mis-consolidations.

We are also trying to make the update procedures smarter and have them do less. For historical reasons the current Copac database is really two databases: a database of the contributors’ records and a database of consolidated records. The contributors’ database is updated first and a set of deletions and additions/updates is passed on to the consolidated database. The consolidated database doesn’t know if an updated record has changed in a trivial way or now represents another item completely. It therefore has no choice but to re-consolidate the record, and that means deleting it from the database and then adding it back in (there is no update functionality). This is highly inefficient.

The new scheme of things tries to be a bit more intelligent. An updated record from a contributor is compared with the old version of itself and categorised as follows:

  • The main bibliographic details are unchanged and only the holdings information is different.
  • The bibliographic record has changed, but not in a way that would affect the way it has matched with other records.
  • The bibliographic record has changed significantly.

Only in the last case does the updated record need to be re-consolidated (and in future that will be done without having to delete the record first!) In the first two cases we would only need to refresh the record that we use to create our displays.
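As a rough sketch of that three-way categorisation: the record layout (a dict with "bib" and "holdings" parts) and the list of fields that influence matching are assumptions made purely for illustration, since which fields really affect Copac's matching is not described here.

```python
from enum import Enum

class Change(Enum):
    HOLDINGS_ONLY = 1    # bibliographic details unchanged
    BIB_MINOR = 2        # changed, but matching is unaffected
    BIB_SIGNIFICANT = 3  # changed in a way that could alter matching

# Hypothetical: fields assumed (for this sketch only) to influence matching.
MATCH_FIELDS = ("title", "author", "date")

def match_key(bib):
    """Reduce a bibliographic record to the parts assumed to drive matching."""
    return tuple(str(bib.get(field, "")).strip().lower() for field in MATCH_FIELDS)

def classify_update(old, new):
    """Compare an updated record with the old version of itself."""
    if old["bib"] == new["bib"]:
        # Covers the holdings-only case (and a record that has not changed at all).
        return Change.HOLDINGS_ONLY
    if match_key(old["bib"]) == match_key(new["bib"]):
        return Change.BIB_MINOR
    return Change.BIB_SIGNIFICANT

def needs_reconsolidation(old, new):
    """Only a significant bibliographic change requires re-consolidation."""
    return classify_update(old, new) is Change.BIB_SIGNIFICANT
```

For example, an update that only adds a note field to the bibliographic part would be classed as `BIB_MINOR` under these assumptions, so the display record is refreshed but the consolidated set is left alone.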

 

An analysis of an update from one of our contributors showed that it contained 3818 updated records; 954 had unchanged bibliographic details and only 155 had changed significantly and needed reconsolidating. The saving there is quite big: in the current Copac database we have to re-consolidate all 3818 records, whereas in the new version of Copac we only need to re-consolidate 155, about 4% of them. This will reduce database churn significantly, result in updates being applied faster and allow us to have more contributors.

Example Consolidations

Just for interest and because I like the graphs, I’ve included a couple of graphs of consolidated records from our test database. The first graph shows a larger set of records. There are two records in this set, either of which would, if deleted, break the set up into two smaller sets.
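Records like these, whose deletion splits a set, can be found by removing each record in turn and counting the groups that remain. The sketch below does this for the small five-record example from the earlier diagram (not the larger set pictured in this post); the function names are made up for illustration, and the brute-force approach is chosen for clarity rather than speed.

```python
def count_groups(records, matches):
    """Count connected groups among `records` using a simple union-find."""
    parent = {r: r for r in records}

    def find(r):
        while parent[r] != r:
            parent[r] = parent[parent[r]]  # path halving
            r = parent[r]
        return r

    for a, b in matches:
        if a in parent and b in parent:
            parent[find(a)] = find(b)
    return len({find(r) for r in records})

def splitting_records(records, matches):
    """Return the records whose deletion would increase the number of groups."""
    base = count_groups(records, matches)
    return sorted(
        r for r in records
        if count_groups(records - {r}, matches) > base
    )

matches = [("a1", "a2"), ("a1", "a3"), ("a2", "a3"), ("b1", "b2"), ("a1", "b1")]
print(splitting_records({"a1", "a2", "a3", "b1", "b2"}, matches))  # ['a1', 'b1']
```

In a set where every record matches every other record, the same function returns an empty list, because no single deletion can separate the remaining records.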

The graph below shows a smaller set of records where each record matches with every other record.

Performance improvements

The run-up to Christmas (or Autumn term if you prefer) is always our busiest time of year as measured by the number of searches performed by our users. Last year the search response times were not what we would have liked and we have been investigating the causes of the poor performance and ways of improving it. Our IT people determined that at our busiest times the disk drives in our SAN were being pushed to their maximum performance and just couldn’t deliver data any faster. So, over the summer we have installed an array of Solid State Disks to act as a fast cache for our file-systems (for the more technical, I believe it is actually configured as a ZFS Level 2 cache, or L2ARC).

The SSD cache was turned on during our brief downtime on Thursday morning and so far the results look promising. I’m told the cache is still “warming up” and that performance may improve still further. The best performance indicator I can provide is the graph below. We run a “standard” query against the database every 30 minutes and record the time taken to run the query. The graph below plots the time (in seconds) to run the query since midnight on the 23rd August 2011. I think it is pretty obvious from looking at the graph exactly when the SSD cache was configured in.
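The kind of measurement behind such a graph can be sketched as follows. This is not our monitoring script: `run_query` stands in for whatever issues the “standard” query, and the function names and parameters are assumptions made for the example.

```python
import time

def time_query(run_query):
    """Return the wall-clock seconds taken by one call to run_query()."""
    start = time.perf_counter()
    run_query()
    return time.perf_counter() - start

def sample_timings(run_query, samples, interval_seconds, sleep=time.sleep):
    """Run the query `samples` times, pausing between runs.

    Returns a list of (unix_timestamp, elapsed_seconds) pairs suitable
    for plotting. `sleep` is injectable so the loop can be tested quickly.
    """
    timings = []
    for i in range(samples):
        timings.append((time.time(), time_query(run_query)))
        if i < samples - 1:
            sleep(interval_seconds)
    return timings

# Hypothetical usage: one sample every 30 minutes for a day.
# timings = sample_timings(my_standard_query, samples=48, interval_seconds=30 * 60)
```

Using `time.perf_counter()` for the elapsed time avoids distortion from system clock adjustments, while `time.time()` is used only to label each sample.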

It all looks very promising so far and I think we can look forward to the Autumn with less trepidation and hopefully some happier users.

Imperial catalogue reload underway

We are currently reloading the Imperial College London catalogue to reflect local catalogue changes. Consequently, a small percentage of the Imperial catalogue will be unavailable on Copac this week. The full catalogue should be available from next Monday.

Apologies for the short-term loss of availability of this material.

Update – the catalogue load is complete, and all material is available on Copac.

National Maritime Museum Caird Library records loaded

We’re pleased to announce that the holdings of the National Maritime Museum Caird Library have been added to Copac.

The National Maritime Museum is the largest maritime museum in the world. It houses important holdings on the history of Britain at sea, totalling nearly 2.5 million items, including maritime art, cartography, ship models, scientific and navigational instruments, and manuscripts, and the world’s largest maritime reference library. The Museum’s Archive & Library collections are a nationally and internationally important resource, with items as diverse as rare books, diaries, log books, letters, manuscripts, maps and charts. As a Place of Deposit designated by the National Archives, the collections contain important public records including in-letters of the Board of Admiralty, Lieutenants’ logs of the Navy Board, and crew lists and Certificates of Competency and Service (Masters’ certificates) of the Registrar General of Shipping and Seamen.

To browse, or limit your search to, the holdings of the National Maritime Museum Caird Library, go to the main search tab on copac.ac.uk/search and choose ‘National Maritime Museum’ from the drop-down list of libraries.

Copac trial interface: feedback

Many thanks to those of you who gave us feedback on the recent trial of the new Copac user interface. We really appreciate the time you put into testing and responding to us through the feedback form, email, and Twitter.

I’ve summarised the feedback below:

  • In general you gave an enthusiastic response to the new interface design, including positive comments on the layout and workflow. Those who tried it on mobile devices were pleased with how it came out.
  • There were also positive comments about the range of features, with the availability of the holding library list on the initial search result listing being particularly popular.
  • The grey ‘colour scheme’ generated a number of comments. Some people liked it but others definitely didn’t! The lack of colour on the site was to try and avoid getting too much comment on the graphics as opposed to the functionality of the new interface, so it won’t be staying monochrome.
  • There were individual comments about wording, screen elements, or requests for additional features, which are all valuable in helping us refine the presentation and facilities.
  • Amongst those who didn’t like the new interface, the major concern was the lack of the ‘Main search’ screen with its range of detailed search options. Whilst the initial test was working just with the Quick search, we can reassure you that we always intended to reintroduce the other search screens once we had feedback on the overall design. This obviously wasn’t as clear as we’d hoped.

We are continuing to work on the interface, reassured that we are moving in the right direction for most of you. In the next stage we’ll be incorporating colour as well as adding the missing search screens. We will also be making changes in response to comments or requests relating to individual features, as well as ensuring that it works well for as wide a range of browsers and devices as possible.

You’ll be able to try out the new interface again in a few months’ time and provide input into the final version before the work is completed.

Institution of Mechanical Engineers Library loaded

We are pleased to announce that the holdings of the Institution of Mechanical Engineers library have been added to Copac.

The Institution of Mechanical Engineers was founded in 1847 and the library collection has evolved throughout the Institution’s life. One of the best mechanical engineering collections in the world, the library holds many rare and specialist resources, including an extensive historical journal collection, and a specialist standards collection which includes many hard-to-find American standards.

The collection covers the industry areas of: railway, process engineering, automotive, aerospace, medical engineering, building services, waste management, power systems, pressure systems and manufacturing. Core subject areas include: machine mechanics, machine design, mechanisms, kinematics, fluid dynamics, fluid mechanics, thermodynamics, combustion, power drives, materials, renewable energy, product design, machine tools, project management and finite element analysis. The archive includes collections of personal papers from important engineers and engineering companies.

To browse, or limit your search to, the holdings of the Institution of Mechanical Engineers library, go to the main search tab on copac.ac.uk/search and choose ‘Institution of Mechanical Engineers’ from the drop-down list of libraries.

Copac trial interface – have your say on the future of Copac!

We are developing a new style of Copac interface with greater search flexibility, new functionality, and clearer displays. Following initial user testing we’re now opening up the trial interface for further comment. We’re making the early draft interface available for a week from 12.00 noon 23rd May to 12.00 noon 30th May. This is your opportunity to try out the new interface, and let us know what you think!

Access the Copac Alpha trial interface.

Please note: The interface is very pared down, and there is no colour scheme. Some elements are just placeholders for planned options. The interface is designed to work in the latest browsers – you might experience issues with display/functionality in older browsers, such as IE 6 and 7.

We’d really appreciate your input into this work. There are feedback options on the screens and all comments will feed into the ongoing development process. You can also email copac@mimas.ac.uk with your feedback.

There will be further opportunities to comment on the interface redevelopment as the work continues. This is part of the complete redevelopment of the Copac service and additional interface facilities will become available at later stages of the work.