Roadblock to full OpenURLness [Updated]

This week I encountered a significant roadblock when trying to use OpenURL in a situation where it is a natural fit. Let me explain the scenario. A scientific researcher at the company where I work built an extensive bibliography of journal articles on a particular subject, and wants to publish that bibliography on the company intranet, complete with hypertext links to the full text. This person initially thought it’d be ok to simply mount the full text articles that he had downloaded in the same webspace as the bibliography, and simply link to the files. Of course, that idea was quickly shot down. Instead, we thought, why can’t we take this bibliography, check it against our SFX KnowledgeBase to see what articles we have available in full text, and then output the complete OpenURL for each of those articles for this researcher to use when marking up and publishing his bibliography?

The use case sounds straightforward, right? Turns out that it is anything but. I was provided with a text file of citations and was asked to come up with appropriate SFX links for each. Of course I could have manually rekeyed the citations one by one into a search form querying our SFX KB, but that would take quite a long time and quite a bit of effort. I tried to think of how this whole process could be automated.

On the advice of Dan Chudnov I downloaded an open source application written in Perl called Biblio-Citation-Parser, which on the face of it seemed to be exactly what I needed. I needed a way to automatically parse the whole list of citations into the necessary chunks of metadata, and then automatically generate an OpenURL for each citation. After trying unsuccessfully to get Biblio-Citation-Parser to work (this isn’t a limitation of the software but of my Perl expertise), I sent queries out to other SFX users as well as to the Code4Lib discussion list. There were several responses from members of the Code4Lib discussion list, some of whom mentioned the application that I already knew about. But it turns out that pretty much nobody in that community [at least among those who responded] had ever used it, and that nobody in that community had come up with a good solution to this parsing problem themselves.
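To make the second half of that wish concrete, here is a minimal sketch in Python of the "generate an OpenURL for each citation" step. It assumes the hard part, parsing, is already done and each citation is a simple dictionary. The base URL, the function name build_openurl, and the dictionary field names are all made up for illustration; only the key names on the right-hand side of the mapping come from the OpenURL 1.0 (Z39.88-2004) journal format.

```python
from urllib.parse import urlencode

# Hypothetical resolver address; substitute your institution's SFX base URL.
SFX_BASE_URL = "https://sfx.example.com/sfx_local"

# Maps illustrative field names (my own invention) to the key/value
# names used by the OpenURL 1.0 journal metadata format.
FIELD_TO_KEY = {
    "article_title": "rft.atitle",
    "journal_title": "rft.jtitle",
    "author_last": "rft.aulast",
    "year": "rft.date",
    "volume": "rft.volume",
    "issue": "rft.issue",
    "start_page": "rft.spage",
    "issn": "rft.issn",
}


def build_openurl(citation, base_url=SFX_BASE_URL):
    """Return an OpenURL for one already-parsed journal article citation."""
    params = [
        ("url_ver", "Z39.88-2004"),
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
        ("rft.genre", "article"),
    ]
    for field, key in FIELD_TO_KEY.items():
        value = citation.get(field)
        if value:  # skip fields that are missing or empty
            params.append((key, str(value)))
    # urlencode takes care of percent-encoding spaces and punctuation.
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    example = {
        "article_title": "An Example Article",
        "journal_title": "Journal of Examples",
        "year": 2007,
        "volume": 12,
        "start_page": 34,
        "issn": "1234-5678",
    }
    print(build_openurl(example))
```

This only builds the link. It does not parse free-text citations, and it does not tell you whether the resolver actually has full text for the article.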

Since the original citations were stored in Reference Manager, one of the more common citation management software applications, I wrote back to the colleague who first asked me to help with this situation, asking him if he could provide me with the Reference Manager files. He did, and I downloaded a free trial version of the software, imported the references, then exported them in RIS format. Next, I imported the RIS output file into Zotero, and then exported the whole bibliography from Zotero into a readymade HTML bibliography. Because of Zotero’s built-in COinS functionality, the readymade HTML bibliography is automatically populated with OpenURLs. But I wasn’t done yet. I had to go through each citation by hand and test whether we did indeed have the article in full text, and also, to edit the HTML coding to substitute our company’s specific SFX base URL in each link.
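The last manual step, swapping in our own base URL, is the kind of thing a short script could handle. Below is a rough sketch in Python, assuming the exported HTML carries each citation as a standard COinS span (class "Z3988" with the ContextObject in the title attribute). The base URL and the names CoinsExtractor and coins_to_links are hypothetical, and I have not run this against real Zotero output.

```python
from html.parser import HTMLParser

# Hypothetical resolver address; substitute your institution's SFX base URL.
SFX_BASE_URL = "https://sfx.example.com/sfx_local"


class CoinsExtractor(HTMLParser):
    """Collects the title attribute of every COinS span in a page."""

    def __init__(self):
        super().__init__()
        self.context_objects = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        # HTMLParser has already unescaped &amp; in the attribute value.
        if tag == "span" and "Z3988" in classes and attr_map.get("title"):
            self.context_objects.append(attr_map["title"])


def coins_to_links(html_text, base_url=SFX_BASE_URL):
    """Return one resolver link per COinS span found in html_text."""
    parser = CoinsExtractor()
    parser.feed(html_text)
    parser.close()
    return [base_url + "?" + co for co in parser.context_objects]


if __name__ == "__main__":
    sample = (
        '<p>Some citation text '
        '<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;'
        'rft.issn=1234-5678"></span></p>'
    )
    for link in coins_to_links(sample):
        print(link)
```

Checking each link against the KnowledgeBase for full-text availability would still have to happen separately. This only removes the hand-editing of the HTML.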

In the end, I achieved what the user wanted — a list of bibliographic references with SFX links as the hypertext links. But it was a huge amount of work, and I kept asking myself, surely there is a better, easier way to do this?! Surely, someone, somewhere has already solved this problem of how to readily parse bibliographic citations in a text file and run them through a process to check for which articles are available in full text?

Maybe there is a much simpler solution, and if you know of it, please comment on this post to let me know. I’m left thinking that this whole OpenURL stuff still has a ways to go in terms of ease of implementation for situations like the one I described.

Additional thoughts and comments from NASIG 2007

Below is a smattering of additional thoughts and comments from NASIG 2007:

Karen Schneider’s paranoia and negativity about things like Google, EPA library closings, survival of small press publishers. Her purpose as a vision speaker was to stimulate debate and thought and I think she succeeded in that. I may not agree with her overall philosophy or approach to these issues but I think it is very healthy to step back and question some of the broader trends in librarianship and ask the hard questions about where they are leading us.

Brainstorming session intended to provide a forum for discussing the problem of reluctance on the part of members to run for office. I wish more people other than “old timers” had spoken up and that there had been more focus on concrete answers to the questions raised by the moderator (Katy Ginanni) and less on generalizations about “trust me, it is really, really good to serve on the Board.”

Dan Chudnov’s emphasis on the need for simplicity in resource access and discovery. His reliance on iTunes as the standard for judging simplicity has some flaws even if his general point is well taken. I especially liked his point about trying something out and tweaking it a little. That little tweak may pay off in huge dividends in terms of successful adoption of a new technology. I also think he has a great idea by proposing that libraries insert themselves into the realm of what he terms “service links.” These are available in just about every major media outlet on the web and commonly include links to Technorati, del.icio.us, and other social networking services.

Yet more assumptions about fellow librarians having or sharing the same — liberal — political approach and philosophy. E.g. several negative references to the current (Bush) administration, wearing t-shirt supporting a Democrat’s presidential candidacy, etc. Noticed Dan Chudnov’s reference in his speech to “liberal” vs. “right wing” — perhaps an unconscious but notable inflection of wording.

Importance of networking. I am not a social butterfly at all. In fact, lots of social interaction leaves me exhausted. (By contrast, others like my friend Beverley Geer get their energy from social interaction.) In spite of my natural shyness — some people tell me I come across as aloof — I keep trying to hone my skills and break down the barriers that hold me back from meeting new people. At NASIG this is easier for me to do than in some other situations. I enjoyed sitting next to people at the dine-arounds who were total strangers to me, conversing with them about their work, their interests, and issues of mutual concern. In this way I found out some really interesting details, such as the fact that one longtime NASIG member is an accomplished piano (and flute) player, with two Steinway grands. I learned a lot of interesting facts about the city of Houston from someone else, such as the fact that it had no zoning laws of any kind until relatively recently. Yet another conversation filled me in on what it is like as a foreign national to live in Johannesburg, South Africa (like living in a prison).

Discussion with a librarian from a university in the Southwest about what it’s like to have a non-librarian as library director and the drastic — good — changes brought about so that the library is once more popular with students as a destination.

Several mentions of “work / life balance.”

General recognition (I think) that ERMS are not working out well for many, at least not yet. I likened them to a solution in search of a problem in one open mic comment at a session and described my library’s very recent decision to get out of the vendor-supplied ERMS game altogether. There was some interest in open source solutions.

Staying current: a survey response

Ann Ercelawn, a dear friend and co-moderator of the SERIALST discussion list, posted a survey on that list yesterday that asked for responses to a series of questions relating to how we keep current within the LIS field. Below is the response I sent her. It’s not as detailed or complete as it should be but I was in a hurry ;-)

1) What are the websites that you find most useful?

I find that I rarely go to a library-related website anymore, instead relying on RSS feeds. And if a library-related website doesn’t offer an RSS feed, I am highly unlikely to refer to it much again.

2) What listservs do you find indispensable?

Here, too, I am finding myself really paring down my participation in listservs. I’m still subscribed to SERIALST and I also pay attention to SFX-DISCUSS-L, LIB-STATS, LIS-E-JOURNALS, and ERIL-L. That’s about it, though.

3) What are the most important formal publications (in print or online) that you read on a regular basis?

Serials Review, LCATS, D-LIB, Library Journal. Increasingly, though, I am not reading formal publications as much, instead, as in the case of websites and listservs, relying on blogs, wikis, and RSS feeds to obtain the information about what’s going on in my areas of interest. I am much more selective about what parts of formal publications I read.

4) What are the top 5-8 blogs that you read?

Walt at Random, Thingology (LibraryThing’s ideas blog), Roy Tennant’s Digital Libraries, Peter Scott’s Library Blog, One Big Library, Lorcan Dempsey’s Weblog, LISNews.org, Information Wants to Be Free, Hectic Pace.

5) Are there podcasts that you listen to on a regular basis?

Not really, but ones I have listened to and/or recommend include Library Geeks by Dan Chudnov, and the podcasts output as part of the SirsiDynix Institute.

6) What other resources do you consult or recommend?

I am a huge fan of RSS because it saves me so much time and money. Use a free RSS reader like Google Reader or Bloglines and begin collecting library-related feeds. You won’t be sorry.

Attending NASIG

Soon I will be among friends at the 22nd annual NASIG conference held this year in Louisville, Kentucky. Mark Lindner will be along for the ride as well, which is great. The theme of this year’s conference is “Place Your Bet in Kentucky: The Serials Gamble.”

I will be joining several others in a panel presentation focusing on alternative careers in librarianship to be held on Saturday afternoon. The abstract for our presentation reads: “Regeneration,” “diversification” and “redesign” are buzzwords tossed around constantly in today’s job market. Those with M.L.S. degrees are facing a sea change of options in their career paths. While these new opportunities can be exhilarating and exciting, they can be somewhat daunting as well. This panel of librarians will discuss the unique twists and turns of their very divergent careers and offer suggestions on how to market your M.L.S. degree for nontraditional jobs. We anticipate and encourage a high level of discussion between the panel and the audience.

Bloggers whom I anticipate will be there — that is, aside from Mark (Off the Mark) and myself — include Karen Schneider (Free Range Librarian), Anna Creech (Eclectic Librarian), Dan Chudnov (One Big Library), Diane Hillmann (contributor to LITAblog), and maybe others I don’t know about yet.

I may or may not be blogging about NASIG experiences and sessions during the next several days. Stay tuned.

My del.icio.us bookmarks for February 21st through February 24th

These are my links for February 21st through February 24th:

  • BiblioCommons – Billed as a soon-to-be-unveiled "social discovery system for libraries," whatever that means.
  • E-LIS – Eprints for LIS – An open archive for papers, presentations, articles, syllabi, and other writings relating to library and information science.
  • LibraryFind – Metasearch software. A great example of open source software developed by and for librarians, including folks like Dan Chudnov. I’d love to try it out.
  • Blogging believers: Who’s out there in the blogosphere? – A summary of a recent survey of Christian bloggers.
  • WorldCat Registry – An interesting new service from WorldCat that provides a way to integrate all kinds of library profile information in one place.

A quick conference trip to Washington, D.C.

For the past few days I’ve been on a quick conference trip to a meeting in the Washington, D.C. area. The meeting was organized by NISO and was entitled “From Discovery to Delivery: Solutions to Put Your Content Where the Users Are.”

While there was nothing new or startlingly different about the content of the meeting, for me, at least, I think it was a worthwhile trip overall. The best part of the whole workshop was attending Dan Chudnov’s presentation on “COinS, unAPI, and a Plan for Zero Configuration Service Discovery.” Dan is a great speaker; humorous yet thorough, with an ability to easily explain some pretty technical stuff in a way that most people can understand. I was not surprised to see that he uses a Mac (way to go Mac lovers!) and I liked his use of Keynote for his presentation. The transition theme he used seemed to bother a few people and one person loudly remarked with a sneer, “Looks like a Mac application.” (Get a life, Windows lovers.) What I particularly liked about the approach Dan took with his talk was that he made it Lego-like, that is, piece built upon piece built upon piece, until he reached the (pardon the pun) piece-de-resistance, zero configuration service discovery. His vision for making things completely simple for users, with no configuration necessary for them and no need for them to know about the technical magic that lies behind the user experience, is truly invigorating. The basic focus he had was on using OpenURL and combining it with several other “off-the-shelf” standards to make it dead easy for users to navigate to resources they need. One of the technologies he highlighted was Apple’s excellent Bonjour application for auto-discovery of networked resources such as websites or printers. He also brought up the example of Apple’s iTunes and how it easily allows users on the same network to discover and then play shared music libraries. Overall, this was a great presentation and I am very thankful we have someone of Dan’s caliber to push the technological boundaries in our profession. I wanted to introduce myself to him but didn’t get to do that before the end of the meeting.

Andrew Pace, of the Technically Speaking column in American Libraries and author of the Hectic Pace blog, was also in attendance. It was the first time I had seen him in person and heard his by now well-travelled talk about what NCSU has done with its Endeca-powered online catalog. Andrew also is an engaging speaker. I didn’t learn much that I didn’t already know about the work he and others have done but it was interesting to have it presented in person anyway. I wish that I could have spoken with him and others there about the work I am involved in regarding integration of my library’s online catalog with another commercial search engine, work that I think might be interesting to others because it makes new uses of library data that are different from what I have heard is being done anywhere else.

A third highlight of the event was a presentation by Michael Jon Jensen, Director of Web Communications for the National Academies and Director of Publishing Technologies for National Academies Press, who talked about the challenges they have faced and the changes they have implemented in providing improved resource discovery for the materials they publish. Under his direction this entity has done some really interesting experimentation and development of ways to improve access to the 3,600 books they publish, including development of their own clustering results. One of the things he said that most stood out to me was that National Academies Press provides their books for free in HTML form but they charge for PDF versions. The reason for charging for PDF is that, as he put it, our society still values and treasures the framework and “ethos” of the printed book. Those aren’t his exact words but I think they capture the idea he put forward. He said that a printed book is worth more than the individual pieces; it is bigger and better as a whole collection contained in one package. I found this to be a very interesting perspective that has important ramifications for how we present and deliver information in an increasingly e-only world.

Jane Burke, former CEO at Endeavor and someone with whom I have always gotten along, was also there as a presenter and it was nice to chat with her for a while and to hear how she is doing in her job leading Serials Solutions.

Finally, what made the trip special was the chance to catch up with old friends, Janet Lee-Smeltzer and Tom Wilson. Janet works at UMBC and Tom worked until recently at University of Maryland, College Park. Each night they picked me up from my hotel and we had dinner together and talked far into the evening about librarianship, Web/Library 2.0, library politics, and many other topics.