Roadblock to full OpenURLness [Updated]

This week I encountered a significant roadblock when trying to use OpenURL in a situation where it is a natural fit. Let me explain the scenario. A scientific researcher at the company where I work built an extensive bibliography of journal articles on a particular subject and wants to publish that bibliography on the company intranet, complete with hypertext links to the full text. The researcher initially thought it’d be OK simply to mount the full-text articles he had downloaded in the same webspace as the bibliography and link to the files. Of course, that idea was quickly shot down. Instead, we thought, why can’t we take this bibliography, check it against our SFX KnowledgeBase to see which articles we have available in full text, and then output the complete OpenURL for each of those articles for this researcher to use when marking up and publishing his bibliography?

The use case sounds straightforward, right? Turns out that it is anything but. I was provided with a text file of citations and was asked to come up with appropriate SFX links for each. Of course I could have manually rekeyed the citations one by one into a search form querying our SFX KB, but that would take quite a long time and quite a bit of effort. I tried to think of how this whole process could be automated.

On the advice of Dan Chudnov I downloaded an open source application written in Perl called Biblio-Citation-Parser, which on the face of it seemed to be exactly what I needed. I needed a way to automatically parse the whole list of citations into the necessary chunks of metadata and then automatically generate an OpenURL for each citation. After trying unsuccessfully to get Biblio-Citation-Parser to work (this isn’t a limitation of the software but of my Perl expertise), I sent queries out to other SFX users as well as to the Code4Lib discussion list. There were several responses from members of the Code4Lib list, some of whom mentioned the application I already knew about. But it turns out that pretty much nobody in that community [at least among those who responded] had ever used it, and that nobody there had come up with a good solution to this parsing problem themselves.
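For the “parse, then generate” step described above, the generating half is the easy part once a parser has produced the metadata chunks. The sketch below is a minimal Python illustration, not Biblio-Citation-Parser’s actual output format: it assumes the parser hands back a dictionary with hypothetical field names (`article_title`, `issn`, and so on), and `SFX_BASE_URL` is a placeholder for a local resolver address. It builds an OpenURL 1.0 link in key/encoded-value (KEV) form using the journal-article keys from Z39.88-2004.

```python
from urllib.parse import urlencode

# Placeholder resolver address (assumption); substitute your own SFX base URL.
SFX_BASE_URL = "https://sfx.example.com/sfx_local"

# Hypothetical citation field names mapped to OpenURL 1.0 journal-format keys.
FIELD_TO_KEY = {
    "article_title": "rft.atitle",
    "journal_title": "rft.jtitle",
    "issn": "rft.issn",
    "volume": "rft.volume",
    "issue": "rft.issue",
    "start_page": "rft.spage",
    "year": "rft.date",
    "author_last": "rft.aulast",
}


def build_openurl(citation, base_url=SFX_BASE_URL):
    """Return an OpenURL 1.0 (KEV) link for one parsed journal-article citation."""
    params = [
        ("url_ver", "Z39.88-2004"),
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
        ("rft.genre", "article"),
    ]
    for field, key in FIELD_TO_KEY.items():
        value = citation.get(field)
        if value:  # skip anything the parser could not find
            params.append((key, str(value)))
    # urlencode percent-encodes each value and joins the pairs with '&'.
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    sample = {
        "article_title": "An example article",
        "journal_title": "Journal of Examples",
        "issn": "1234-5678",
        "year": 2007,
    }
    print(build_openurl(sample))
```

Missing fields are simply left out, so a citation with only an ISSN and a start page still yields a usable, if less precise, link.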

Since the original citations were stored in Reference Manager, one of the more common citation management applications, I wrote back to the colleague who first asked me for help and asked whether he could provide me with the Reference Manager files. He did, and I downloaded a free trial version of the software, imported the references, and exported them in RIS format. Next, I imported the RIS output file into Zotero and exported the whole bibliography from Zotero as a ready-made HTML bibliography. Because of Zotero’s built-in COinS functionality, the ready-made HTML bibliography is automatically populated with OpenURLs. But I wasn’t done yet. I had to go through each citation by hand to test whether we did indeed have the article in full text, and also edit the HTML to substitute our company’s specific SFX base URL in each link.
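The base-URL substitution, at least, can be scripted. Zotero’s COinS output stores each citation’s ContextObject in the `title` attribute of a `<span class="Z3988">`, so a short script can pull those strings out of the exported HTML and prefix them with a local resolver address. This is a minimal Python sketch, assuming the placeholder base URL `LOCAL_SFX_BASE` and adding `url_ver=Z39.88-2004` to mark each link as OpenURL 1.0. It does not check full-text availability; that would still mean querying SFX itself.

```python
from html.parser import HTMLParser

# Placeholder resolver address (assumption); use your institution's own.
LOCAL_SFX_BASE = "https://sfx.example.com/sfx_local"


class CoinsExtractor(HTMLParser):
    """Collects the ContextObject strings stored in COinS spans (class="Z3988")."""

    def __init__(self):
        super().__init__()
        self.context_objects = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "Z3988" in classes and attrs.get("title"):
            # HTMLParser has already turned "&amp;" back into "&" in attribute values.
            self.context_objects.append(attrs["title"])


def coins_to_links(html, base_url=LOCAL_SFX_BASE):
    """Turn every COinS span in an HTML page into a link aimed at a local resolver."""
    parser = CoinsExtractor()
    parser.feed(html)
    parser.close()
    return [f"{base_url}?url_ver=Z39.88-2004&{ctx}" for ctx in parser.context_objects]
```

Feeding it the Zotero export would return one link per citation, in document order, ready to be pasted into the researcher’s markup.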

In the end, I achieved what the user wanted — a list of bibliographic references with SFX links as the hypertext links. But it was a huge amount of work, and I kept asking myself, surely there is a better, easier way to do this?! Surely, someone, somewhere has already solved this problem of how to readily parse bibliographic citations in a text file and run them through a process to check for which articles are available in full text?

Maybe there is a much simpler solution and if you know of it, please comment on this post to let me know. I’m left thinking that this whole OpenURL stuff still has a ways to go in terms of ease of implementation for situations like I described.

Zotero is an amazing tool

Anyone who needs to create or maintain bibliographic references should take a close look at Zotero. It is an amazing tool and one that I have begun to use more and more, thanks to encouragement from Mark Lindner. As this blog post from Zotero mentions, I have begun to make use of Zotero’s built-in HTML export functionality to enable autodiscovery of my publications. Another feature that I really like is its built-in ability to take a snapshot of any webpage. There’s a lot more to it than that, though, so go check it out yourself.

My del.icio.us bookmarks for June 6th through June 11th

These are my links for June 6th through June 11th:

  • COinS Generator – “This tool will take bibliographic metadata for a citation and produce a ‘COinS’, i.e. a snippet of HTML that can be placed on a webpage and processed by web tools.”
  • Scopus – A multidisciplinary database of citations to articles in the life, health, physical, and social sciences.
  • Bolinfest Changeblog » Your Page Here (an iGoogle gadget) – A nifty and easy-to-use way to incorporate other content as tabs into iGoogle. I’m experimenting with using this for Google Reader, Facebook, and Meebo.
  • FML – A personal blog about family, libraries, and technology
  • TagsAhoy: All your tags in one place – Love this idea; I’m not sure, though, whether it’ll prove useful. That’s not because of the site’s functionality but because of my lackadaisical approach to tagging my own stuff.
  • nuTsie – A cool new beta service allowing users to stream their iTunes libraries to their cell phones. I sure hope this works with BlackBerry devices; I’m going to give it a try.
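For anyone curious what the COinS Generator actually produces: a COinS is an empty `<span>` with class `Z3988` whose `title` attribute holds an HTML-escaped OpenURL ContextObject. Here is a minimal Python sketch, assuming journal-article metadata passed in as `(key, value)` pairs with `rft.` keys; `make_coins` is a hypothetical helper, not the generator’s own code.

```python
from html import escape
from urllib.parse import urlencode


def make_coins(fields):
    """Build a COinS span for a journal article from (key, value) pairs such as ("rft.issn", "...")."""
    params = [
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
    ]
    params.extend(fields)
    kev = urlencode(params)  # percent-encode the values, join pairs with '&'
    # The '&' separators must become '&amp;' inside an HTML attribute.
    return f'<span class="Z3988" title="{escape(kev, quote=True)}"></span>'


if __name__ == "__main__":
    print(make_coins([("rft.genre", "article"), ("rft.issn", "1234-5678")]))
```

Because the span is empty, it is invisible to readers but can be picked up by tools that look for the `Z3988` class.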

Reality distortion field

Along with what I wrote about Apple’s new iPhone, I wanted to point out a Wired article that does a good job of articulating what I hope for and what I think might eventually be possible with a device such as the iPhone. Check out this article at http://www.wired.com/news/technology/gizmos/0,72477-0.html. I especially like the reference near the end to the “reality distortion field” that surrounds Macworld.

A quick conference trip to Washington, D.C.

For the past few days I’ve been on a quick conference trip to a meeting in the Washington, D.C. area. The meeting was organized by NISO and was entitled “From Discovery to Delivery: Solutions to Put Your Content Where the Users Are.”

While there was nothing new or startlingly different about the content of the meeting, I think it was a worthwhile trip overall, for me at least. The best part of the whole workshop was Dan Chudnov’s presentation on “COinS, unAPI, and a Plan for Zero Configuration Service Discovery.” Dan is a great speaker: humorous yet thorough, with an ability to explain some pretty technical stuff in a way that most people can understand. I was not surprised to see that he uses a Mac (way to go, Mac lovers!), and I liked his use of Keynote for his presentation. The transition theme he used seemed to bother a few people, and one person remarked loudly, with a sneer, “Looks like a Mac application.” (Get a life, Windows lovers.)

What I particularly liked about Dan’s approach was that he made his talk Lego-like, that is, piece built upon piece built upon piece, until he reached the (pardon the pun) pièce de résistance: zero configuration service discovery. His vision of making things completely simple for users, with no configuration necessary on their part and no need for them to know about the technical magic behind the user experience, is truly invigorating. His basic focus was on using OpenURL and combining it with several other “off-the-shelf” standards to make it dead easy for users to navigate to the resources they need. One of the technologies he highlighted was Apple’s excellent Bonjour technology for auto-discovery of networked resources such as websites or printers. He also brought up the example of Apple’s iTunes and how it easily allows users on the same network to discover and then play shared music libraries.

Overall, this was a great presentation, and I am very thankful we have someone of Dan’s caliber pushing the technological boundaries in our profession. I wanted to introduce myself to him but didn’t get the chance before the end of the meeting.
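As a small taste of the unAPI piece of Dan’s talk: a page advertises an unAPI server with a `<link rel="unapi-server">` element and marks object identifiers with the class `unapi-id` (typically on an `<abbr>` whose `title` holds the identifier); a client then asks that server for a given `id` in a given `format`. This is a minimal Python sketch of the client-side discovery, assuming a hypothetical `unapi_requests` helper and that the caller already knows a format name (such as `mods`) the server offers. It accepts `unapi-id` on any element, which is more lenient than strictly necessary.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class UnapiDiscovery(HTMLParser):
    """Finds the advertised unAPI server and any unapi-id identifiers on a page."""

    def __init__(self):
        super().__init__()
        self.server = None
        self.ids = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rels = (attrs.get("rel") or "").lower().split()
        if tag == "link" and "unapi-server" in rels:
            self.server = attrs.get("href")
        classes = (attrs.get("class") or "").split()
        if "unapi-id" in classes and attrs.get("title"):
            self.ids.append(attrs["title"])


def unapi_requests(html, fmt):
    """Return one request URL per identifier, or [] if the page names no unAPI server."""
    finder = UnapiDiscovery()
    finder.feed(html)
    finder.close()
    if finder.server is None:
        return []
    return [finder.server + "?" + urlencode({"id": i, "format": fmt}) for i in finder.ids]
```

The point, as in the talk, is that nothing here is configured per site: the page itself tells the client where to go.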

Andrew Pace, of the Technically Speaking column in American Libraries and author of the Hectic Pace blog, was also in attendance. It was the first time I had seen him in person and heard his by-now well-travelled talk about what NCSU has done with its Endeca-powered online catalog. Andrew is also an engaging speaker. I didn’t learn much that I didn’t already know about the work he and others have done, but it was interesting to have it presented in person anyway. I wish that I could have spoken with him and others there about the work I am involved in regarding integration of my library’s online catalog with another commercial search engine, work that I think might interest others because it makes new uses of library data that are different from what I have heard is being done anywhere else.

A third highlight of the event was a presentation by Michael Jon Jensen, Director of Web Communications for the National Academies and Director of Publishing Technologies for the National Academies Press, who talked about the challenges they have faced and the changes they have implemented to improve resource discovery for the materials they publish. Under his direction the Press has done some really interesting experimentation and development aimed at improving access to the 3,600 books it publishes, including developing its own clustering of search results. One of the things he said that most stood out to me was that the National Academies Press provides its books for free in HTML form but charges for PDF versions. The reason for charging for PDF is that, as he put it, our society still values and treasures the framework and “ethos” of the printed book. Those aren’t his exact words, but I think they capture the idea he put forward. He said that a printed book is worth more than its individual pieces; it is bigger and better as a whole collection contained in one package. I found this a very interesting perspective, one that has important ramifications for how we present and deliver information in an increasingly e-only world.

Jane Burke, former CEO at Endeavor and someone with whom I have always gotten along, was also there as a presenter and it was nice to chat with her for a while and to hear how she is doing in her job leading Serials Solutions.

Finally, what made the trip special was the chance to catch up with old friends Janet Lee-Smeltzer and Tom Wilson. Janet works at UMBC and Tom worked until recently at the University of Maryland, College Park. Each night they picked me up from my hotel and we had dinner together and talked far into the evening about librarianship, Web/Library 2.0, library politics, and many other topics.

NASIG newsletter transformed into blog

This is old news by now, but I wanted to briefly mention that NASIG has transformed its Newsletter into a blog. Very nice! I can remember the days when the Newsletter was a print-only publication, one of the few that I read front to back. Then for many years members were given the choice to discontinue receipt of the print version in favor of an online version in HTML (and later, in PDF as well). A few years ago the decision was made to drop the print version altogether, and the Newsletter became an online-only publication (available in HTML and PDF). Now, with the introduction of the blog version, the Newsletter has taken yet another step forward. My hat is off to those who made this decision, because I think it makes sense, and it also allows me, an RSS addict, to be alerted readily via my news aggregator when a new issue is available. It is entirely appropriate that an organization developed by, for, and about serialists should lead the way when it comes to innovative publishing.

Several big blog changes

Just a quick post to mention several changes I’ve made to this blog overnight. I’ve added a link to a tag cloud in the sidebar. I’ve also added new custom icons in the sidebar for “RSS Subscribe” and “Email Subscribe.” The “Email Subscribe” button’s link replaces the link to an external service, Bloglet, with a way for you to get updates on new posts via email directly. It’s a much nicer service, I think. You can register as a subscriber to Family Man Librarian, and with that capability come several options, e.g. choosing to receive email updates in plain text or HTML. Then I added a new custom icon in the sidebar for “Email Me” so that if you want to contact me directly, you can click on this icon and fill out a web form to send me an email.

Let’s see… What else? Well, I’ve also added a new section in the sidebar for “Most Popular Posts.” You will also see changes in the content of each post. I’ve added a “Related Posts” portion to the bottom. I’ve also stripped out the categories that used to appear at the top of each post, as well as the list of Technorati tags at the bottom of each post, mainly for aesthetic reasons (they just made things too cluttered).

I think the tag cloud link is particularly cool, as is the ability now to see the most popular posts.

LibraryThing and RSS/HTML feeds

I was happy to see an announcement today from Tim Spalding, creator/maintainer of LibraryThing, about the availability of RSS/HTML feeds. Tim’s work in developing a library community centered around a shared online catalog of users’ books is one of the standout ideas/creations of the past year. He is very responsive to user input and, more than that, is able to grasp and see bigger uses for this new kind of service. He is constantly upgrading and adding new features. I use this service and think that “regular” libraries can learn a lot from LibraryThing’s development. My only complaint at this point, a small and nitpicky one, is that initial articles in book titles are not normalized or removed for searching and sorting purposes, so that, for example, “A celebration of London: Walks around the capital” sorts alphabetically with the As rather than the Cs.
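The usual fix for that nitpick is to compute a separate sort key with the leading article removed, while still displaying the title as written. A minimal Python sketch, assuming English titles and only the articles “a,” “an,” and “the” (other languages would need their own lists); `title_sort_key` is a hypothetical name, not anything LibraryThing uses.

```python
import re

# Leading English article followed by whitespace (assumption: English titles only).
# "Another day" is untouched because "an" must be followed by a space.
INITIAL_ARTICLE = re.compile(r"^(?:a|an|the)\s+", re.IGNORECASE)


def title_sort_key(title):
    """Sort key that ignores a leading article and differences in letter case."""
    stripped = INITIAL_ARTICLE.sub("", title.strip(), count=1)
    return stripped.casefold()


if __name__ == "__main__":
    titles = ["The Zoo", "A celebration of London: Walks around the capital", "Birds"]
    for t in sorted(titles, key=title_sort_key):
        print(t)
```

With this key, “A celebration of London” files under C, between “Birds” and “The Zoo.”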

Tagyu, an automated tagging assistant [Updated]


Getting back once again to the topic of tags and tagging, about which my own personal jury is still out, I thought it worthwhile to mention a new web service called Tagyu that offers an automated method for picking out relevant tags for the content you want to tag. I’ve played around with it for a few days now. Sometimes it provides some good suggestions, but the majority of the time it chokes. It particularly does not like hypertext link markup as part of the text chunk you copy and paste into its form. I would think it would be smart enough to anticipate this.

[10/27/2005 -- Many thanks to Adam Kalsey, creator of Tagyu, for so quickly fixing the HTML problem with copying and pasting text into the Tagyu form!]

Updated website

Just finished a minor tweaking/refreshing of my personal website to make it look more akin to this blog. I’ve actually been blogging since 2002 but didn’t start using blogging software ’til the last few weeks. One of the important links on my personal website is a link to my previous blog, which was just static HTML and which I called my web diary.