A brief review of blog traffic for the past year

I don’t pay as much attention to blog traffic for FML as I probably should. I know there are a lot of things I could improve if I paid more attention to the various details. Instead, I tend to look for trends and broad numbers and that’s about it.

This evening I checked summary statistics from Google Analytics for the past year. Here is what I found:

  • There were 6,713 unique visitors to the site, which averages out to about 18.4 visitors per day
  • Visitors tend to spend only about a minute on the site each visit
  • The browser used by visitors breaks down as follows:
    • Internet Explorer – 46.51%
    • Firefox – 41.53%
    • Safari – 9.65%
    • Mozilla – 1.02%
    • Netscape – 0.48%
  • Traffic sources: 38.36% of visitors find FML via search engines; 31.68% go directly to the site (in other words, the site is bookmarked or the URL is typed in directly); and 27.42% come from referring sites. Of the visitors who find FML via a search engine, the vast majority (over 80%) use Google.
  • The vast majority of visitors use Windows as their operating system (80.45%); 17.93% use Mac OS X and 1.38% use Linux.
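
For anyone who wants to double-check the arithmetic behind these numbers, here is a minimal Python sketch. It only uses the figures quoted in the list above; the 365-day reporting window, the variable names, and the helper functions are my own assumptions for illustration, not anything Google Analytics provides.

```python
# Figures copied from the summary above; everything else is illustrative.
UNIQUE_VISITORS = 6713
DAYS_IN_YEAR = 365  # assumption: the report covers a 365-day window

BROWSER_SHARE = {  # percent of visits
    "Internet Explorer": 46.51,
    "Firefox": 41.53,
    "Safari": 9.65,
    "Mozilla": 1.02,
    "Netscape": 0.48,
}

TRAFFIC_SHARE = {  # percent of visits
    "search engines": 38.36,
    "direct": 31.68,
    "referring sites": 27.42,
}


def visitors_per_day(total, days=DAYS_IN_YEAR):
    """Average number of unique visitors per day."""
    return total / days


def unaccounted(shares):
    """Percentage points not covered by the listed categories."""
    return round(100 - sum(shares.values()), 2)


# "Over 80%" of search visitors use Google, so treating 80% as a floor
# gives a lower bound on Google's share of all visits.
GOOGLE_FLOOR = TRAFFIC_SHARE["search engines"] * 0.80

print(round(visitors_per_day(UNIQUE_VISITORS), 1))  # about 18.4, as quoted above
print(unaccounted(BROWSER_SHARE))   # share belonging to browsers not listed
print(unaccounted(TRAFFIC_SHARE))   # share belonging to sources not listed
print(round(GOOGLE_FLOOR, 1))       # lower bound, in percent of all visits
```

The leftover percentages are simply whatever the summary report did not break out (other browsers, other traffic sources), so they should be small but not exactly zero.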

I am especially pleased at the good showing for non-IE browsers. Also of interest are the keywords people type into a search engine that lead them to FML. Here are some of the top keywords, aside from the obvious ones such as “family man librarian”: “portable browsers”, “everyone has a double”, “library related wordpress theme” and “praise you in the storm.”


Flock beta version released

Last week (or maybe the week before, I forget), the first public beta of my favorite web browser, Flock, was released. Naturally I was eager to put it through its paces. I’m glad to say that this is an even better browser than before, with one or two exceptions. In my view Flock has made the social web experience even easier and better because of big improvements in photo website integration (Photobucket and Flickr), blogging capabilities, and RSS.

This isn’t going to be a full-blown or scientific review but instead a list of observations, likes and dislikes, etc.:

  • The photo integration is really nice. Now I have the option in the topbar to browse my photos or anyone else’s on a particular topic (tag) if those photos are on Photobucket or Flickr. More than that, I now have the ability to browse these photos in small OR large sizes, and I have easy drag and drop capability to add photos into other applications or a blog entry. For example, just this morning I decided to see what photos folks have posted on Flickr from the American Library Association annual conference being held right now in New Orleans. I simply input the tag ‘ala2006’ and was able to quickly call up new and recent photos taken by librarian colleagues. Pretty nice!
  • The blog integration is handled in a better way. Before, I was able to post to my blog from a topbar element. Now, with a simple keystroke (Ctrl+B) I can call up a separate, smaller window and immediately begin blogging. After clicking on the Publish button I am then presented with further choices such as what categories I want to assign and what Technorati tags I want to use. While this whole process took a little getting used to at first (because in the previous iteration, choices for tags and categories were on the main blog posting window) I like this new way of doing things much better.
  • The RSS feed capabilities are nice but they are the weakest feature at this point. I keep getting script errors and/or funky results whenever I try to use the RSS aggregator sidebar. Hopefully this will work itself out soon. When it works, though, the sidebar arrangement and functionality are nice.
  • A big drawback for me for Flock was that there weren’t many native extensions available for it. (You couldn’t just use Firefox extensions, for example, of which there seem to be hundreds.) This is no longer a problem because with this beta release there is now a whole host of extensions available that can be readily used with Flock. I’ve had no problems with the ones I like to use except for FasterFox. It is great now to be able to use the ones I like the most in Firefox.
  • There is a new Conversations topbar plugin available that works much better than the previous Technorati topbar ever did. It’s basically the same as the old Technorati topbar but seemingly reengineered and renamed. I find this a very useful feature when I want to have some sense of what others might be saying about a particular website I’m interested in. When used in combination with the Google Web Comments plugin, I feel like I am able to get a pretty comprehensive sense of the “conversations” that are going on about that website.
  • The del.icio.us integration is also much smoother than before.
  • A really big, important new feature in this beta release is the Quick Search functionality, which integrates several areas into one truly quick search, such as your favorites, your web history, the top five hits from Yahoo!, and a quick way to pick other search engines to search in as well as whatever default search engine you’ve chosen. Again it takes a little getting used to but I am quite impressed with how it works thus far.

I am still surprised that there doesn’t seem to be that much use of or experimentation with this browser among librarian colleagues. Maybe there is stuff going on and I don’t realize it. I’ve used Flock (even the alpha releases) as my default browser for many months now and I have no problem recommending it to anyone. When the students in my course this summer saw me using it and talking about it, some of them decided to try it out, too. One of them found a thorough review on ExtremeTech and posted about it to the class blog.

I also should point out that I use different flavors of Flock. On my Windows laptop from work, I installed Flock on my portable USB drive and it works great. On my PowerBook at home, it also works great.

So…bottom line: If you blog, use photo sharing sites, or just appreciate a functional web browser, try Flock. I think you’ll like it.

EndUser 2006 notes on opening session [Updated]

[Through a series of missteps that I won't go into here, I discovered that I had accidentally deleted this post, first published a few weeks ago. I feel pretty dumb. When I figured out what happened, I sat here, stunned, wondering what to do. Then I remembered Google's good ol' caching capability, did a quick search to call up the cached version of this post, did a quick copy and paste, and voila, problem solved. Well, almost. My error wiped out the original post entirely, meaning that it automatically broke the link to that post, as well. There's nothing I can do about that. In the process of reconstituting the content, I decided on some editorial tweaks throughout.]

(Warning, this is a pretty lengthy post.)

Yesterday was the start of EndUser 2006, Endeavor’s customer conference. Somewhere around 1,000 customers have shown up for this event, some coming from as far away as Australia, New Zealand, several European countries, as well as Canada, Latin America, and of course, the U.S. As I’ve noted before, there are several conference sessions dealing with topics of interest, but yesterday’s highlight was the opening general session featuring a representative from Google who spoke in depth about Google’s Book Search project. Tom Turvey, Head, Google Book Search Partnerships, gave a brief overview of Google and how it makes money, defined the elements of Google Book Search, described the Google Book Search Partner Program (which he oversees), and finally discussed the Library Program portion of Google Book Search. Tom has a long history of working with online content, serving in numerous roles in the publishing industry relating to online delivery, including launching Barnes & Noble’s ebook offerings and most recently holding a senior post at HarperCollins.

Tom began by describing Google’s business. He mentioned that Google now provides 59% of all Internet search referrals. Google’s oft-repeated mission is “to organize the world’s information and make it universally accessible and useful.” Its core business, i.e. how the company makes money, is advertising revenue generated via paid search ads using Google AdSense. Tom also mentioned that Google is the leader, by far, in referrals to book sites (currently it processes about 60% of all such referrals). In describing Google’s business, Tom pointed out some interesting statistics about book purchasing. Thirteen percent of all book purchases are now done online; schools/libraries make up about 24% of the book buying market; direct to consumer purchasing (direct from publishers) is about 2%; and the biggest growth area recently has been in non bookstore retail (books being purchased in Costco, Sam’s Club, Wal-Mart, etc.).

The next portion of the presentation focused on an explanation of Google Book Search. Tom pointed out that in his experience, never has there been so much misinformation about a product as there has been with Google Book Search (GBS). He made some comment that 90% of what has been published in the news media is false, thus the importance of explaining exactly what it’s about. GBS, at its heart, is an attempt to associate book content with what searchers are looking for in search engines. There are two main parts to GBS: the Partner Program, and the Library Program. The Partner Program involves relationships and agreements between Google and publishers. GBS launched in October 2004 at the Frankfurt Book Fair. As of now there are literally thousands of publisher partners spanning seven languages. One of the most frequent questions publishers ask Google is, what books are good choices for discovery via GBS? One of Tom’s funnier statements was “we don’t need to help Harry Potter find an audience.” What Google is mostly interested in is the arcane, the obscure, and bringing this material to light via searching GBS. Every page is searchable; users are searching books from cover to cover. There are two ways of providing search on book content: a dedicated search (books.google.com), and integrating book content within the general Google search. The main intent of working with publishers is to drive book sales. Content is protected in a variety of ways (Tom mentioned that as you can imagine, this element of agreements with publishers often gets “into the weeds”). Only 20% of a book is viewable by one user during the course of a month. Print, copy, and save are disabled. Scanned images are purposely low resolution. Publishers can add/remove their material at any time. There is page level security as well. A percentage of pages is never visible at one time.
Google’s process for receiving publisher content is pretty straightforward: the publisher usually sends either a PDF or a print copy. If the latter, Google digitizes it. As an interesting aside to closing out this portion of the talk, Tom mentioned “Oh by the way, the five publishers who are suing Google over the Library Project are actually members of the Partner Program.”

In turning to the third and last portion of the presentation, Tom outlined the elements of the Library Project. Partner libraries, as most people are aware by now, include Stanford, NYPL, Oxford, Michigan, and Harvard. In researching and comparing collections from each partner library, Google discovered that 60% of books are held in only one of the partner libraries. For legal and other issues, Google began the project by focusing on public domain books. However, public domain books make up only about 20% of a typical library collection. Ten percent of a typical collection is made up of books that are still in print (i.e. the stuff that is handled via the Partner Program). Most books, 90%, are in print but in a fuzzy area in which they may be out of print but still in copyright, or perhaps out of copyright. Seventy percent of collections were published after 1923 and fall into three categories: in copyright, in public domain, or the rights may have reverted. Obviously Google needed to figure out how to solve or address these complexities. Their solution was to offer to scan everything but provide three views: sample pages (partner view), snippet view (book under copyright w/out agreement with a publisher partner), and full book view (book is in public domain). The snippet view means that the full text of each book is indexed; users can only view three snippets from the book; there are links to “buy this book” as well as “find in a library”; different categories of books are handled in different ways; and copyright holders may opt out of display and/or scanning.

Obviously a critical factor for Google is optimizing and streamlining the workflow. For example, a key consideration was figuring out how long it takes to scan a typical book. Tom mentioned that in the early days of the project, founder Larry Brin and another staff member would use a metronome to time each other over and over again as they tried to figure out how best to scan a book. (Why a metronome? I have no idea and neither did Tom.) Books are scanned as is, including scribbles, marginalia, notes, whatever. Google is aiming to build a comprehensive collection of indexed books but has a long way to go yet on achieving that goal. Some of the challenges they face on a daily basis are 100% OCR accuracy, 100% image quality, search and integration with web search, the accuracy of any affiliated metadata, the existence of lots of “edge cases” in terms of how to process and display the scanned results, how to address books that contain multiple languages and/or scripts; and how best to achieve a good level of speed/automation of the entire process. As with their much vaunted (and top secret) search algorithms, Google is constantly tweaking the process to try to improve the quality. How do they handle math formulas, spelling correction (Tom used the example of vernacular language that is meant to be spelled a certain way but which looks wrong to a typical spell checker), etc.? What is the best way to deal with automated metadata extraction? Can they figure out an automated way to detect (and appropriately handle) different languages and/or scripts?

Tom made a big point of the fact that Google is actively engaging the library community. Librarians tell Google the good and the bad about GBS (e.g. of bad: too overwhelming for users, hard to know which stuff is authoritative and what is junk, desire to know exactly how the process for scanning and indexing works). Google wants to ensure that GBS works for libraries by making information more discoverable, driving more library usage, and supporting a worldwide community, which is especially relevant for remote and distributed library users. Google has no desire whatsoever to put libraries out of business; in fact, Tom claims that the opposite is true.

[One of the things that I thought was particularly striking was that at one point during the session, Mr. Turvey asked for a show of hands from the audience of those people who were aware of the facts and details he had provided about Google Book Search. To my astonishment, I was one of the few people to raise their hands. Maybe this was just due to some people not fully understanding the question or to some people's innate shyness, who knows. But if it was an indicator of professional ignorance of these matters, then we're in big trouble.]

After concluding his prepared remarks, Tom invited the audience to pose questions. This was perhaps the most interesting portion of the session and Tom handled the questions with aplomb and a dose of wit. Below are my notes of the substance of some of the questions posed, followed by the substance of what I could jot down of Tom’s answers.

Question: When a user sees a link to “find in a library” which leads to Open WorldCat, what librarians want is to have that user come to us rather than use Google and/or buy the book from the publisher. What is your view on this?
Answer: It appears that this is in fact what is happening. Logs show that adding the “find in a library” link, directed to Open WorldCat, has driven a tremendous growth in traffic to WorldCat. Presumably this leads to higher library use.

Question: I’d like to see much more powerful search options, including things like truncation, proximity searching, and boolean capabilities. Is this something Google is considering?
Answer: That’s a very good question, what I’d expect from a librarian <laughter from the audience>. Some of these capabilities are things we are indeed working on, while some of them are already available via the Advanced Search option.

Question: I believe that in search results from publisher content, there is no link to “find in a library” when there is such a link provided in the library search. Why is that?
Answer: Good question. Remember that the goal of GBS is to have a relevant search. The vast majority of books available in GBS at this time are from publishers. Over the next few years, that proportion will flip to emphasize library-owned material. Honestly there is a constant tug and pull between publishers and Google over this issue of how to direct users. Publishers, obviously, participate in GBS to sell more books.

Question: Is there any plan to include Library of Congress Subject Headings (LCSH) as part of the GBS search?
Answer: LCSH and other taxonomies are already used to some extent behind the scenes to assist with determining relevance as well as identifying relationships between books (linking from one book to a related book).

Question: Can you speak about why you are being sued by some of your publisher partners?
Answer: Attorneys love it when you talk publicly about their litigation <much laughter from audience>. Seriously, though, no, I can’t answer that.

Question: Are you indexing each book cover to cover (i.e. full text)? How do you determine relevancy? [Editorial aside: Was this person paying attention? This question was clearly answered in the context of the presentation.]
Answer: Yes, we are doing full text. The ranking/relevancy algorithms used in GBS are pretty much the same as those used in the regular Google search. Some tweaking is of course necessary to make the algorithms relevant for book search. We do user interface testing every month and as a result, we constantly tweak/change the algorithms.

Question: Do you have a formal digital preservation strategy?
Answer: We have agreements with our library partners that cover preservation to whatever degree they have specified in their legal agreements. It really depends on what partner libraries want. Other than that, no, we do not have a formal preservation strategy and do not feel that that is a role we should assume.

Question: Elaborate on how relevant metadata is in GBS.
Answer: Well, first of all, metadata does play a role in GBS but our bias is always toward full text, with metadata/abstracts thought of as secondary. This is probably the opposite of how most libraries would prioritize things.

Question: I have a question on the issue of fair use. Are you working to expand the concept of fair use in terms of scholarly material in particular?
Answer: We feel that our stance on fair use and GBS is very, very significant. We do not have any formal focus on scholarly material in GBS, though.

Question: What is Google’s stance toward the Open Content Alliance? Does Google view them as partners, or competitors?
Answer: We have an open door, a desire to partner and share in digitizing material. We believe that initiatives such as the Open Content Alliance are worthy of our support. However, as you can imagine, there are certain complexities and a lot of politics involved in this kind of interaction. We want to participate in initiatives like this in as open a way as possible.

Question: “Find in a library” links only to WorldCat at present. Does Google have any plans for directing traffic to other bibliographic (i.e. library) databases (this is particularly important for those libraries who aren’t linked from WorldCat)?
Answer: We’d be interested in any other worthwhile bibliographic databases, but WorldCat is it for now.

Question: A single search box is very attractive, but when you expand your data sources (as Google is doing), the simplicity and relevance of this one search become more difficult to maintain. How do you handle this?
Answer: We constantly reevaluate the one box concept and it is an ongoing problem to solve. There is no ready answer.

Question: How do you handle materials from publishers once those materials have gone out of print?
Answer: Good question. Once a publisher’s book goes out of print, they request that it be removed from the index and then it no longer appears in the search. The exception to this would be if there happens to be a copy of that same book that has been scanned and indexed as part of the Library Project. In that case, the book would remain in the index.

Question: Do you have plans for providing regional Google book searches (e.g. one for New Zealand imprints)? This is important for those outside of the U.S. because currently there is such a predominance of U.S. imprints in GBS.
Answer: We already do this, e.g. currently we have 65 regional book searches.

Question: The exposure from GBS for libraries is great, but it needs to be more two way, e.g. to direct users looking for material in a local library catalog to GBS and/or elsewhere. Are there any plans to extend the Google API to be used by libraries for integration into their online catalogs?
Answer: Something like this functionality is present in Google Scholar. We are very happy with this integration with library services and we want to figure out ways to extend this further.

Question: What’s your view on libraries’ development of customized Greasemonkey scripts to integrate library results in with GBS?
Answer: Anything that doesn’t violate copyright, we’re all for.

Question: GBS is very exciting. What about developing Google Journals?
Answer: <tongue in cheek> …So we have this thing called Google Scholar…Actually we are working on ways to better integrate or link between GBS and Google Scholar.

Question: There is clearly a balance of power issue relating to the premise that allowing Google to do all this scanning and digitizing of book content puts the burden of proof on the content creator rather than the user. What are your thoughts about this?
Answer: We believe that this is a very important issue and our stance on this hinges on the belief that we are simply being consistent between the indexing of website content and indexing the content of books.

Question: What about working to include government documents, because they do not present a copyright problem?
Answer: Yes, we have a team devoted to this very issue. It is a bigger challenge to do this than it may at first appear because in order to do it we need to work out who is responsible (i.e. the publisher) for the multitude of gov docs. Expect progress on this front.

Comments are marginalized in the blogosphere

Something that I’ve noticed for quite a while, and given some thought to, is that blog comments seem to be the most marginalized element of the blogosphere. This, in spite of the fact that in many cases, a comment may be even more useful or valuable than the original posting on which it is based. Of course most up-to-date blogging software platforms not only provide commenting capability but also allow you to present a separate RSS feed for the comments. Some blogs, such as the LITA Blog, take this a step further and combine postings and associated comments into a single, integrated RSS feed. I wish I could figure out how to do that with this blog because I like that approach. Why then do I think comments are marginalized? Well, because I suspect most people, like me, either do not subscribe to a separate comment feed or don’t know that one exists for a particular blog, so a lot of the discussion on an interesting topic is missed. Some blogs offer the ability to subscribe to a feed just for comments on one particular posting. But let’s be honest, how many of us are willing to add umpteen RSS feed subscriptions to our news aggregator in the hopes of keeping up with what might be an interesting conversation? Even worse, comments do not seem to be readily accessible via search engines, including blog search engines such as Google’s Blog Search or Technorati. Furthermore, while blog posts tend to have several choices for tagging as ways to help navigate or find related postings, comments simply muddle along with the vast majority of them having no such capability. (I think I have seen the ability to tag comments on one or two sites, but I could just be imagining it.)

What can or should be done? Well, blog comments need to stand up for their rights, for one thing. They need to advertise their existence more (be readily searchable in or exposed to search engines). They need to organize (implement tagging or related technologies to enhance findability and navigation). They need to find new and easier ways to get their message across without the clutter of trackbacks.

The last paragraph in particular is written a bit tongue in cheek. However…Is this a non-issue? Has it already been addressed? Am I making a mountain out of a molehill (as my mother often said)? Please comment.

Mixing work and blogging

This article in the Chicago Tribune today was interesting to read, although it didn’t cover any new ground in the debate about employee blogging. Now that I work for a commercial entity again, this is an issue that I am more aware of. As far as I know there is no official blogging policy at my company, although a broader policy about Internet use could be construed to cover it. The article relates to my previous post in that the author mentions the accessibility via Google and other search engines of whatever people choose to write in their blogs, and how easy it is to get into trouble if something negative is written about an employer. It also mentions that concerns by employers about their employees blogging are very similar to concerns employers had back when universal access to email and the World Wide Web for employees was a new thing.

I have had firsthand experience with this concern on the part of an employer. When I was at the University of Chicago I was asked to chair a task force in their library’s technical services division that was charged with articulating a set of guidelines for acceptable use of the Internet on the part of librarians and staff. The formulation of this task force was largely prompted by concern by some supervisors in technical services about abuse of Internet and email on the part of their staff. The task force duly arrived at a set of guidelines but frankly, they were not taken seriously and, at least for the duration of my time there, were not enforced that I know of. My personal view of the situation as a manager of several staff was (and still is) to rely upon principles of common sense and good supervision rather than an artificial set of rules or guidelines. I remember likening the potential for abuse of email to abuse of using the telephone. There is nothing new in this, really. And blogging is similar. If I as a supervisor have concrete, well understood expectations for performance by my employees, along with concrete ways of measuring that performance, the issue of email/blogging/telephone/Internet abuse can be easily dealt with. For instance, I made clear to my staff that I really didn’t care if they used the Internet for personal things IF (and that is an important point) their performance was good. That is, if they were getting their assigned work done in a superior fashion then using the Internet for surfing or writing emails or whatever was just fine with me. However, I also made it clear that if performance was subpar then personal use of the Internet would be one of the key areas I would focus on for that staff member, and I would restrict or curtail that activity if it was shown to be a contributing factor to their negative job performance. 
Unfortunately, my experience has shown that there are many supervisors in libraries (and maybe elsewhere) who lack common sense and/or people leadership skills and who turn to artificial rules and regulations to do their work for them.

Grokker, a new type of web search engine

Some time ago, I read about Grokker, a new type of web search engine that presents results visually in cluster maps (think something like Venn diagrams), rather than in a long series of search results to page through one by one. Put in simple terms, the idea behind Grokker is to enable the searcher to more readily find the desired information that might be buried in web pages on the umpteenth page of search results from a standard search engine. More recently, I downloaded a free 30-day trial version for Mac OS X (they also offer a Windows version) and used it to find relevant information on a particular topic in Google that I was struggling to find using the regular Google interface. I am quite impressed with it, although I am not sure yet whether or not I want to fork over the $49 they charge for a production version of the software. If you want to see the future of search engines, or at least one model for that future, I suggest you download a copy yourself and play around with it. It takes a bit of getting used to, but I think you’ll like it.