28 May 2008

Google Book Search bibliography from Charles Bailey

via read20-l:

Charles Bailey's Google Book Search Bibliography, Version 2 is now
available from Digital Scholarship.

http://www.digital-scholarship.org/gbsb/gbsb.htm

This bibliography presents selected English-language
articles and other works that are useful in understanding
Google Book Search. It primarily focuses on the evolution of
Google Book Search and the legal, library, and social issues
associated with it. Where possible, links are provided to
works that are freely available on the Internet, including
e-prints in disciplinary archives and institutional
repositories. Note that e-prints and published articles may
not be identical.

--

scanners ahoy!

07 May 2008

how to structure a URL for every page of every book

My colleague Bill Tozier has been doing a bunch of digitization work for Distributed Proofreaders, and as part of that we discussed how you might build infrastructure for hyperlinking to individual scans of individual pages in particular books.

The observed problem is that if you have a book and have scanned a page of it, there is no easy way to predict what the URL to link to that page would be. Every system (Amazon, Google Books, etc.) has its own way of doing things, and none of them has any sort of predictable REST-style URL structure for deep linking.

I can imagine a system which would have page names like

http://everybookeverypage.com/isbn/0123456789/page/6.html
http://everybookeverypage.com/issn/01234567/volume/6/number/3/page/12.xml
http://everybookeverypage.com/librarything/work/3097331/book/5320426/page/12.json
http://everybookeverypage.com/aadl/record/1243670/page/12.jpg

That is, a URL parser would reference a naming system (ISBN, ISSN, LibraryThing, AADL), and within that system there would be a regular structure for naming the elements. The naming system itself would determine whether a URL refers to a unique copy (as with LibraryThing) or possibly non-unique copies (as with ISBNs). The name would also encapsulate the format in which the item is returned, either as an image or as a data structure which would have pointers to (something).

This system wouldn't need to have any data in it - it could just resolve or look things up (as best it could).
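
A minimal sketch of what such a URL parser might look like, assuming only the four example URL shapes above. The `SCHEMES` table, the `PageRef` type, and the `parse_page_url` function are hypothetical names invented for illustration; nothing here is an existing service or library.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical table: naming system -> ordered identifier keys expected
# in the path. Only the four example URLs from the post are covered.
SCHEMES = {
    "isbn": ("isbn",),
    "issn": ("issn", "volume", "number"),
    "librarything": ("work", "book"),
    "aadl": ("record",),
}


@dataclass(frozen=True)
class PageRef:
    system: str   # naming system, e.g. "isbn" or "librarything"
    ids: dict     # identifier keys and values within that system
    page: int     # page number
    fmt: str      # requested return format, e.g. "html", "json", "jpg"


def parse_page_url(url: str) -> PageRef:
    """Split a page URL into naming system, identifiers, page, and format."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    if not segments or segments[0] not in SCHEMES:
        raise ValueError(f"unknown naming system in {url!r}")
    system = segments[0]
    keys = SCHEMES[system]

    # For isbn/issn the system name doubles as the first key; for the
    # other systems the key/value pairs start after the system name.
    start = 0 if keys[0] == system else 1
    rest = segments[start:]
    if len(rest) < 2 or len(rest) % 2 != 0:
        raise ValueError(f"expected key/value pairs in {url!r}")
    pairs = list(zip(rest[0::2], rest[1::2]))

    id_pairs, (last_key, last_value) = pairs[:-1], pairs[-1]
    if last_key != "page":
        raise ValueError(f"URL must end with /page/<n>.<format>: {url!r}")
    if tuple(k for k, _ in id_pairs) != keys:
        raise ValueError(f"unexpected identifiers for {system!r}: {url!r}")

    # The final segment carries both the page number and the format.
    page_str, dot, fmt = last_value.rpartition(".")
    if not dot or not page_str.isdigit():
        raise ValueError(f"bad page segment {last_value!r}")
    return PageRef(system, dict(id_pairs), int(page_str), fmt)
```

The parser holds no book data at all; a resolver built on top of it would take the resulting `PageRef` and look up (as best it could) where that page actually lives.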

14 October 2007

Google Book Search metadata quality check

Google Book Search list of books printed between 0-100 A.D. At this writing, 2315 works.

I only noticed this because a date-range search I ran, trying to find very early books, turned up a bunch of books from the year 0019.

Metadata is hard.

I've reported this to Google - the response I got back from Ben was "We're working to make the metadata better across the board."

21 September 2007

Google Scholar: Anurag Acharya interviewed by Barbara Quint at Information Today

Google Scholar in the news, an Anurag Acharya interview:

In its own quiet way, Google Scholar has become a major force in scholarly communication. For many researchers, faculty, and students, it is the first search tool used, challenging the popularity and utility of veteran databases licensed—often at considerable cost—by academic and corporate libraries. Yet announcements about changes in the constantly evolving service seem to occur rarely and with little ballyhoo. For example, did you know that Google Scholar has launched its own digitization project, separate from the high-profile Google Book Search mass digitization? Or what about the new Key Author feature? Or the expansion into non-English languages and non-U.S./Western European content? A conversation with Anurag Acharya, the designer and missionary behind Google Scholar, helped us catch up on the latest developments.

Anurag Acharya's old home page at UCSB.

Anurag Acharya interview on Google Librarian Central:

TH: What is your vision for Google Scholar?
AA: I have a simple goal -- or, rather, a simple-to-state goal. I would like Google Scholar to be a place that you can go to find all scholarly literature -- across all areas, all languages, all the way back in time. Of course, this is easy to say and not quite as easy to achieve. I believe it is crucial for researchers everywhere to be able to find research done anywhere. As Vannevar Bush said in his prescient essay "As We May Think" (The Atlantic Monthly, July 1945), "Mendel's concept of the laws of genetics was lost to the world for a generation because his publication did not reach the few who were capable of grasping and extending it; and this sort of catastrophe is undoubtedly being repeated all about us, as truly significant attainments become lost in the mass of the inconsequential."

Rik Belew reviewed the quality of Google Scholar in 2005:

Attempts to understand the consequence of any individual scientist's activity within the long-term trajectory of science is one of the most difficult questions within the philosophy of science. Because scientific publications play such as central role in the modern enterprise of science, bibliometric techniques which measure the "impact" of an individual publication as a function of the number of citations it receives from subsequent authors have provided some of the most useful empirical data on this question. Until recently, Thompson/ISI has provided the only source of large-scale "inverted" bibliographic data of the sort required for impact analysis. In the end of 2004, Google introduced a new service, GoogleScholar, making much of this same data available. Here we analyze 203 publications, collectively cited by more than 4000 other publications. We show surprisingly good agreement between data citation counts provided by the two services. Data quality across the systems is analyzed, and potentially useful complementarities between are considered. The additional robustness offered by multiple sources of such data promises to increase the utility of these measurements as open citation protocols and open access increase their impact on electronic scientific publication practices.

31 July 2007

Espresso Book Machine demo at NYPL's SIBL

pulled straight from PR Web w/o comments:

New York, NY (PRWEB) June 21, 2007 -- The first Espresso Book Machine™ (“the EBM”) was installed and demonstrated today at the New York Public Library’s Science, Industry, and Business Library (SIBL). The patented automatic book making machine will revolutionize publishing by printing and delivering physical books within minutes. The EBM is a product of On Demand Books, LLC (“ODB” - www.ondemandbooks.com), the company founded by legendary publishing executive Jason Epstein and business partner Dane Neller, who joined SIBL’s Kristin McDonough for a private event there to speak about the EBM’s potential impact on the future of reading and publishing.

The Espresso Book Machine will be available to the public at SIBL through August, and will operate Monday-Saturday from 1 p.m. to 5 p.m. The New York Public Library's Science, Industry and Business Library is located at 188 Madison Avenue (at 34th Street).

It makes you wonder a little about the cost per delivered book done this way compared to inter-library loan. I don't have those numbers in front of me to compare, and I suspect they vary from library to library depending on how automated the ILL system is.

01 December 2006

Jeff Ubois on digital archives and mass digitization on the Scoble Show

This video podcast of Robert Scoble interviewing Jeff Ubois (mov) has a good discussion of the issues surrounding digital archives.

This is a different kind of show for the ScobleShow, one where we talk about an issue that we should think about — in this case whether companies who are scanning books and other information from the world's libraries and universities are doing us a favor. Jeff Ubois is a co-chair of the Association of Moving Image Archivists' Television Interest Group and is an expert about issues in digital file preservation and archiving.

Read more of Jeff's close following of these issues on his blog, archival.tv.

30 August 2006

Planning for library services based on % of patrons with broadband

When I was at the WiLSWorld 2006 conference in Madison, Wisconsin last month, I had an interesting conversation with someone planning library services who wanted to know what fraction of their patron base had access to broadband. This recent Bandwidth Report from Web Site Optimization LLC and Andy King has (some of) those answers:

New Jersey edged out Hawaii for the highest broadband penetration
rate in the US. 48.6% of New Jersey residents enjoy broadband,
well above the national average of 35.1%. Workplace broadband
broke 90% for the first time in July, while video and VOIP traffic
surged.

There is a lot of regional variation, with the low end of the scale running at about half of the national average, and I'd bet that if you broke it down county by county there'd be an even wider disparity. I'd recommend the full report for more details to help you plan services that make use of broadband (downloadable movies, books, etc.).

25 August 2006

AADL integrates Google Book Search into the catalog

John Blyberg's latest nifty catalog update at the Ann Arbor District Library adds links to Google Book Search when that service has images available for the book. He writes:

So the folks over at Google Books think they can go ahead and incorporate our catalogs into their search, do they?

Actually, that’s fine, I have no problem with that, which means… They should have no problem with me incorporating Google Books into our hit-list. Right?

Now when users search the AADL catalog, they will be given the option to peek inside the books on the hit-list–that is, if there is a record over at Google Books. Basically, the first time that record is displayed in the list, the middleware queries Google Books to see if it has that item in its database. If it does, the middleware makes note of that in a MySQL table so that the remote query doesn’t need to be run again. That way, future queries save time and bandwidth.

Nifty! And handy too, especially since GBS doesn't have subject-based searching.
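
The lookup-then-remember pattern John describes can be sketched roughly as follows. This is not AADL's middleware: it uses Python's built-in `sqlite3` in place of MySQL so the example is self-contained, the table and function names are made up, and the remote Google Book Search query is passed in as a hypothetical `remote_check` callable rather than a real API call. It also assumes that "not found" answers are remembered along with positive ones, which the quote does not confirm.

```python
import sqlite3
from typing import Callable


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the local table that remembers lookup results."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS gbs_lookup ("
        "record_id TEXT PRIMARY KEY, has_gbs INTEGER NOT NULL)"
    )
    return conn


def has_gbs_record(
    conn: sqlite3.Connection,
    record_id: str,
    remote_check: Callable[[str], bool],
) -> bool:
    """Return whether a catalog record has a Google Book Search match.

    The remote query (hypothetical remote_check) runs only the first
    time a record is seen; later calls are answered from the table.
    """
    row = conn.execute(
        "SELECT has_gbs FROM gbs_lookup WHERE record_id = ?", (record_id,)
    ).fetchone()
    if row is not None:
        return bool(row[0])

    found = remote_check(record_id)
    # Assumption: negative results are stored too, to save repeat queries.
    conn.execute(
        "INSERT INTO gbs_lookup (record_id, has_gbs) VALUES (?, ?)",
        (record_id, int(found)),
    )
    conn.commit()
    return found
```

Remembering negative answers forever would hide books that Google scans later, so a real version would probably want some kind of expiry on those rows.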

22 August 2006

Vernor Vinge, "Rainbows End"

One possible future end of mass digitization efforts is but one of the many twists and weaves in Vinge's latest. Library lovers will find plenty to enjoy in this fast-paced book where the Geisel library on the UCSD campus is a main character.

Review by Jon Lebkowsky.

Review by Stewart Brand.

Vernor Vinge dedicates his new novel, Rainbows End, "To the Internet-based cognitive tools that are changing our lives -- Wikipedia, Google, eBay, and the others of their kind, now and in the future." The book is an imagining of how those technologies might develop over the next two decades. But publication of Rainbows End is not only a literary event. The question arises, "Will Vinge influence the actual evolution of the technology?" He has done so before.

Review by Vicky Chase, Welles Library (Newington, CT)

Your thoughts: READ THIS BOOK! It's fantastic. I loved the story but most of all I loved the imagination of the future the author predicts. It's scary. boy I am not sure i would want to be plugged in all the time but it's also interesting how collaboration has increased. The collaborative world he imagines encourages people to share information and to help each other solve problems.

(this should be in the Fiction about Libraries category)

16 August 2006

Manufacturing spimes: building the book as a digital and analog object

A conversation at lunch today with superpatron Bill Tozier about building books that were also digital objects prompted this question.

Let's say for the sake of argument that you wanted to print books that were also the delivery vehicle for the source code / original text of the book. What technology would you use, and how would you manufacture the resulting object?

The minimum version of this is to print a URL on the book pointing at a web source for the source code, but that's doable now, so the cool factor is low. A slightly snazzier approach makes that URL machine readable, or embeds it in a queryable RFID device, but in either case the information content of the book itself is low.

A second approach I suppose would be to burn digital media and embed or insert it in the book packaging. The old school way to do this is a CD-ROM, as my prized Storage Mania issue of Mediamatic (1994) predicts:

According to some estimates, by the end of this century the total amount of computer memory in the world will be greater than the total amount of information in the world. The curves will intersect. At this point the demand for storage space will give way to a demand for information.

By that time historians will look back with a smile on our current worries about information overload. Once the data banks are up-to-date they will begin storing information directly at its source. Fresh info will instantly be slurped up by a hungry archive; first come, first served. All intelligent agents will be bought away from their bosses by desperate archivists, to join the rat race for unarchived data.

Any better suggestions are welcome (high data rate RFID?).

For more on spimes see Bruce Sterling in Wired, Oct 2004.

(for the "future of the book" category)

What they're saying about Superpatron

  • So you've got Ed exploring the possibility space, and John working to enlarge that space, and together they've created a virtuous cycle of innovation. Now this is obviously an extreme example. You are not going to find a superpatron of Ed's caliber and a superlibrarian of John's caliber in every town. But I think the dynamic at work there can apply more broadly. And if it does, it will matter that these patrons and librarians are situated in a local context. (Jon Udell, Remixing the Library, GRL2020)
  • The superpatron describes 10 ways to help the library.... He forgot the most important point, but fulfilled it himself. As a preamble, so to speak, one could add:

    “Offer constructive criticism of the library. Without feedback, the people in there can't know what you want.” (Infobib.de)

  • How come only some books in the Google Book Search have “find in a library” links next to them? Diglet asks, and gets an answer, sort of a lame one if you ask me. update: Kevin mentioned in the comments that it would be great to see this for all books in Google Books. I went to bed thinking “Oh yeah, I should look into that….” and while I was sleeping, Superpatron, aka Ed Vielmetti solved the crime, er problem, and created a Greasemonkey script (a plug-in that you can run with Firefox) that does this for Ann Arbor and can be modified for any library. (Jessamyn West)
  • Curse you Superpatron! It's way past my bedtime, but the Ann Arbor Superpatron has been planting ideas in my head again… (Dave Pattern)
  • Superpatron is a blog run by a patron. The author posts entries about events and articles relevant to the library community, but does it with a patron point of view. (North Texas Regional Library System)
  • The blogosphere's resident "awesomest patron ever," Edward Vielmetti, appears in an article in School Library Journal about how he wrote a script tweaking (ahem, improving) Google Book Search. Vielmetti's blog, Superpatron, is one I read daily and highly recommend to anyone in libraries looking to get a very smart user's perspective. (Librarian In Black)
  • When I wrote him back, I called him the “AADL Super Patron,” which is very coincidental, since he has been planning to create a blog with almost the same name. Today, Superpatron is live and I’m sure it will quickly be filled with Ed’s terrific ideas about making libraries more responsive to patrons’ needs. So hurry up and subscribe already, ok? (Meredith Farkas)
  • The Superpatron (faster than a speeding reference librarian…) posts a presentation on the use of del.icio.us for research. (Steven Cohen, Library Stuff)
  • I've talked about Edward Vielmetti here before, but I never had the right name for him. Now I do. He's Superpatron! (Jenny Levine)
  • Last fall, in Ann Arbor, Michigan, I gave a talk entitled Superpatrons and Superlibrarians. Joining me for this week’s podcast are the two guys who inspired that talk. The superpatron is Ed Vielmetti, an old Internet hand who likes to mash up the services provided by the Ann Arbor District Library. That’s possible because superlibrarian John Blyberg, who works at the AADL, has reconfigured his library’s online catalog system, adding RSS feeds and a full-blown API he calls PatREST. (Jon Udell)
  • Little did I know that when I pointed to Ed Vielmetti’s blog, I was not only coining a phrase, but providing the name for Ed’s brilliant new blog. Ed is that (unfortunately still) rare creature that not only groks the net in fullness, but also has use for his public library. (Eli Neiburger)
  • The Ann Arbor District Library has a patron who loves it. And not only that, he writes about it. (Oliver Obst)
