Looking through this week’s readings on the uses and abuses of metadata and tagging, I was reminded of my longstanding discomfort with the nature of some academic critiques of new technologies. Geoffrey Nunberg’s “Google’s Book Search: A Disaster for Scholars” is an illustrative example. A well-respected linguist who publishes and speaks frequently in the mass media, Nunberg bemoans the primitive state of the metadata attached to the books that make up Google’s massive digitized collection, calling it a “train wreck” for scholarship.
To be sure, Nunberg performs a valuable service in communicating some real limitations of the technology to a broad audience. He does a great job presenting his findings about systematic errors in the metadata Google provides. He shows the horrendously and hilariously wrong results returned, presumably, by a Google search for the publication year 1899. He also points out a large number of misdatings of popular and important books; in fact, according to Nunberg, his unscientific sample produced an error rate on publication dates of 70%. He goes on to reveal an array of undeniably silly classification errors in classic texts: ‘Moby Dick’ labeled Computers! ‘Jane Eyre’ labeled Architecture! For digital historians and archivists, it’s a powerful demonstration of the importance of good metadata.
Nunberg also does a good job pinpointing a major source of these problems: the machine-based scanning and data-harvesting system Google uses. The trouble is that this is the one and only explanation he offers with any substance (perhaps not entirely unrelated to its having come directly from the mouths of Google employees), and his prescriptive vision suffers as a result. In place of serious analysis, he resorts to speculation about Google’s lack of cultural sophistication, tinged with more than a little elitism. “I have the sense that a lot of the initial problems are due to Google’s slightly clueless fumbling as it tried to master a domain that turned out to be a lot more complex than the company first realized,” he writes, perhaps imagining that nobody at Google has ever been exposed to literature. He triumphantly accuses Google of choosing the wrong classification system when it adopted BISAC, a scheme commonly used by commercial retailers, as its standard. After briefly conceding its potential usefulness for ad placement, he sniffs indignantly that the system distorts the relative weight of important books because “Bambi and Bullwinkle get a full shelf to themselves, while Leopardi, Schiller, and Verlaine have to scrunch together in the single subheading reserved for Poetry/Continental European.” It’s not difficult to imagine Nunberg looking down his nose as he says, “Google has taken a group of the world’s great research collections and returned them in the form of a suburban-mall bookstore.”
In the end, we are left with a couple of limp possibilities. Nunberg hopes that organizations like the Internet Archive or HathiTrust, a consortium of libraries, may, in his words, “pick up the slack.” Most importantly, says Nunberg, Google should be motivated to license metadata from the Library of Congress and OCLC out of a sense of obligation to demonstrate its claim that Google Books is a “public good,” or to avoid becoming “a running scholarly joke.” Even if we overlook the ethnocentrism of the former idea and the naivete of the latter, Nunberg seems to have missed the memo about Google: it’s a company, a very big and very profitable company. It provides products to the public for “free,” wrapped in the cozy rhetoric of freedom, access, knowledge, and fun, all the while making money hand over fist by tracking and scanning our every online move so that it can charge advertisers premium prices for ‘targeted ads.’ This is why it’s ironic when Nunberg says condescendingly, “It’s clear that Google designed the system without giving much thought to the need for reliable metadata.” He is exactly right, but for reasons very different from the ones he imagines. Had he made an effort to look past Google’s promise to build the world a free library, he would have noticed Google magically turning public and private property into another wildly successful engine, one that generates profit day and night, regardless of whether naïve or elitist academics think Google Books is good enough for scholarly research.