The Other Deep Web

Ashley and I went up to the California International Antiquarian Book Fair today. It was truly a humbling experience in many ways. While a large number of the cracked and creaking volumes in the fair were of little interest (there seemed to be an overabundance of antique fishing books for some reason), there were a few astounding gems.

Of particular interest were the original scientific volumes. A first edition copy of Newton’s “Opticks”. Treatises by Galileo, Copernicus, and Descartes. An original copy of Einstein’s publication introducing the theory of general relativity, and another introducing the photo-electric effect. A first edition of the King James Bible. A first edition copy of Ray Bradbury’s “Fahrenheit 451” (a limited edition bound in asbestos). But why do these books matter? We know the words, the ideas, and the illustrations. What’s left to captivate us? I pondered this while watching a woman nearly suffer an emotional implosion while examining some obscure volume of philosophy that obviously held some powerful sway over her.

I guess in some way, we all look to try to get closer to the original author of some book that meant something to us. To try to get inside them. Maybe if we can reach back in time far enough through these books, we think, we can actually touch their authors’ greatness (and maybe some of what they had will rub off on us). It starts innocently enough – a first edition here, a copy signed by the author there, an original marked-up copy of the manuscript, etc. – but sometimes it gets truly weird.

At some point today the obsessive nature of the antiquarian book trade became readily apparent when I found one vendor selling Robert Louis Stevenson’s matriculation card from the University of Edinburgh. It’s one thing to like the guy’s books; it’s entirely another to want to own the card attesting to his status as a university student. That’s kicking it up a notch. In another area, I found a copy of Mark Twain’s “The Celebrated Jumping Frog” originally owned by Theodore Roosevelt, along with a personal letter from Roosevelt attesting to how much it meant to him. How ironic.

Internet geeks have been talking for years about the “deep web” – the dark matter of the Internet universe that remains hidden from the pervasive prying eyes of voyeur search engines. While the term is often applied to data curtained behind corporate firewalls, today it took on a new meaning for me. Today, it referred to all the “obsolete” knowledge trapped in ancient volumes that would never be visited by Googlebot, or scanned by Amazon. Today, it referred to the collective emotion of the human race for the books and authors they love. There’s no indexing that.

Next Gen Open Source?

Like a few other geeks, I’ve lately been reading Paul Graham’s excellent Hackers and Painters. Most insightful, especially in light of the success of Google and Flickr, are Paul’s views on the advantages of “weblications” over traditional desktop software. Recently, I started wondering about how the transition to web-based services would ultimately affect the Open Source movement.

I’m not alone in this line of thinking, of course. I rarely am. There was a discussion recently on the nature of Google’s contribution to Open Source, in light of an accusation by Krzysztof Kowalczyk that Google (and any other large web-based service) was essentially bleeding the Open Source community for cheap code, and giving little back in return. While Adam Bosworth’s response highlighted the value that people got for free from Google every day, I think he did evade the central thrust of Kowalczyk’s argument, namely that an individual company’s incentives prevented it from contributing back to the Open Source community except when absolutely necessary.

The discussion does illustrate the main hole created by ambiguity in the licenses used to protect Open Source or even “Open Culture” works. Because these licenses are ambiguous about the meaning of “derivative” works, they open a gap through which companies can fit in order to carve out a commercial enterprise (and I’m not saying that’s necessarily a bad thing). For example, a company can build a web-based service on top of Linux without having to contribute back to the project or release its code under the GPL. This is because creating the web service didn’t require the company to actually create a “derivative work” of the Linux kernel, only to build an application on top of the operating system. A similar hole was highlighted by the flap sparked by the Trademark blog a few weeks ago when Martin Schwimmer expressed concern that Bloglines was violating his site’s Creative Commons license by aggregating the blog’s RSS feed.

The problem lies in the fact that in a world where the boundaries between applications grow ever more fuzzy, the original intent of Open Source licenses, such as the GPL, is undermined. The original intent of the GPL was to ensure that those who benefitted from an Open Source project were equally obligated to make a contribution back into the community to foster continued innovation and improvement. With that feedback loop broken, companies building web-based businesses will continue to do the only smart thing they can do: exploit the hole, and segment their applications such that they are only obligated to contribute back the minimal amount possible to those Open Source projects to which they make modifications.

On the other hand, I’m not sure we necessarily want to close this hole – doing so may only slow the rapid adoption of Open Source technology, and prolong the profits that companies can wring out of customers when there is no other viable alternative. In the end, it may be more beneficial for the Open Source community to let these companies exploit this hole in the feedback loop, if only to use it as a clever loss leader to ensure continued rapid adoption, create developer loyalty, and, ultimately, garner protection from those companies not using Open Source who wish to use patent infringement lawsuits or other means to eliminate it as a competitive threat.