brianstorms weblog: January 2006 Archives

January 28, 2006

Google Denounced

Posted by brian at 07:04 PM | Comments (0)

January 16, 2006

Firetrucks

Nothing like being woken up at 4:40 in the morning by ever-closing-in firetruck sirens. It's normal to hear them some evenings, as they go by, sirens wailing but diminishing with the distance. But it's not normal to hear the sirens at a constant volume, then get louder, then slowly fade, then get louder . . . and louder . . . and LOUDER . . . and LOUDER STILL until you hear the firetruck's engines, and brakes, and you hear the tires rolling down the hill on YOUR OWN STREET and the sirens are by this time BLARING INCREDIBLY LOUD and that's when you bolt out of bed and run to the window, with the white, red, blue, and yellow lights flashing insanely in all directions.

Yes, there's nothing like waking out of a deep sleep to see a FIRE ENGINE in your driveway, its sirens wailing, its lights flashing, and the firemen inside yelling at each other.

They were lost.

They didn't know where the fire was.

Turns out there WAS a fire. And a house burned down.

Here's the story in the San Diego Union-Tribune.

Posted by brian at 07:51 PM | Comments (0)

January 15, 2006

Lessig is Wrong

There's a lawsuit under way with the Authors Guild and the Association of American Publishers suing Google Inc. for massive copyright infringement.

Oh, the echoes of the RIAA v MP3.com lawsuit (full disclosure: I worked at MP3.com during that time period, and was part of the team that worked on the my.mp3.com service which was the basis for the RIAA's suit).

Lawrence Lessig is arguing that Google Book Search qualifies as "fair use" as defined by 17 USC 107.

The authors and publishers say, no way. They say that Google is blatantly ripping off millions of books without first getting permission from (read: paying) the rights holders.

Given the way the RIAA v MP3.com case went, I side with the authors and the publishers, and I think Lessig is wrong.

In his nifty little video, Lessig makes two main points. I question the validity of both of them. Let's start with the first one.

1. The Kelly v Arriba Soft Corp Case
Lessig argues we should look at the judgement of this case when considering how the Google case should be decided. In Kelly v Arriba, Arriba was sued for copyright infringement for an image search service similar to Google Images. For details on the case, you could start here.

Lessig shows a screen shot from Google Images:

and has this to say around 16 minutes into his video: "You can see that in this picture there are thumbnail versions of images that exist on other websites, and if you clicked one of these thumbnail images, you would be taken to that other website, and given the opportunity to see that full image on that other website. There was no effort to copy the original image and store that either under the Arriba Soft case or in the Google case, instead this is simply a way to grant an index into images that are available elsewhere. Now, the court said, what happened in this case is that the original copyrighted work had been reduced, so as to link to an image, a thumbnail image, linked to the original image. So there was a transformation of the original copyrighted work by transforming it into a thumbnail. That thumbnail itself wasn't a substitution for the original copyrighted work, the quality of a thumbnail is so poor, no-one would use that instead of the original copyrighted work. And all that the transformative work did was give you access to the original copyrighted work, in a way that protected and advanced the interest of the underlying copyright owner. And that's what grounded the claim for fair use, which the Arribasoft case ultimately held, protected the practice in that particular case.

"And so too, could you argue, is what Google doing protected by fair use," says Lessig. "Because what Google is doing here, is also producing a transformation of the original copyrighted work. That transformation produces a reduced 'image' in some sense, of the original copyrighted work, so again, here are the snippets:"

and Lessig shows this image in his video when he says "snippets":

He goes on to say this: ". . . here are the snippets that Google displays as a way to give you access to the original copyrighted work, once you see this reduced quality image, then you have the opportunity to link back to the original copyrighted work in a way that drives you to that original work. The difference of course is that the work here is not on the computer, because we're indexing books in physical space, whereas the works in the arribasoft case were on a computer somewhere, you could see the original work on the computer, but the principle that unites the two cases is that the transformation produces a reduced version of the original copyrighted work, itself not a substitute for the original work, but instead providing an advanced index or access to the original copyrighted work in a way that actually adds value back to the original copyrighted work. That principle in both cases drives the conclusion that there should be "fair use" in both cases. So this link to books is just like the link to images in the Arriba Soft case."

And he shows an equation to nail home his point.

So. Why do I disagree with this argument?

I disagree because Google is not making thumbnails of books. Sure, I agree that Google is "transforming" the original copyrighted work. But it sure as hell isn't doing it the same way that Arriba Soft saves thumbnail images of the original copyrighted images, and provides links to the websites where the original copyrighted images exist. (This latter part is flawed as well, as we'll see later.)

In Google's case, they are clearly scanning in the original copyrighted pages of these books, no doubt using fairly high resolution digital photography, and then it seems they are using some sort of optical character recognition to transform the printed text on the page into a digital text on the computer. On Google's computer. I have no idea if Google is OCRing and then running an indexer on the resulting body of text and then tossing the body of text and keeping the index. I bet they're not. I bet they're keeping the whole enchilada. I mean, look at what Google's doing. Or must be doing: how would Google know where the words "Pioneer Life" appear in a bitmap image of the scanned book, unless they had scanned the book's entirety, kept all the bitmap images, and recorded where the printed words in the image match up to the text words in the digitized OCR'd body of text? That's got to be some pretty fancy technology to do that.
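To make that guess concrete, here is a minimal sketch of what such a lookup might look like. I have no inside knowledge of how Google actually does it; the WordBox record, the pixel coordinates, and the sample data are all made up for illustration. The point is only that finding "Pioneer Life" inside a page image requires keeping both the OCR'd text and the position of every word on the scanned page.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class WordBox:
    """Where one OCR'd word sits on a scanned page (hypothetical layout)."""
    page: int
    x: int
    y: int
    width: int
    height: int


def build_index(ocr_words):
    """Map each lowercased word to every box where it appears."""
    index = defaultdict(list)
    for text, box in ocr_words:
        index[text.lower()].append(box)
    return index


def find_phrase(ocr_words, phrase):
    """Return, for each occurrence of the phrase, the boxes of its words."""
    target = phrase.lower().split()
    if not target:
        return []
    words = [text.lower() for text, _ in ocr_words]
    hits = []
    for i in range(len(words) - len(target) + 1):
        if words[i:i + len(target)] == target:
            hits.append([box for _, box in ocr_words[i:i + len(target)]])
    return hits


# Made-up OCR output: (word, position on the page image) in reading order.
sample = [
    ("True", WordBox(1, 10, 20, 40, 12)),
    ("Stories", WordBox(1, 55, 20, 60, 12)),
    ("of", WordBox(1, 120, 20, 15, 12)),
    ("Pioneer", WordBox(1, 140, 20, 62, 12)),
    ("Life", WordBox(1, 207, 20, 30, 12)),
]

index = build_index(sample)
hits = find_phrase(sample, "pioneer life")
```

Each hit is a list of boxes, which is exactly what you would need to paint a highlight over the words in the page image. Throw away either the text or the images and this stops working.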

Remember, Lessig said this above, "That thumbnail itself wasn't a substitution for the original copyrighted work, the quality of a thumbnail is so poor, no-one would use that instead of the original copyrighted work."

Ok, so one attribute of a thumbnail is poor quality, right? What happens if the quality is nearly as good as, if not BETTER than, the original? If you use advanced digital copying machines to photograph every single page of a book, including, ironically, its copyright page, and FAITHFULLY retain the original font, layout, and pagination of the book, you have to agree, it sure as hell is not a "thumbnail", or "reduced" copy of the book. Far from it. If there's any TRANSFORMATION going on with Google and books, it's extracting the information from the book and storing it in its entirety in not just one, but multiple formats!

All without permission of the author, the publisher, or other rights holder.

But wait, one more thing before moving on to Lessig's argument number two. Lessig talks about the wonders of Google's "snippets" -- see, they're only "portions", they're clips! Fair use! Fair use! Well, I don't think so. The "snippets" are clearly artificial. Lessig likes to use a 1924 copyrighted book called True Stories of Pioneer Life to make his case. He shows the little snippets in the image above. But look closely at that screen shot. It reports 136 references to "pioneer life" in the book.

How would Google know that? It would know that because it has the whole book digitized, both photographically and textually! Do some other searches in this book. For instance search for references to the word "corn". It finds 14 in the book, and shows you three snippets. Search for "evening". It finds 16 references, and shows 3 snippets as teasers. Search for "indian". 9 references, 3 snippets. And so on. Google knows this book. It knows it intimately. And it is able to show you snippets that are photographic portions of the actual printed pages of the original copyrighted work, with the word reference highlighted right in the image itself. Again, amazing technology, but forget the geekitude for a second. This is not reductive transformation. It's expansive transformation. Google is taking someone else's work product and without their permission, copying it in multiple formats and letting people search on it. I don't deny it's handy. I just deny Google should be able to do it for free.
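Here is a rough sketch of the "N references, 3 snippets" behavior, under the assumption that the search simply runs over the full stored text of the book. The function name, the sample pages, and the three-snippet cap are my own stand-ins, not anything Google has published. Notice that the total can only be computed by scanning every page; the cap applies only to what gets displayed.

```python
import re


def search_book(pages, term, max_snippets=3, context=30):
    """Count every whole-word match of term, but keep only a few snippets."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    total = 0
    snippets = []
    for page_number, text in enumerate(pages, start=1):
        for match in pattern.finditer(text):
            total += 1
            if len(snippets) < max_snippets:
                # Keep a little surrounding text as the displayed teaser.
                start = max(0, match.start() - context)
                end = min(len(text), match.end() + context)
                snippets.append((page_number, text[start:end]))
    return total, snippets


# Made-up page text for illustration only.
pages = [
    "The corn was planted in spring.",
    "We ate corn and more corn that evening.",
    "Corn again, and corn bread.",
]

total, snippets = search_book(pages, "corn")
```

With this sample, the search counts five matches across three pages and hands back three teasers. The user sees three; the system had to read everything.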

And one more thing. Those "Buy this Book" links on Google Book Search results pages. Who gets paid when Google sends the user to Abebooks or Alibris? Or Froogle?

2. 17 USC 107
Lessig's second argument relies on his interpretation of 17 USC 107, the section of copyright law pertaining to the concept of "fair use". Lessig reduces the text to a simple 1,2,3,4 image, summarizing the law as having to do with work, use, amount, and markets.

Well, let's look at the actual law:

§ 107. Limitations on exclusive rights: Fair use

Notwithstanding the provisions of sections 106 and 106A, the fair use of a copyrighted work, including such use by reproduction in copies or phonorecords or by any other means specified by that section, for purposes such as criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research, is not an infringement of copyright. In determining whether the use made of a work in any particular case is a fair use the factors to be considered shall include—
(1) the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;
(2) the nature of the copyrighted work;
(3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
(4) the effect of the use upon the potential market for or value of the copyrighted work.

The fact that a work is unpublished shall not itself bar a finding of fair use if such finding is made upon consideration of all the above factors.

Lessig focuses on his number four, "markets". Convenient, since I think a strong case can be made that what Google is doing is flat-out commercial in nature (1); that the nature of the copyrighted work is that it's a printed book and was intended to be just that, bought by a reader or library (2); and that it is outrageously clear that when it comes to the "amount and substantiality of the portion" of the books being copied, Google is exceeding the spirit of what the "fair use" principle expresses because, I mean come on, admit it man!, Google is copying the whole document, and saving not only the visual PHOTOGRAPHIC copy of it, but a digital TEXT copy too! This is way, WAY beyond "transformation" or "reduction".

But Lessig focuses on his number four, "markets". He argues that the author and book publishing "market" has failed to invent something and therefore doesn't deserve to enjoy whatever benefits Google derives from its hard work. Er, I mean, whatever Google derives from its copying of all those millions of authors' years of hard work. Or something.

Lessig says this (at around 29 minutes 24 seconds into the video): "So here's the choice the law presents us with. If we recognize the right of the authors or the publishers to demand that Google get permission before Google has the right to index and make accessible that index to our past, then the authors and the publishers may get a few pennies for each work that's within their collection, but at least 50 percent of our past will be lost. 50% can't be within this index because there's no author or publisher to ask. 50% will be invisible to the world that looks to our past through the lens of digital technologies. That's the choice they've forced us to make, and that's the choice that the law of fair use must reckon when it decides whether this use by Google is ultimately 'fair use' under the law of copyright."

Now, as is his wont, he shows slides of words he's speaking in the narration. When he says the word "pennies" we see this in his video (at 29 minutes 44 seconds):

Now, when I saw this image, I thought, gee, where did Google-fan-boy get this image from? If you look very closely, it has a copyright watermark at the top. Check it out:

"Copyright (c) andipantz.com". Hmm, I thought. I wonder where he got this. Given Larry's Google fandom, I figured it might be worth going to Google and doing a search for "pennies".

Let's do that, shall we?

And then click on "images". To look at all those fair-use thumbnails of pennies, of course.

Well, well, well, what do we have here. Click on the image, and we get:

Andrea K. Gingerich's blog is what we get. And her copyrighted photo that she's entitled "Counting My Pennies". It's dated August 30, 2004. To be clear, the "Copyright (c) andipantz.com" is watermarked on Gingerich's blog -- this is not a watermark that Lessig added. In fact, as far as I could tell, ALL of Gingerich's images on her blog have that watermark added somewhere in the image.

I wonder if Andrea Gingerich gave Professor Lessig permission to use that photograph in his widely-distributed Bittorrented movie?

Apparently she did not. At least, Andrea Gingerich tells me she did not. She's in Costa Rica at the moment, but still checking email. :-) See, I emailed her and asked. In fact, she thanked me for bringing it to her attention, and said she was going to fire off "a nice email" to Lessig directly.

But I digress.

Let's get back to the points Lessig makes about market failure legitimizing Google's activity. Well, wait. One thing. Remember, he said this: "If we recognize the right of the authors or the publishers to demand that Google get permission before Google has the right to index and make accessible that index to our past, then the authors and the publishers may get a few pennies for each work that's within their collection, but at least 50 percent of our past will be lost."

Our past. Our past. WHOSE past, Google-boy? If someone writes a book, copyrights it, publishes it, sells it to the marketplace, how does that make it part of your or my past? Seems to me, copyright is copyright. You got a beef with that, track down the copyright holder. Oh, it's impossible? Oh, it's a pain. Too bad. It just takes a lot of time. But someone somewhere is the copyright holder of that out-of-print book you so wish to copy. Maybe Google Search will help you find the rights holder. Stranger things have happened.

Remember, item (4) of 17 USC 107 says, "the effect of the use upon the potential market for or value of the copyrighted work." Lessig argues that Google's Book Search project will only have positive benefits for the potential market for or value of the original copyrighted work.

That may be, but I think the reality is, GOOGLE ITSELF is going to be the party that enjoys positive benefits of dusting off and exposing the original copyrighted work to a wider audience. Google, I would argue, is the clear winner.

Who loses? Always a useful question to ask. The author loses. They don't collect Lessig's "pennies". Maybe that's all they get, but dammit, they should get 'em! The publisher loses. Maybe it's long out of business, maybe it isn't. It still loses. Someone's graves are spinning while Google's reaping all the rewards.

You know who else loses? We all do. And not in the way Lessig thinks we do. We all lose because what Google is doing is hastening the demise of libraries. Hastening the demise of books. Hastening the requirement that you must have a computer, and a network connection, to get access to books.

Face it: books are toast. Google Book Search is the toaster.

What's Next?
So let's see. If Google wins this case, can we expect to see Google Movie Search, and Google Music Search next? After all, how else am I going to be able to find long-out-of-print copies of that obscure 1930 movie that nobody's seen in 75 years? Or that obscure 78 record made in 1941 that nobody's heard since, well, 1941?

If Google wants to rescue the cultural output of the world and make it searchable and retrievable, I am all for it. I just want Google to do it right. Pay for it. Spend your billions to track down every rights holder, and negotiate a deal. With every damn rights holder. One by one. It may take 200 years. So be it.

UPDATE -- 18 Jan 2006
I heard from Andrea Gingerich today and she informs me that she emailed Larry Lessig about his use of her "Pennies" photograph. She says he wrote back and apologized for using it and said he'd remove it from the video.

Posted by brian at 02:39 PM | Comments (21)

January 05, 2006

Brianstorms, Retrievr'd

So everyone's talking about Retrievr, a new "search by sketch" service that helps you find images within Flickr.

Supposedly.

I decided to search for the letters to the word "brianstorms". Hey, if it came back with something cool, I'd use it as the logo! Well, maybe.

So I set about sketching in all the letters:

I sketched in "b". I sketched in "r". I sketched in "i". "a". "n". "s". "t". "o". "m". Here's a composite of all the letters I sketched:

Each time I then grabbed the first image that Retrievr offered as a match. I reasoned that "first" was the leftmost image in the top row in the list of image results.

So, here's the new "brianstorms" logo, after it's been Retrievr'd.


I dunno. You tell me. Which is better, that, or the original logo?

Nice try, but I think I'll stick with my version. :-)

Posted by brian at 07:34 AM | Comments (2)

January 04, 2006

Vongo: Choose When? Watch When?

Vongo, the new site for downloading feature films on demand (NYT story here), doesn't work for Mac users.

Talk about poorly-worded messages. It's certainly not an "OS Failure". It's a Vongo failure. Not the best way of winning over Mac users.

Posted by brian at 11:01 AM | Comments (0)