Project Gutenberg's Anabasis

By: Sam Vaknin, Ph.D.

Also published by United Press International (UPI)

January 5, 2004

Last October, Project Gutenberg (PG) - the Web's first and largest online library of free electronic books - released a long-awaited DVD containing close to 10,000 of its titles. Since then, another 1,000 texts have been added to its burgeoning archives. The Project has spawned numerous other Web sites. Some of them, such as Blackmask, offer free downloads and sell their own DVDs containing mostly Project Gutenberg eBooks in multiple formats. Others provide free browsers and library applications designed specifically for PG's content.

The man behind the Project - and, thus, the inventor of the ebook in 1971 - is Michael Hart.

Always available to preach the gospel of free content and its benefits, he responded to UPI's questions, joined by Greg Newby, Chief Executive of the Project Gutenberg Literary Archive Foundation.

Q. In October 2003, you set a new target for Project Gutenberg of one million free ebooks by the year 2015. Are there so many books in the public domain? And what then?

Michael: Archimedes said, "Give me a lever long enough, and I will move the world." Project Gutenberg is just such a lever, enabling a single person to create something of immense value that is made available to millions of people. If we have reached a mere 1.5% of the world's population, we have already given away a trillion eBooks.
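The interview does not spell out the arithmetic behind the trillion figure. One plausible reading, sketched here in Python, multiplies the number of people reached by the roughly 10,000-title collection each of them could hold. The world population of about 6.3 billion (a rough figure for 2003) and the assumption that every reader receives the whole collection are ours, not Hart's; the function name is hypothetical.

```python
# Rough check of the "trillion eBooks" claim (illustrative only).
# Assumptions NOT stated in the interview: a world population of about
# 6.3 billion (circa 2003) and that each reader holds the full collection.

WORLD_POPULATION = 6_300_000_000   # assumed, approximate
SHARE_REACHED_PER_MILLE = 15       # 1.5% expressed as parts per thousand
TITLES_PER_READER = 10_000         # approximate size of the PG collection on the DVD


def ebooks_given_away(population: int, share_per_mille: int, titles: int) -> int:
    """Return readers reached multiplied by titles per reader, using integer math."""
    readers = population * share_per_mille // 1000
    return readers * titles


if __name__ == "__main__":
    total = ebooks_given_away(WORLD_POPULATION, SHARE_REACHED_PER_MILLE, TITLES_PER_READER)
    print(f"{total:,} eBooks")  # prints "945,000,000,000 eBooks"
```

Under these assumptions the result, about 945 billion, rounds to the trillion quoted; a lower population estimate or partial collections would shrink it proportionally.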

Project Gutenberg is a grassroots operation that has never had real funding or grants. For 30 years people said that we would not be around the following year. When we started to get close to 10,000 eBooks, they finally stopped.

There are lots of pretend eBook operations, but none of them produce all of their eBooks themselves, or have 10,000 of their own eBooks that can be read by virtually any text reader and word processor.

The next big step, after we have reached a million eBooks, will be to translate each of them into as many as 100 languages, thus making them available to an even larger audience.

As for the number of titles in the public domain: during the 20th century there were many years in which over 50,000 books were published, and the rate kept increasing throughout the century. Certainly there were a million titles published before 1923 that we can get our hands on, not to mention non-book items such as newspapers, magazines, brochures and advertisements, court records and other government documents, unpublished manuscripts and diaries, music, film, photographs, audio, and other art forms.

Greg: My calculation, based on the US Library of Congress' copyright renewal records, is that about 1 million books published from 1923 to 1964 are demonstrably in the public domain, because the copyrights of only 10% of all published items are ever renewed. We are seeking to "discover" these items.

Q. Libraries on CD-ROMs are at least a decade old. Why did Project Gutenberg wait until now to issue its own DVD?

Michael: Because there was always someone out there willing to do it for us. Because CD burners and DVD burners finally got so cost-effective that we could afford to give away this kind of media. Because today you can't buy a computer off the shelf without a DVD drive. Until now, physical media could not compete on a cost-effective basis with Internet downloads.

Greg: We have some volunteers willing to create CD and DVD images and we now distribute them. But we hope to find many other channels to distribute our content for free or for a small fee.

Q. Why don't simple scans or raw OCR (optical character recognition) output qualify as ebooks? What is the technological future of ebooks - is it Machine Translation and, if yes, why?

Michael: Book scanning is outsourced halfway around the world, and the results are shoddy; often they cannot even be used as input for OCR programs, to create a text file, for instance.

In contrast, once a true eBook is created, it has more value than a paper copy, because it can be copied ad infinitum, sent all over the world, even to a billion readers, and can be the basis for hundreds of new paper and eBook editions, all at virtually no cost.

Moreover, people are not interested in scans. Some Project Gutenberg sites each hand out 10 million eBooks per year - impossible with scanned page images, whose oversized files consume far too much bandwidth.

The "scanners" want to be the only source for "their" books, even when those books are in the public domain - and are willing to claim copyright on the public domain works of Project Gutenberg in the process. They deny themselves true access to the public.

Our Unlimited Distribution Model calls for everyone to have a library of 10,000 eBooks, stored on a single DVD that costs only $1. People find this appealing. There are perhaps 10,000 volunteers to create our kind of ebooks - against only a few hundred people, all paid, working to create libraries of scans.

Additionally, each huge scan file holds just a single book. Scans are not searchable; they cannot be copied, indexed, or cited by off-the-shelf applications; their typos cannot be corrected; and they are not truly portable because of their size.

Project Gutenberg eBooks can be read in any manner the reader chooses - favorite fonts, margins, and number of lines per page can all be modified. The reader becomes his or her own publisher. People with disabilities can use a speech engine to read the texts aloud. The visually impaired can change the font size. This is impossible to do with scans.

With CD burners available for under $15, DVD burners for $100, and blank media so cheap, the cost of individual books becomes literally "too cheap to meter." And that is the whole point of the Project Gutenberg eBook library.

Greg: EBooks are editable and suitable for creating derivative works. They are not intended to be a depiction of a printed artifact, but a direct means of experiencing the author's writing. Today's best OCR still makes (on average) several errors per page of text, and requires human intervention to handle things like page headings and footnotes.

We plan to make PG's ebooks easily transformable among different digital formats - XML, HTML, PDF, Braille, audiobooks, TeX, RTF and others. Features such as fonts or background colors will be selectable. Machine translation (MT) will be another of these "formats", but it is currently technologically immature.

In cooperation with partner organizations in Europe and elsewhere, we hope to help develop better MT software. We are supporting a project in Europe to augment MT with human translation, much as today's OCR must be helped by human proofreaders to achieve a low error rate.

Q. How would you balance the need to protect the intellectual property rights of authors against the need to disseminate knowledge?

Michael: The World Intellectual Property Organization (WIPO), in cahoots with commercial interests, leaves no quarter for anyone and seems to want permanent copyright.

How do you achieve balance with someone who wants it all?

Originally, copyright came about because the Stationers' Guild wanted to entrench its monopoly on the written word after it was shattered by the Gutenberg Press. Similarly, in the United States, every copyright extension has had the same purpose: to destroy the effectiveness of a new publishing technology.

The 1909 Copyright Act destroyed the reprint houses made possible by the new steam and electric presses. The 1976 Copyright Act was enacted merely to stifle the effect of the Xerox machine. The 1998 Copyright Act was a response to the effects of the Internet. When it is difficult to make copies, copying is legal, because only the rich can do it. As soon as it becomes easy enough for the masses to make copies, it is made illegal!

Greg: Publishers and media houses are adept at appropriating the intellectual property rights of authors for their own profits. They are insensitive to the social contract of copyright that should result in the release of items to the public domain after a reasonable period. Life of the author + 70 years is not a reasonable period, neither is 95 or 120 years after the creation of the copyrighted work.

Only a fraction of the items currently under copyright are actually available, from anyone at any price. The only benefit accrues to media producers, who restrict the quantity of available prior materials so that their new material is more likely to be purchased.

Q. The commercial ebook industry is going through a bloodbath. Cracked versions of the newest books are available online. Do you believe that ebooks, by nature, should be free - or is there a place for commercial digital content?

Greg: I favor the development of a commercial eBook industry. Project Gutenberg should be seen as a benefit to that industry, not an adversary. Similarly, I see commercial eBooks as being able to benefit Project Gutenberg, simply by getting more people to read eBooks.

The industry is a victim of its own incompetence. It did not suffer from a lack of publicity or advertising, but from a lack of usability, standard formats, and sufficient content. It also adopted a crippling cost model that artificially keeps the price of a new hardcover at $20 or so, and a crippling industry model that necessitates enormous overhead to get its ever-decreasing catalog of items, printed on dead trees, delivered to shopping malls.

Fear of illicit copying (of music and video) seems to dominate their thinking. At the same time, the leading organizations (the Authors Guild, the MPAA and the RIAA) are seeking to reduce the realm of fair use. Had these organizations embraced fair use, and introduced reasonable products at reasonable prices, they would not have needed to worry so much about piracy.

The failure of the eBook is the failure of the industries behind it, not the failure of the idea or the lack of a market. I think it will take new thinkers, and new companies, to garner success.

Michael: Most of the bloodbath I have seen was among the commercial hardware eBook industry, people who wanted to control the reading habits of their customers, who did not want them to read anything that was not paid for and delivered by the same commercial interests. When upgrades turn into downgrades to WIPOut access to public domain eBooks that used to be accessible before - that is a "Bad Thing."

The beauty, the purpose, of eBooks is to re-create the Gutenberg Press. Books whose replication and dissemination all over the world cost nothing, that require no deforestation, warehousing and shipping, that do not end up in the landfills of the world.

The purpose of eBooks is to create a library anyone can carry, weighing under one ounce per ten thousand volumes on standard writable DVDs, or one ounce per 25,000 books on double-sided or dual-layer DVDs. One kilo of these newer DVDs can hold 1,000,000 eBooks!
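The weight figures can be checked the same way. This sketch assumes that a standard 12 cm DVD weighs about 16 g (our assumption, roughly half an ounce, not a figure from the interview) and takes the interview's per-disc counts of 10,000 titles for a standard disc and 25,000 for a double-sided or dual-layer one; the function name is hypothetical.

```python
# Rough check of the library-weight figures (illustrative only).
# Assumption NOT stated in the interview: one DVD weighs about 16 g.

DISC_MASS_G = 16         # assumed mass of a standard 12 cm DVD, in grams
GRAMS_PER_OUNCE = 28.35


def titles_per_kilo(titles_per_disc: int, disc_mass_g: int = DISC_MASS_G) -> int:
    """Titles carried by the whole discs that fit into one kilogram."""
    whole_discs = 1000 // disc_mass_g
    return whole_discs * titles_per_disc


if __name__ == "__main__":
    print(f"One disc: {DISC_MASS_G / GRAMS_PER_OUNCE:.2f} oz")  # prints "One disc: 0.56 oz"
    print(f"Standard discs: {titles_per_kilo(10_000):,} titles per kg")
    print(f"Double-sided/dual-layer: {titles_per_kilo(25_000):,} titles per kg")
```

Under this assumption, 62 whole discs fit into a kilogram, so the denser discs would carry about 1.55 million titles, which is consistent with the claim of 1,000,000 eBooks per kilo.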

And I plan to have just such double sided DVDs to hand out for the holidays two years from now. . . .

Copyright Notice

This material is copyrighted. Free, unrestricted use is allowed on a non-commercial basis.
The author's name and a link to this Website must be incorporated in any reproduction of the material for any use and by any means.