The main formats used are XML, TIF and DjVu. A "Full Text" field was also added as an experimental feature. It will also be possible to choose the font, the character size and the background color. Another eagerly awaited conversion is that of a book from one language to another by machine translation software.

Project Gutenberg also publishes books in well-known formats like HTML, XML or RTF. There are Unicode files too. Large-scale conversion into other formats, however, is handed over to other organizations: for example Manybooks.net, which converts Project Gutenberg's books into formats readable on PDAs, or Wattpad, a free service for reading and sharing stories on a mobile phone.

Project Gutenberg is convinced that proofreading by human beings is a very important step, and that this step makes all the difference. Using scanned books as is, converted to text format by OCR software with no proofreading, gives a much lower-quality result. After running OCR software, the text is 99% reliable in the best of cases.
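To give a sense of what "99% reliable" can mean in practice, here is a rough back-of-the-envelope sketch. It assumes the figure is a per-character accuracy and that errors are independent, neither of which the text states. The function name `expected_errors` and the page and book sizes are hypothetical values chosen only for illustration.

```python
def expected_errors(char_count: int, accuracy: float) -> float:
    """Expected number of wrong characters in a text of char_count characters.

    Assumes 'accuracy' is a per-character rate and errors are independent
    (an assumption made for this illustration, not a claim from the text).
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return char_count * (1.0 - accuracy)


# Hypothetical sizes, chosen only for illustration.
CHARS_PER_PAGE = 2000
PAGES_PER_BOOK = 300

per_page = expected_errors(CHARS_PER_PAGE, 0.99)
per_book = per_page * PAGES_PER_BOOK
print(f"about {per_page:.0f} errors per page, {per_book:.0f} per book")
```

Under these assumptions, a 99% reliable text still carries roughly 20 wrong characters on every page, which is why human proofreading makes such a difference.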

In December 2003, there were 11,000 books digitized in several formats, most of them in ASCII and some of them in HTML or XML. This represented 46,000 files and 110 gigabytes. On 13 February 2004, the day of Michael Hart's presentation at UNESCO in Paris, there were exactly 11,340 books in 25 languages. In May 2004, the 12,581 books represented 100,000 files in 20 different formats, and 135 gigabytes.