Project Gutenberg sends a free CD or DVD to anyone who asks for it, and people are encouraged to make copies for a friend, a library or a school. A new DVD is in preparation. In 2004, Project Gutenberg was in touch with a European project studying how to combine translation software and human translators, somewhat as OCR software is now combined with the work of proofreaders.
Using scanned books that have been converted to text by OCR software, with no proofreading, gives a much lower-quality result. In the best of cases, the text is 99% reliable after OCR. For this reason, Project Gutenberg's perspective is rather different from that of the Internet Archive. In the Internet Archive's Text Archive, books are scanned and "OCRized", but they are not proofread.
Digitization is done by scanning the book page after page to get "image" files, which OCR software then converts to text. A good OCR package makes an average of 10 mistakes per page, and many more if the scanner or the OCR package is of lower quality. The book is then proofread twice on the computer screen by two different people, who make any necessary corrections.
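To relate the two figures quoted above (99% reliability and about 10 mistakes per page), here is a rough arithmetic sketch in Python. It assumes that "reliability" means per-character accuracy and that a page holds about 2,000 characters. Neither assumption comes from the source, and the function names are purely illustrative.

```python
def expected_errors_per_page(accuracy: float, chars_per_page: int) -> float:
    """Expected number of wrong characters on one page at a given per-character accuracy."""
    return (1.0 - accuracy) * chars_per_page


def accuracy_for_errors(errors_per_page: float, chars_per_page: int) -> float:
    """Per-character accuracy implied by a given number of mistakes per page."""
    return 1.0 - errors_per_page / chars_per_page


# Assumption: roughly 2,000 characters on a printed page (not stated in the source).
CHARS_PER_PAGE = 2000

# Mistakes per page implied by 99% per-character accuracy.
print(round(expected_errors_per_page(0.99, CHARS_PER_PAGE), 1))

# Per-character accuracy implied by 10 mistakes per page.
print(round(accuracy_for_errors(10, CHARS_PER_PAGE), 4))
```

Under these assumptions, both figures leave a book of a few hundred pages with thousands of errors, which is why the two rounds of human proofreading matter.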