pdfocr

Adds text to PDF files using the cuneiform OCR software

pdfocr adds an OCR text layer to scanned PDF files so that they can be searched. It requires Ruby 1.8.7 or later and uses ocropus, cuneiform, or tesseract to perform the OCR.

Using

To use, run:

pdfocr -i input.pdf -o output.pdf

For more details, see the pdfocr man page.
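
If you need to process a whole directory, the single-file command above can be wrapped in a short Ruby script. This is only a sketch: the helper names (pdfocr_command, ocr_output_path, ocr_directory) and the "-ocr.pdf" naming scheme are made up for illustration, only the -i and -o flags shown above are used, and it assumes pdfocr reports failure through a non-zero exit status, which this README does not state.

```ruby
# Hypothetical helper: builds the argument list for one pdfocr run.
# Only the -i and -o flags documented above are used.
def pdfocr_command(input, output)
  ["pdfocr", "-i", input, "-o", output]
end

# Hypothetical naming scheme: scan.pdf becomes scan-ocr.pdf in the same directory.
def ocr_output_path(input)
  File.join(File.dirname(input), "#{File.basename(input, '.pdf')}-ocr.pdf")
end

# Runs pdfocr on every PDF in dir and returns the inputs that did not succeed.
# Assumption: a failed run makes Kernel#system return false or nil.
def ocr_directory(dir)
  inputs = Dir.glob(File.join(dir, "*.pdf")).reject { |f| f.end_with?("-ocr.pdf") }
  inputs.sort.reject do |input|
    system(*pdfocr_command(input, ocr_output_path(input)))
  end
end
```

Calling ocr_directory("scans") would then return the list of files that still need attention. Passing the command as separate arguments avoids shell quoting problems with file names that contain spaces.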

Dependencies

pdfocr requires tesseract and hocr2pdf. Both are available from your distribution: install the packages tesseract-ocr, tesseract-ocr-eng (or the packages for the other languages you need), and exactimage, which provides hocr2pdf.
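
To check that both tools are installed before running pdfocr, you could search your PATH for them. The following is a minimal sketch and not part of pdfocr itself; find_executable, missing_tools, and REQUIRED_TOOLS are hypothetical names, and the check only confirms that the executables exist, not that the language data you need is installed.

```ruby
# Returns the full path of an executable found on the given PATH string, or nil.
def find_executable(name, path = ENV["PATH"].to_s)
  path.split(File::PATH_SEPARATOR).each do |dir|
    next if dir.empty? # skip empty PATH entries
    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# The two tools named in the paragraph above.
REQUIRED_TOOLS = %w[tesseract hocr2pdf]

# Returns the names of the required tools that were not found on PATH.
def missing_tools(tools = REQUIRED_TOOLS, path = ENV["PATH"].to_s)
  tools.reject { |tool| find_executable(tool, path) }
end
```

An empty result from missing_tools means both executables were found; otherwise the returned names tell you which packages are still missing.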

Credits

pdfocr was written by Geza Kovacs.

pdfocr is hosted at http://github.com/gkovacs/pdfocr

Christian Pietsch added tesseract support.
