Option for text output
Feature request
I'm having a lot of difficulty with the PDF output. I've been scanning some old printouts and converting them with the OCR App. The problem is that if I select the text in the PDF and copy it into a document to save it as text, I get results such as:
"Loosentheelbowwheretheflexibleelectricalconduitentersthespa(itisonlypressed into place, "
I get exactly the same results with the command-line tesseract when creating PDF output, so that's a tesseract bug rather than something the OCR App is doing wrong.
However, if I run the tesseract command with text output, the result is:
"- Loosen the elbow where the flexible electrical conduit enters the spa (it is only pressed into place,"
which is wonderful. However, I can't find an option to select text as output in the OCR App.
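For reference, here is a minimal sketch of the two invocations being compared, assuming the standard tesseract CLI, where plain text is the default output and a trailing `pdf` config name requests a searchable PDF. The helper names and file names are hypothetical and are not taken from the OCR App's code.

```python
import shutil
import subprocess


def tesseract_command(image, output_base, fmt="txt"):
    """Build the argv for a tesseract run (hypothetical helper)."""
    if fmt not in ("txt", "pdf"):
        raise ValueError(f"unsupported format: {fmt}")
    cmd = ["tesseract", image, output_base]
    # Plain text is tesseract's default output, so nothing is appended for it.
    # "pdf" is a config name placed after the output base.
    if fmt == "pdf":
        cmd.append("pdf")
    return cmd


def run_ocr(image, output_base, fmt="txt"):
    """Run tesseract if it is installed and return the expected output path."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract not found on PATH")
    subprocess.run(tesseract_command(image, output_base, fmt), check=True)
    # tesseract appends the extension to the output base itself.
    return f"{output_base}.{fmt}"
```

Calling `run_ocr("scan.png", "scan")` would write `scan.txt`, while `run_ocr("scan.png", "scan", "pdf")` would write `scan.pdf`. A text option in the app could, in principle, come down to choosing between these two argument lists.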
Expected Behavior
A checkbox (or perhaps a popup menu, if more output options are possible) to select text rather than PDF output. It might even be nice to make text the default via a personal preference, but that's just on the wish list... :)
Current Behavior
PDF is the only option.
BTW, I'd be willing to take a crack at a patch if you're really busy, but I think someone familiar with the codebase may be able to add this quickly, and "do it right" :)