Feature Request: "Try to get the orientation right"

Open · lopiuh opened this issue 3 years ago · 2 comments

As far as I know, there is a trick: run OCR four times, rotating the document by 90° each time. The pass with the best OCR result (the most hits in a dictionary lookup?) indicates the "right" orientation. Paperless-ng seems to use this trick. Is it possible to integrate it into docspell?
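The trick described above can be sketched as follows. This is a minimal sketch, not how Paperless-ng or docspell actually do it: it assumes the OCR text for each rotation has already been produced by some OCR engine, and that `dictionary` is a set of lowercase words. The names `dictionary_score` and `best_orientation` are hypothetical.

```python
import re

def dictionary_score(text: str, dictionary: set[str]) -> int:
    """Count the words in `text` that appear in `dictionary` (case-insensitive)."""
    # [^\W\d_]+ matches runs of Unicode letters only, so umlauts are kept
    # and digits/punctuation split words apart.
    words = re.findall(r"[^\W\d_]+", text.lower())
    return sum(1 for w in words if w in dictionary)

def best_orientation(ocr_by_rotation: dict[int, str], dictionary: set[str]) -> int:
    """Return the rotation (in degrees) whose OCR text has the most dictionary hits.

    `ocr_by_rotation` maps a rotation (0, 90, 180, 270) to the OCR output
    obtained after rotating the page by that amount. On a tie, max() keeps
    the first key, so insert 0 first to prefer "no rotation".
    """
    return max(ocr_by_rotation,
               key=lambda r: dictionary_score(ocr_by_rotation[r], dictionary))
```

In practice, `ocr_by_rotation` would be filled by calling an OCR engine four times on the rotated page image. That is where the cost comes from: the page is OCRed four times instead of once.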

Thanks

lopiuh

lopiuh, Jul 03 '21 18:07

Thanks for reporting. This is quite an expensive trick, but it is simple and certainly works. For reference, #554 contains similar requests. At some point it should be possible to change the orientation both automatically and manually. It is possible to integrate this into docspell, but I want to put more thought into it first.

eikek, Jul 03 '21 20:07

Thanks. I manually rotated a PDF with the Linux tool "PDF Arranger", and interestingly the OCR result did not improve. Maybe the tool does not really rotate the image but only changes a stored "orientation" attribute. By the way, scanning the same document in the correct orientation gives correct OCR. Maybe there is a flag that tells the OCR tools to honor the orientation stored in the PDF (if my reasoning is right).

UPDATE 21-07-07: I tested again with another PDF, and OCR worked correctly after rotating it with PDF Arranger (Linux). I don't know what the problem was with the first sample, which Paperless managed to OCR but docspell did not...
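One way to test the "stored orientation" theory is to check whether the PDF carries a page-level `/Rotate` entry (a multiple of 90 defined in the PDF specification) rather than rotated image data. The sketch below is a crude shortcut that scans the raw bytes. The function name `declared_rotations` is hypothetical, and the approach misses page dictionaries stored in compressed object streams, so a proper PDF library is the more reliable route.

```python
import re

# Matches e.g. "/Rotate 90" or "/Rotate -90" in *uncompressed* PDF objects.
# The mandatory whitespace after "Rotate" avoids matching longer names.
ROTATE_RE = re.compile(rb"/Rotate\s+(-?\d+)")

def declared_rotations(pdf_bytes: bytes) -> list[int]:
    """Return every /Rotate value found in the raw PDF bytes, normalized to 0..359."""
    # Python's % always returns a non-negative result here, so -90 becomes 270.
    return [int(m.group(1)) % 360 for m in ROTATE_RE.finditer(pdf_bytes)]

if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as f:
        print(declared_rotations(f.read()))
```

If non-zero values show up, viewers rotate the page for display while the embedded image stays as scanned. A tool that extracts that image directly, instead of rendering the page, may then still see the text sideways.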

lopiuh, Jul 03 '21 20:07