Documentation: include advice on how to mass-scan similar documents with OCRmyPDF

Open kalpha2 opened this issue 5 years ago • 1 comments

If I were to scan several hundred pages, what would be the best approach? In particular, is there any benefit to scanning at a higher DPI for better OCR accuracy while storing a lower-resolution image? Can OCRmyPDF already do this, either by itself or in combination with another tool?

Apr 29 '20 15:04 kalpha2

Everyone's workflow is different. Some people can't accept any kind of image reprocessing, in case they want to redo OCR in the future.

For best results, scan at 300 dpi minimum with deskewing on, and do postprocessing in your scanner's application. Use ocrmypdf's optimize feature (-O2) to reduce file sizes.
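
For a batch of several hundred pages split across many PDFs, the -O2 advice above could be scripted roughly as follows. This is only a sketch: the folder layout and the helper names (`build_command`, `plan_batch`, `run_batch`) are assumptions for illustration and are not part of ocrmypdf. It assumes the `ocrmypdf` executable is on PATH and that the scanner has already produced one PDF per document.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def build_command(src: Path, dst: Path, optimize: int = 2) -> list[str]:
    """Return the ocrmypdf argv for one file; --optimize 2 is the long form of -O2."""
    return ["ocrmypdf", "--optimize", str(optimize), str(src), str(dst)]


def plan_batch(in_dir: Path, out_dir: Path) -> list[list[str]]:
    """Build one command per *.pdf in in_dir (hypothetical layout), sorted by name."""
    return [build_command(p, out_dir / p.name) for p in sorted(in_dir.glob("*.pdf"))]


def run_batch(in_dir: Path, out_dir: Path) -> None:
    """Run the planned commands one after another, stopping at the first failure."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf not found on PATH")
    out_dir.mkdir(parents=True, exist_ok=True)
    for cmd in plan_batch(in_dir, out_dir):
        subprocess.run(cmd, check=True)
```

Calling `run_batch(Path("scans"), Path("ocr"))` would write an OCRed copy of each PDF in `scans` to `ocr`, leaving the originals untouched for anyone who wants to redo OCR later.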

Higher JPEG compression + high DPI gets more bang for the buck than downsampling, because JPEG compression is "smart downsampling" in a way. The only drawback is that higher-DPI images still take more memory when decompressed, so they will load more slowly when viewed.
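
To put a rough number on the memory drawback, the decompressed size of a page can be estimated from its pixel dimensions alone, since it does not depend on how hard the JPEG was compressed. The sketch below assumes a US Letter page (8.5 x 11 in) stored as 8-bit RGB; both are illustrative assumptions, and `decompressed_bytes` is a hypothetical helper.

```python
def decompressed_bytes(width_in: float, height_in: float, dpi: int, channels: int = 3) -> int:
    """Estimate the raw size of a page image at 8 bits per channel."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * channels


# Compare the same page scanned at two resolutions.
for dpi in (300, 600):
    size = decompressed_bytes(8.5, 11, dpi)
    print(f"{dpi} dpi: {size / 1024**2:.1f} MiB decompressed")
```

Doubling the DPI quadruples the pixel count, so a viewer needs about four times the memory per page no matter how small the file is on disk.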

Apr 29 '20 22:04 jbarlow83