tesseract
Help Improving OCR of a document
Hello,
I'm using your Tesseract OCR wrapper to extract text from a PDF. I've uploaded a sample page from the PDF I'm trying to OCR, along with the trained data I'm using. If you run OCR on that image, you will see that the output text is of very poor quality. Is there anything I can do to improve the OCR output?
Thank you in advance.
The example appears to be a very low-resolution scan. Tesseract doesn't perform well in such cases; the recommended resolution is approximately 300 dpi. Good luck.
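To check whether a page falls short of that ~300 dpi recommendation, you can work out its effective resolution from its pixel size and physical page width. The sketch below is a rough, hypothetical helper set (none of these names come from Tesseract or the wrapper). It assumes you know the page width in inches, for example 8.5 for US Letter, and it relies on PDF user space using 72 points per inch when you choose a render zoom for rasterising a PDF page.

```python
"""Rough DPI arithmetic for deciding whether a page image is too coarse for OCR.

Hypothetical helpers, not part of Tesseract or any wrapper. Assumes the
physical page width is known in inches and targets ~300 dpi.
"""

TARGET_DPI = 300          # approximate resolution recommended for Tesseract
PDF_POINTS_PER_INCH = 72  # PDF user space: 1 point = 1/72 inch


def effective_dpi(pixel_width: int, page_width_in: float) -> float:
    """Pixels per inch actually captured across the page width."""
    return pixel_width / page_width_in


def upscale_factor(pixel_width: int, page_width_in: float,
                   target_dpi: float = TARGET_DPI) -> float:
    """Resize factor needed to reach target_dpi; never shrinks (always >= 1.0)."""
    return max(1.0, target_dpi / effective_dpi(pixel_width, page_width_in))


def target_size(pixel_width: int, pixel_height: int, page_width_in: float,
                target_dpi: float = TARGET_DPI) -> tuple[int, int]:
    """Pixel dimensions the page would need to reach target_dpi."""
    f = upscale_factor(pixel_width, page_width_in, target_dpi)
    return round(pixel_width * f), round(pixel_height * f)


def pdf_render_zoom(target_dpi: float = TARGET_DPI) -> float:
    """Zoom relative to PDF points when rasterising a page at target_dpi."""
    return target_dpi / PDF_POINTS_PER_INCH


if __name__ == "__main__":
    # A US Letter page (8.5 x 11 in) scanned at 850 x 1100 px is only 100 dpi.
    print(effective_dpi(850, 8.5))       # 100.0
    print(target_size(850, 1100, 8.5))   # (2550, 3300)
    print(round(pdf_render_zoom(), 3))   # 4.167
```

Keep in mind that resizing cannot recover detail the scanner never captured. If you still have the paper original, rescanning at ~300 dpi is the better fix. If the PDF contains vector text or a higher-resolution embedded image, rendering it at the computed zoom is preferable to upscaling a small bitmap. Upscaling an existing low-resolution image is only a fallback that may help somewhat.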
On Tue, 21 Mar 2023, 01:55 Adrian wrote:
TrainedData.zip https://github.com/charlesw/tesseract/files/11019529/TrainedData.zip
[image: BadOCR_Image] https://user-images.githubusercontent.com/7875120/226378033-b9a8e18e-08d3-4e86-84fa-d9d88b498566.JPG