Help Improving OCR of a document

Open adrianlfns opened this issue 2 years ago • 1 comments

Hello,

I'm using your Tesseract OCR wrapper to extract text from a PDF. I've uploaded a sample page from the PDF I'm trying to OCR, along with the trained data I'm using. If you run OCR on that image, you'll see that the extracted text is of very poor quality. Is there anything I can do to improve the OCR output?

thank you in advance

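TrainedData.zip https://github.com/charlesw/tesseract/files/11019529/TrainedData.zip

[image: BadOCR_Image] https://user-images.githubusercontent.com/7875120/226378033-b9a8e18e-08d3-4e86-84fa-d9d88b498566.JPG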

adrianlfns avatar Mar 20 '23 14:03 adrianlfns

The example appears to be a very low-resolution scan. Tesseract doesn't perform well in these cases; the recommended resolution is approximately 300 dpi. Good luck.
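To put the 300 dpi figure in concrete terms, here is a rough sketch of the arithmetic for working out how much a page image would need to be enlarged before OCR. The function name `size_for_target_dpi` and the 96 dpi, 816 x 1056 px sample values are assumptions for illustration only; they are not taken from the attached image or from the wrapper's API. It is written in Python purely to show the calculation; the actual resize would be done with whatever imaging library you already use.

```python
import math


def size_for_target_dpi(width_px, height_px, source_dpi, target_dpi=300):
    """Return (scale, new_width, new_height) needed to reach target_dpi.

    Hypothetical helper: it only does the arithmetic and does not touch
    any image data.
    """
    if source_dpi <= 0:
        raise ValueError("source_dpi must be positive")
    scale = target_dpi / source_dpi
    if scale <= 1:
        # Already at or above the target resolution: leave the size alone.
        return 1.0, width_px, height_px
    return scale, math.ceil(width_px * scale), math.ceil(height_px * scale)


# Assumed example: a US Letter page (8.5 x 11 in) rendered at 96 dpi.
scale, new_w, new_h = size_for_target_dpi(816, 1056, 96)
print(f"scale by {scale:.3f} to {new_w} x {new_h} px")
```

Keep in mind that enlarging an existing low-resolution image cannot restore detail that was never captured. If the source PDF or the paper original is available, re-rendering or rescanning it at about 300 dpi is likely to help more than upscaling.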

charlesw avatar Mar 21 '23 19:03 charlesw