tesseract: Number recognition on B/W paper with mechanical elements produces a lot of false results

Open LordDispater opened this issue 3 years ago • 1 comment

Hello, I'm trying to identify numbers on a page (I need both the numbers and their positions). I have tried several different setups but cannot get a better result. My best option so far is "image_to_data", because it already returns "12" as a single item instead of "1" + "2". I need this to locate up to 99 numbers in a single image. To improve recognition I have tried:

Image processing: COLOR_BGR2GRAY, GaussianBlur, adaptiveThreshold, dilate, erode
Different --oem/--psm configurations
Scaling the image (around 1.2x seems slightly better; 2x is worse, and so is scaling down)
Recognizing all text (not only numbers), which usually produces worse results

The only thing I have not managed to do is train a custom .traineddata file (I could not find an example written in Python). I would like to, because the font to be recognized is always the same, usually at very similar pixel sizes.

Environment

Tesseract Version: 4.00 (also tested with 5.00, but with the .traineddata from 4.00)
Platform: Windows 10, Tesseract 32-bit build, test written in Python (PyCharm 2020.2.3)

Current Behavior:

Using configuration:

cong = r'--oem 3 --psm 6 -c tessedit_char_whitelist=0123456789'
boxes = pytesseract.image_to_boxes(img, lang='eng', config=cong)

Res1
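Because "image_to_boxes" reports one box per character, a "12" comes back as "1" and "2". A minimal sketch of how those per-character boxes could be merged back into multi-digit numbers is shown below. It relies on the box format `char left bottom right top page`, with the origin at the bottom-left corner of the image. The helper names and the gap and overlap thresholds are hypothetical and would need tuning for the actual font size.

```python
from dataclasses import dataclass

DIGITS = set("0123456789")


@dataclass
class Box:
    text: str
    left: int
    bottom: int
    right: int
    top: int


def parse_boxes(box_string):
    """Parse the string returned by pytesseract.image_to_boxes."""
    boxes = []
    for line in box_string.splitlines():
        parts = line.split()
        if len(parts) < 5:
            continue  # skip empty or malformed lines
        left, bottom, right, top = map(int, parts[1:5])
        boxes.append(Box(parts[0], left, bottom, right, top))
    return boxes


def merge_adjacent_digits(boxes, max_gap_ratio=0.6, min_overlap_ratio=0.5):
    """Join consecutive digit boxes that sit side by side on one line.

    Thresholds are relative to the height of the previous box and are
    assumptions, not values from the original report.
    """
    merged = []
    for box in boxes:
        if box.text not in DIGITS:
            continue
        prev = merged[-1] if merged else None
        if prev is not None:
            height = max(prev.top - prev.bottom, 1)
            gap = box.left - prev.right
            overlap = min(prev.top, box.top) - max(prev.bottom, box.bottom)
            if (box.left >= prev.left
                    and gap <= max_gap_ratio * height
                    and overlap >= min_overlap_ratio * height):
                # Extend the previous number with this digit.
                prev.text += box.text
                prev.right = max(prev.right, box.right)
                prev.bottom = min(prev.bottom, box.bottom)
                prev.top = max(prev.top, box.top)
                continue
        merged.append(Box(box.text, box.left, box.bottom, box.right, box.top))
    return merged
```

With the call above this would be used as `merge_adjacent_digits(parse_boxes(boxes))`. To draw the results with OpenCV, which uses a top-left origin, the y values have to be flipped as `image_height - top` and `image_height - bottom`.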

Using configuration:

cong2 = r'--oem 0 --psm 6 outputbase digits -c tessedit_char_whitelist=0123456789'
result = pytesseract.image_to_data(img2, config=cong2)

Res2

Original image:

Expected Behavior:

Find only the numbers in the image, without false detections on the mechanical parts.
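One possible way to get closer to this, sketched under assumptions, is to post-filter the "image_to_data" output instead of trusting every detection. The sketch assumes the data was requested with `output_type=pytesseract.Output.DICT`, which yields parallel lists under the keys `text`, `conf`, `left`, `top`, `width` and `height`. The function name and all thresholds (confidence 60, height 10 to 40 px, maximum value 99) are hypothetical; the height window uses the fact that the font always has nearly the same pixel size.

```python
def filter_number_detections(data, min_conf=60.0, min_height=10,
                             max_height=40, max_value=99):
    """Keep only plausible number detections from an image_to_data dict.

    `data` is expected to have the layout of pytesseract's Output.DICT.
    All thresholds are illustrative assumptions to tune per drawing.
    """
    hits = []
    for i, raw in enumerate(data["text"]):
        text = raw.strip()
        # Reject empty entries and anything that is not purely digits.
        if not text or any(ch not in "0123456789" for ch in text):
            continue
        value = int(text)
        if value > max_value:
            continue
        # conf may arrive as a string, int or float depending on versions.
        conf = float(data["conf"][i])
        if conf < min_conf:
            continue
        # Lines of the mechanical parts tend to give boxes of odd sizes.
        height = int(data["height"][i])
        if not min_height <= height <= max_height:
            continue
        hits.append({
            "value": value,
            "left": int(data["left"][i]),
            "top": int(data["top"][i]),
            "width": int(data["width"][i]),
            "height": height,
            "conf": conf,
        })
    return hits
```

A call would look like `filter_number_detections(pytesseract.image_to_data(img2, config=cong2, output_type=pytesseract.Output.DICT))`. This does not improve the recognition itself; it only discards detections that do not look like the expected labels, so real numbers that are read with low confidence would still be lost.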

Suggested Fix:

Nov 11 '20 08:11 LordDispater

Hi LordDispater,

I know you asked your question in November 2020, but I would like to know whether you found a good configuration with Tesseract, and maybe OpenCV, for reading an image that contains schematics and numbers. I'm working on something similar and can't get good output. If you have any clue, it would be great.

Thanks in advance. Fanny

Jan 12 '22 12:01 fannya76
