tesseract
Number recognition on black-and-white drawings with mechanical elements produces many false results
Hello, I'm trying to identify the numbers on a page (I need both the numbers and their positions). I have tried several setups but cannot get better results. The approach that works best for me is "image_to_data", because it already recognizes "12" as one token instead of "1" + "2". I need to use it to locate up to 99 numbers in a single image. To improve recognition I have tried:
- Image preprocessing: COLOR_BGR2GRAY, GaussianBlur, adaptiveThreshold, dilate, erode
- Different --oem/--psm configurations
- Scaling the image (around 1.2x seems slightly better; 2x is worse, and downscaling is also worse)
- Recognizing all text (not only numbers), which usually gives worse results
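Comparing --oem/--psm combinations by hand is tedious, and a small sweep makes it repeatable. The sketch below is an illustration, not a tested recipe: the helper names candidate_configs and sweep are hypothetical, and the choice of modes is an assumption. PSM 11 (sparse text) and 12 (sparse text with OSD) are included because labels scattered over a drawing are closer to sparse text than to the single uniform block that --psm 6 assumes. Reportedly, tessedit_char_whitelist is only honored by the LSTM engine from Tesseract 4.1 onward, so with 4.00 the whitelist may have no effect when LSTM is used.

```python
import itertools

# Page segmentation modes to compare (assumed candidates):
# 6 = single uniform block of text (the mode used in this issue),
# 11 = sparse text, 12 = sparse text with OSD.
PSM_CANDIDATES = (6, 11, 12)
# OCR engine modes: 1 = LSTM only, 3 = default (whatever is available).
OEM_CANDIDATES = (1, 3)
DIGIT_WHITELIST = "0123456789"


def candidate_configs(oems=OEM_CANDIDATES, psms=PSM_CANDIDATES):
    """Build one pytesseract config string per (oem, psm) pair."""
    return [
        f"--oem {oem} --psm {psm} -c tessedit_char_whitelist={DIGIT_WHITELIST}"
        for oem, psm in itertools.product(oems, psms)
    ]


def sweep(image, configs):
    """Run image_to_data once per config and collect the numeric tokens.

    pytesseract is imported here so candidate_configs() can be used
    (and tested) without Tesseract installed.
    """
    import pytesseract

    results = {}
    for cfg in configs:
        data = pytesseract.image_to_data(
            image, config=cfg, output_type=pytesseract.Output.DICT
        )
        results[cfg] = [t for t in data["text"] if t.strip().isdecimal()]
    return results
```

Usage: results = sweep(img, candidate_configs()), then compare each token list against the numbers you know are on a test sheet.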
The only thing I have not managed to do is train a custom .traineddata file (I could not find an example written in Python). That should help, because the characters to recognize always use the same font, usually at very similar pixel sizes.
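Training is usually not driven from Python: the common route is the tesstrain Makefile, which expects pairs of line images and single-line transcriptions (foo.png or foo.tif next to foo.gt.txt) in data/MODEL_NAME-ground-truth. Python is still handy for preparing that folder. A minimal sketch, assuming you crop one image per number yourself; write_ground_truth and the model name drawing_digits are hypothetical.

```python
from pathlib import Path


def write_ground_truth(labels, model_name, root="data"):
    """Write one <stem>.gt.txt per label into <root>/<model_name>-ground-truth/.

    labels maps an image stem (e.g. "num_12") to the text shown in the
    matching cropped image <stem>.png or <stem>.tif, which you supply yourself.
    Returns the stems whose image file is still missing.
    """
    gt_dir = Path(root) / f"{model_name}-ground-truth"
    gt_dir.mkdir(parents=True, exist_ok=True)
    missing = []
    for stem, text in labels.items():
        # tesstrain expects a single-line transcription per line image.
        (gt_dir / f"{stem}.gt.txt").write_text(text.strip() + "\n", encoding="utf-8")
        if not any((gt_dir / f"{stem}{ext}").exists() for ext in (".png", ".tif")):
            missing.append(stem)
    return missing
```

Usage: missing = write_ground_truth({f"num_{n:02d}": str(n) for n in range(1, 100)}, "drawing_digits") lists the crops still to add. Training would then start with something like make training MODEL_NAME=drawing_digits START_MODEL=eng; check the tesstrain README for the exact variables. Since it is driven by make, running it under WSL is probably the simplest option on Windows 10.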
Environment
- Tesseract version: 4.00 (also tested with 5.00, but with the .traineddata from 4.00)
- Platform: Windows 10, 32-bit Tesseract build, test written in Python (PyCharm 2020.2.3)
Current Behavior:
Using configuration:
cong = r' --oem 3 --psm 6 -c tessedit_char_whitelist=0123456789'
boxes=pytesseract.image_to_boxes(img,lang='eng',config=cong)
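image_to_boxes returns one line per character, "char left bottom right top page", so a two-digit label comes back as two separate boxes (the "1" + "2" problem mentioned above). Its y coordinates are also measured from the bottom of the image, unlike OpenCV or PIL. A minimal parser that flips them, assuming box_output is the string returned by the call above and image_height is the pixel height of the same image; parse_boxes is a hypothetical helper.

```python
def parse_boxes(box_output, image_height):
    """Parse image_to_boxes() output into boxes with a top-left origin.

    Each line is "<char> <left> <bottom> <right> <top> <page>" with y
    measured from the bottom of the image; y is flipped here so it is
    measured from the top, as in OpenCV/PIL.
    """
    boxes = []
    for line in box_output.splitlines():
        parts = line.split()
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        char, left, bottom, right, top, _page = parts
        left, bottom, right, top = map(int, (left, bottom, right, top))
        boxes.append({
            "char": char,
            "x": left,
            "y": image_height - top,  # top edge, measured from the image top
            "w": right - left,
            "h": top - bottom,
        })
    return boxes
```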
Using configuration:
cong2 = r'--oem 0 --psm 6 outputbase digits -c tessedit_char_whitelist=0123456789 '
result=pytesseract.image_to_data(img2,config=cong2)
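With output_type=pytesseract.Output.DICT, image_to_data returns parallel lists (text, conf, left, top, width, height, among others), one entry per layout element, with conf set to -1 on rows that are not words. That makes it easy to drop obvious false positives before looking at positions. A sketch of that filtering; extract_numbers is a hypothetical helper, and the confidence threshold of 60 and the 1..99 range are assumptions based on the "up to 99 numbers" requirement, to be tuned on real sheets.

```python
def extract_numbers(data, min_conf=60.0, max_value=99):
    """Keep word tokens that look like a label number, with their boxes.

    data is the dict returned by
    pytesseract.image_to_data(img, config=..., output_type=pytesseract.Output.DICT).
    """
    found = []
    for i, text in enumerate(data["text"]):
        token = str(text).strip()
        conf = float(data["conf"][i])  # -1 for non-word rows
        if not token.isdecimal() or conf < min_conf:
            continue  # empty, non-numeric, or low-confidence token
        value = int(token)
        if not 1 <= value <= max_value:
            continue  # outside the expected label range
        found.append({
            "value": value,
            "x": int(data["left"][i]),
            "y": int(data["top"][i]),
            "w": int(data["width"][i]),
            "h": int(data["height"][i]),
            "conf": conf,
        })
    return found
```

Usage: data = pytesseract.image_to_data(img2, config=cong2, output_type=pytesseract.Output.DICT), then numbers = extract_numbers(data).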
Original image: (not available)
Expected Behavior:
Only the numbers in the image are found, with no false detections on the mechanical parts.
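Because the labels always use the same font at nearly the same pixel size, box height is a cheap extra filter: a mechanical feature misread as a digit often has a very different height. This is a heuristic sketch, not something Tesseract does for you; filter_by_height and the 25% tolerance are assumptions. Detections are assumed to be dicts with at least an "h" key (box height in pixels), such as the per-word boxes from image_to_data. Heights are compared rather than widths because a two-digit label is legitimately wider than a one-digit one.

```python
from statistics import median


def filter_by_height(detections, tolerance=0.25):
    """Drop detections whose box height is far from the median height.

    Assumes real labels share one font size, so outliers in height are
    more likely parts of the drawing than labels.
    """
    if not detections:
        return []
    typical = median(d["h"] for d in detections)
    low, high = typical * (1 - tolerance), typical * (1 + tolerance)
    return [d for d in detections if low <= d["h"] <= high]
```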
Suggested Fix:
Hi LordDispater,
I know you asked this question in November 2020, but did you find a good configuration with Tesseract (and maybe OpenCV) for reading images that contain drawings and numbers? I'm working on something similar and can't get good output. Any hints would be great.
Thanks in advance, Fanny