tessdata_best
Wrong text recognition when there is a mix of numbers and characters
Environment
Tesseract Version:
Tesseract 32-bit build:
tesseract v4.0.0-beta.1.20180608
leptonica-1.75.3
libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.3) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.2.0
Platform:
Windows 10, 32-bit and 64-bit
Current Behavior:
Tesseract does not extract the text correctly from the following image:

The output text is as follows:
180448013 706618 72.150/17 16.01.17 25495
More details:
The image has a resolution of 300 dpi.
I used the data files from tessdata_best.
Command line:
tesseract.exe "test9.png" "[MyPath]\output" -l spa --psm 3 --oem 3 pdf
Expected Behavior:
Tesseract should extract "18044801J" instead of "180448013". Note: with the data files from https://github.com/tesseract-ocr/tessdata the text is recognized correctly.
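To make the difference between the two sets of data files easier to reproduce, here is a minimal Python sketch that runs the same image against each set and prints both results. It assumes `tesseract` is on the PATH and relies on the `--tessdata-dir` option to select the data directory. The directory paths and the helper names (`build_command`, `run_ocr`, `compare_models`) are hypothetical and only for illustration. It writes plain-text output rather than PDF so the results can be compared directly.

```python
import subprocess
from pathlib import Path


def build_command(image, output_base, tessdata_dir, lang="spa", psm=3, oem=3):
    """Build the tesseract argument list for one tessdata directory."""
    return [
        "tesseract",
        str(image),
        str(output_base),
        "--tessdata-dir", str(tessdata_dir),
        "-l", lang,
        "--psm", str(psm),
        "--oem", str(oem),
    ]


def run_ocr(image, output_base, tessdata_dir):
    """Run tesseract and return the text it wrote to <output_base>.txt."""
    subprocess.run(build_command(image, output_base, tessdata_dir), check=True)
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")


def compare_models(image, model_dirs):
    """Run the same image against each data directory and collect the text.

    model_dirs maps a label to a directory holding the .traineddata files.
    """
    results = {}
    for label, directory in model_dirs.items():
        results[label] = run_ocr(image, f"out_{label}", directory).strip()
    return results
```

Possible usage, with placeholder paths that need to be adjusted to the local installation:

```python
results = compare_models(
    "test9.png",
    {
        "tessdata": "C:/models/tessdata",            # assumed location
        "tessdata_best": "C:/models/tessdata_best",  # assumed location
    },
)
for label, text in results.items():
    print(label, "->", text)
```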
@Shreeshrii any update on this?