tessdata_best: Wrong text recognition when there is a mix of digits and letters


Status: Open. efollana-sistel opened this issue 6 years ago (1 comment).


Environment

Tesseract version:
Tesseract (32-bit build):
tesseract v4.0.0-beta.1.20180608
leptonica-1.75.3
libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.3) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.2.0

Platform:
Windows 10 (32-bit and 64-bit)

Current Behavior:

Tesseract does not extract the text correctly from the attached image test9.

The output text is:

180448013 706618 72.150/17 16.01.17 25495

More details:

The image resolution is 300 dpi.
I used the traineddata files from tessdata_best.

Command line:

tesseract.exe "test9.png" "[MyPath]\output" -l spa --psm 3 --oem 3 pdf
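To reproduce the comparison between tessdata and tessdata_best mentioned under Expected Behavior, the same command can be run twice, once per model folder, using Tesseract's --tessdata-dir option. The sketch below is only a hypothetical helper: the folder names "tessdata" and "tessdata_best" are assumed local checkouts that each contain spa.traineddata, and it assumes tesseract is on PATH.

```python
import subprocess
from pathlib import Path


def build_command(image: str, output_base: str, tessdata_dir: str) -> list[str]:
    """Rebuild the reporter's command line, pointed at a specific model folder."""
    return [
        "tesseract",
        "--tessdata-dir", tessdata_dir,  # folder assumed to hold spa.traineddata
        image,
        output_base,                     # tesseract appends ".pdf" for the pdf config
        "-l", "spa",
        "--psm", "3",
        "--oem", "3",
        "pdf",                           # config file: write a searchable PDF
    ]


def run_both(image: str) -> None:
    """Run the same image against two hypothetical local model folders."""
    stem = Path(image).stem
    for folder in ("tessdata", "tessdata_best"):
        cmd = build_command(image, f"{stem}_{folder}", folder)
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

Calling run_both("test9.png") would write test9_tessdata.pdf and test9_tessdata_best.pdf side by side, so the recognized first field can be compared directly.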

Expected Behavior:

Tesseract should extract "18044801J" instead of "180448013". Note: using the data files from https://github.com/tesseract-ocr/tessdata gives the correct result.
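Until the tessdata_best model is fixed, one possible post-processing workaround is sketched below. It assumes, which the report does not state, that the first field is a Spanish DNI (eight digits followed by a check letter). The expected value is consistent with that: 18044801 mod 23 = 13, and position 13 of the standard check-letter table is J. The helper names are hypothetical, and the heuristic rewrites the last character of any 9-digit token, so it should only be applied to the field known to hold a DNI.

```python
import re

# Standard Spanish DNI check-letter table, indexed by (number mod 23).
DNI_LETTERS = "TRWAGMYFPDXBNJZSQVHLCKE"


def dni_letter(number: str) -> str:
    """Return the check letter for an 8-digit DNI number."""
    return DNI_LETTERS[int(number) % 23]


def repair_dni(token: str) -> str:
    """If a token is 9 digits, assume the last one is a misread check letter
    and replace it with the computed letter; otherwise return it unchanged."""
    if re.fullmatch(r"\d{9}", token):
        return token[:8] + dni_letter(token[:8])
    return token
```

Applied to the output above, repair_dni("180448013") returns "18044801J", while the other fields (706618, 72.150/17, 16.01.17, 25495) are left untouched because none of them is a 9-digit token.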

Nov 30 '18 09:11 efollana-sistel

@Shreeshrii Any update on this?

Dec 13 '18 08:12 efollana-sistel
