Feedback Latin.traineddata

Open TheSeiko opened this issue 8 years ago • 4 comments

All errors are relative to the old Latin.traineddata: https://github.com/tesseract-ocr/tessdata/issues/82

Tesseract compiled 13.09.2017, Windows 10 x86


Recognition error: A -> Ä

1454419179964_2 1454419179964_2.txt.txt

TheSeiko avatar Sep 18 '17 15:09 TheSeiko

Problems with E/F at the beginning of a line: "E" is often recognised as "F".

1454436670744_subimage 1454436670744_subImage.txt

TheSeiko avatar Sep 18 '17 15:09 TheSeiko

Confusion of small and capital letters

1454455689122_subimage 1454455689122_subImage.txt

TheSeiko avatar Sep 18 '17 15:09 TheSeiko

When a sentence spans multiple lines and only one or two words fall on the last line, the words on that last line are often recognised with a capital initial. Here: "Wird". Expected: "wird".

1511823574752_subimage

TheSeiko avatar Nov 28 '17 11:11 TheSeiko

I just tried all of the examples above with Tesseract 5.3.1 and tessdata_fast/script/Latin. Most of them were recognized perfectly. I only got one Ä/A and one ß/B confusion.

stweil avatar May 17 '23 18:05 stweil
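
For anyone who wants to repeat this kind of check, here is a minimal sketch of how the confusions reported above (A/Ä, E/F, w/W) could be tallied. It assumes the `tesseract` binary is on the PATH and that a ground-truth transcription exists for each image. The function names, the image path and the tessdata directory are hypothetical and not taken from this thread. Only same-length substitutions are counted, so insertions and deletions are ignored.

```python
import difflib
import subprocess
from collections import Counter


def build_tesseract_command(image_path, tessdata_dir, lang="script/Latin"):
    """Build the CLI call: tesseract IMAGE stdout --tessdata-dir DIR -l LANG."""
    return ["tesseract", image_path, "stdout",
            "--tessdata-dir", tessdata_dir, "-l", lang]


def run_ocr(image_path, tessdata_dir, lang="script/Latin"):
    """Run Tesseract and return the recognised text (needs tesseract installed)."""
    cmd = build_tesseract_command(image_path, tessdata_dir, lang)
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def char_confusions(expected, recognized):
    """Count one-to-one character substitutions as (expected, recognized) pairs."""
    confusions = Counter()
    matcher = difflib.SequenceMatcher(None, expected, recognized, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only equal-length replacements map cleanly onto character pairs.
        if tag == "replace" and (i2 - i1) == (j2 - j1):
            for exp_ch, rec_ch in zip(expected[i1:i2], recognized[j1:j2]):
                confusions[(exp_ch, rec_ch)] += 1
    return confusions


if __name__ == "__main__":
    # Hypothetical paths; replace with a real image and tessdata directory.
    text = run_ocr("sample.png", "/path/to/tessdata_fast")
    print(char_confusions("expected transcription here", text.strip()))
```

Running the same images against the old and the new traineddata and comparing the two counters would show which confusions are regressions.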