tessdata_best

Best (most accurate) trained LSTM models.


Results 32 tessdata_best issues


jpn.traineddata often interprets large もs as るs

Happens a lot in general. Here's an isolated test:

`./tesseract.exe temptest.png stdout -l jpn --psm 6`

`Warning. Invalid resolution 0 dpi. Using 300 instead.`

`る`

![temptest](https://user-images.githubusercontent.com/585488/34409479-febe35f0-eb97-11e7-9ce7-096aa615f1b4.png)

Feedback Latin.traineddata

4 comments

All errors are in comparison to the old Latin.traineddata: https://github.com/tesseract-ocr/tessdata/issues/82

Compiled 13.09.2017, tesseract, Windows 10 x86

---

Recognition error: A -> Ä

![1454419179964_2](https://user-images.githubusercontent.com/30631253/30548780-57e42274-9c93-11e7-868f-bad2bb3e3c36.jpg)

[1454419179964_2.txt.txt](https://github.com/tesseract-ocr/tessdata_best/files/1311117/1454419179964_2.txt.txt)

Eng traindata and tight text

1 comment

There is an issue when "I'd" is written with very tight letter spacing, and when a sentence ends with " '. " that is also tightly spaced.

![example1](https://user-images.githubusercontent.com/1424084/31120816-25062e2a-a836-11e7-9c4c-d3e6304210dd.png)

![example2](https://user-images.githubusercontent.com/1424084/31120814-24b86bd6-a836-11e7-95ce-5a6abad666cd.png)

How does Tesseract OCR fuse results from multiple models?

1 comment

In the latest Tesseract, the `-l` option allows for, say, `eng+chi_sim`. How does Tesseract OCR fuse results from multiple models? Can someone point to the code that does the fusion?...

Still getting extra spacing in between characters for jpn

Even after the "Fix extra intra-word spacing for Japanese" commit, I'm still getting extra spaces between words for `jpn.traineddata`. `jpn_vert.traineddata` works fine, though.

Poor results combining scripts

Please tolerate my complete newbie question. I am trying to process multiple scripts, e.g. `for i in 01-?.png; do tesseract "$i" "text-$i" -l script/Latin+script/Hebrew+script/Greek; done;` `for i in 01-?.png; do tesseract "$i"...

best and fast should have the same config file

Also, both should match `langdata_lstm`'s config file. See https://github.com/tesseract-ocr/tesseract/issues/4031

Hebrew david font support

Please also train on the Hebrew David font, including David Bold, not only on Arial. It is widely used in Israel. David is the common serif font in Israel, not Times.

`chi_tra` does not recognize vertical text as the doc says

Hi! I am uploading tons of old books in Traditional Chinese to the Internet Archive, and I am trying to find a set of proper CLI options so that these...

Works mostly well for the pure Hindi text, but does NOT parse English words at all and misses out on a few Numbers

1 comment

I've used the Hindi dataset. It works mostly well for pure Hindi text, but does NOT parse English words at all and misses a few numbers. >...


About

Best (most accurate) trained LSTM models.

1.1k stars, 363 forks
