tessdata_best

Best (most accurate) trained LSTM models.

32 tessdata_best issues, sorted by most recently updated:

Happens a lot in general. Here's an isolated test: `./tesseract.exe temptest.png stdout -l jpn --psm 6` prints `Warning. Invalid resolution 0 dpi. Using 300 instead.` followed by the recognized text `る`. ![temptest](https://user-images.githubusercontent.com/585488/34409479-febe35f0-eb97-11e7-9ce7-096aa615f1b4.png)

All errors are relative to the old Latin.traineddata: https://github.com/tesseract-ocr/tessdata/issues/82. Tesseract compiled 13.09.2017, Windows 10 x86. --- Recognition error: A -> Ä ![1454419179964_2](https://user-images.githubusercontent.com/30631253/30548780-57e42274-9c93-11e7-868f-bad2bb3e3c36.jpg) [1454419179964_2.txt.txt](https://github.com/tesseract-ocr/tessdata_best/files/1311117/1454419179964_2.txt.txt)

There is an issue when "I'd" is written with its characters very close together, and when a sentence ends with " '. " set very tightly. ![example1](https://user-images.githubusercontent.com/1424084/31120816-25062e2a-a836-11e7-9c4c-d3e6304210dd.png) ![example2](https://user-images.githubusercontent.com/1424084/31120814-24b86bd6-a836-11e7-95ce-5a6abad666cd.png)

In the latest Tesseract, the `-l` option accepts combinations such as `eng+chi_sim`. How does Tesseract fuse the results from multiple models? Can someone point me to the code that does the fusion?...

Even after the "Fix extra intra-word spacing for Japanese" commit, I'm still getting extra spaces between words with `jpn.traineddata`. `jpn_vert.traineddata` works fine, though.

Please tolerate my complete newbie question. I'm trying to process multiple scripts, e.g. `for i in 01-?.png; do tesseract "$i" "text-$i" -l script/Latin+script/Hebrew+script/Greek; done;` `for i in 01-?.png; do tesseract "$i"...
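For the multi-script case above, the `-l` value is simply the model names joined with `+`. The following is a minimal bash sketch of that loop, assuming the `script/Latin`, `script/Hebrew` and `script/Greek` models from tessdata_best are installed in the tessdata directory. `join_langs` and `ocr_pages` are hypothetical helper names, not part of Tesseract.

```shell
#!/usr/bin/env bash

# Hypothetical helper: join model names with "+", the separator tesseract's -l option expects.
# "$*" joins all arguments using the first character of IFS.
join_langs() {
  local IFS='+'
  printf '%s\n' "$*"
}

# Hypothetical helper: OCR every 01-?.png page with one combined model list.
# Assumes the script/* models are available to tesseract.
ocr_pages() {
  local langs img
  langs=$(join_langs script/Latin script/Hebrew script/Greek)
  for img in 01-?.png; do
    [ -e "$img" ] || continue   # the glob matched nothing; skip the literal pattern
    # tesseract appends .txt to the output base, so strip .png first
    tesseract "$img" "text-${img%.png}" -l "$langs"
  done
}
```

Calling `ocr_pages` in the directory holding the images writes `text-01-1.txt` and so on. Unlike the original loop, which produces names like `text-01-1.png.txt`, the sketch strips the `.png` suffix before tesseract adds `.txt`.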

Also, both should match `langdata_lstm`'s config file. See https://github.com/tesseract-ocr/tesseract/issues/4031

Please also train on the Hebrew David font, including David Bold, and not only on Arial. David is widely used in Israel: it is the common serif font there, not Times.

Hi! I am uploading tons of old books in Traditional Chinese to the Internet Archive, and I am trying to find a set of proper CLI options so that these...

I've used the Hindi dataset. It works mostly well for pure Hindi text, but does NOT parse English words at all and misses a few numbers. >...