tessdata_best Source of scripts/Fraktur etc.

While the files in the top directory seem to come from the sources in the langdata repository, the source for some of the files in scripts/ is unclear:

scripts/Fraktur.traineddata has no matching file in langdata,
and neither does scripts/Japanese.traineddata, etc.

The Data-Files wiki article does not mention scripts/Fraktur.

This adds to the confusion between the frk language (actually Fraktur, not Frankish), the Fraktur script model, and the legacy model deu_frak in the tessdata repository.

Jun 03 '19 11:06 mikegerber

See

https://github.com/tesseract-ocr/tessdata_fast/blob/master/README.md

https://github.com/tesseract-ocr/langdata_lstm/blob/master/script/Fraktur.langs.txt

Jun 03 '19 11:06 Shreeshrii

Also see https://github.com/tesseract-ocr/tessdata/issues/65

Jun 03 '19 11:06 Shreeshrii

Is langdata obsolete as langdata_lstm exists?

Jun 03 '19 13:06 mikegerber

The langdata files are appropriate for Tesseract 3 or for the legacy/base models used with Tesseract 4. They can also be used for fine-tuning, which requires a smaller training text.

Jun 03 '19 13:06 Shreeshrii

As @Shreeshrii already said, langdata_lstm is for LSTM models while langdata is for legacy models. Both kinds of models are still used.

The script models are mixtures of different languages. script/Fraktur, for example, combines enm+frm+frk+ita_old+spa_old.
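To make the combination above concrete, here is a minimal Python sketch. It only splits the language list quoted in this comment and assembles a Tesseract command line that selects the script model. The helper names, the image name, and the tessdata path are hypothetical placeholders. Note also that the `+` is treated here purely as a list separator: the script model is a single traineddata file, which is not necessarily equivalent to loading the five language models together.

```python
from typing import List

# Language list quoted in the comment above for script/Fraktur.
FRAKTUR_SOURCES = "enm+frm+frk+ita_old+spa_old"


def split_languages(spec: str) -> List[str]:
    """Split a '+'-separated language specification into individual codes."""
    return [code for code in spec.split("+") if code]


def build_tesseract_command(image: str, output_base: str,
                            model: str, tessdata_dir: str) -> List[str]:
    """Assemble (but do not run) a tesseract command line as an argument list.

    Assumes tessdata_dir points at a checkout that contains the model,
    e.g. a script/Fraktur.traineddata file below that directory.
    """
    return [
        "tesseract", image, output_base,
        "--tessdata-dir", tessdata_dir,
        "-l", model,
    ]


if __name__ == "__main__":
    print(split_languages(FRAKTUR_SOURCES))
    # Hypothetical file names and path, for illustration only.
    print(" ".join(build_tesseract_command(
        "page.png", "page", "script/Fraktur", "/path/to/tessdata_best")))
```

The command is returned as a list so that it could be passed to `subprocess.run` without shell quoting issues; whether `-l script/Fraktur` resolves depends on the directory layout of the tessdata checkout in use.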

I fixed the description for 4.00 frk in the Wiki. The other Wiki issues are still open.

Jun 13 '19 17:06 stweil

Fraktur OCR with Tesseract is what I am looking for. I installed VietOCR v5.5.2 and Tesseract 4.1.0 on my Mac, and now I am trying to find help on how to train it better. There are too many OCR errors.

How would I go about training the software? Can anyone help?

I am a complete beginner, sadly, and I do not even know how I managed to install the two components so far. The training step is not explained anywhere.

Any pointer in the right direction would be greatly appreciated.

Oct 01 '19 20:10 Akossimon

In the meantime newer Fraktur models are available. There is a description of the training process for those models in the Wiki.

As soon as the training is finished, I'll add the results to tessdata_contrib.

Nov 11 '19 15:11 stweil

@mikegerber, can we close this issue?

Jan 24 '20 08:01 stweil
