tessdata_best issues

I encountered an obvious recognition error when using the best eng model

I'd like to upload the erroneously recognized figures here, but it seems GitHub cannot show them directly, so I've uploaded them to a repository. The erroneously recognized file can be...

Heermosi

Source of scripts/Fraktur etc.

8

While the files in the top directory seem to come from the sources in the [langdata](https://github.com/tesseract-ocr/langdata) repository, the source for some of the files in `scripts/` is unclear: * `scripts/Fraktur.traineddata`...

mikegerber

Tesseract fails to recognize % sign in Hungarian language texts

8

When the language is set to "hun" (Hungarian), Tesseract is unable to recognize the % sign. This sign is very commonly used in Hungarian to represent percentages, the same way...

Googulator

bug

Fine-tuning training for a mixed-language tessdata

6

If I understand correctly, traineddata files that start with a capital letter are "mixed-language" traineddata (e.g. Hebrew = heb+eng). Was it produced by combining "heb" and "eng" traineddata files or...

amitm02

Polytonic characters on Greek trained data files

It looks like, although ell.traineddata should contain only modern Greek characters, after OCR-ing multiple tiff files written in modern Greek the output text contains (old) polytonic characters, e.g. "εἶναι "...

kchrs

Wrong text recognition when there is a mix of numbers and characters

1

### Environment Tesseract Version: Tesseract 32 bits version: tesseract v4.0.0-beta.1.20180608 leptonica-1.75.3 libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.3) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1...

efollana-sistel

Add kur with Arabic unicharset (similar to old tessdata)

3

See related issue in langdata. https://github.com/tesseract-ocr/langdata/issues/116

Shreeshrii

Telugu unicode ambiguities

5

Hi, I created test text data (mostly made-up individual characters; see attachment) and converted it to a tiff file using 'jTessBoxEditorFX' with the font 'noto sans telugu 8pt'. I then...

meta-forte

can you recommend the best traineddata for numbers and latin letters

4

I'm using Tesseract to read MRZ codes, and sometimes it gives me incorrect symbols, e.g. "1" instead of "I" or "S" instead of "5".

JenyaKirmizaTripTop

File Error Reporting: The Chi_Tra file is actually a Chi_Tra_vert file

1

mathshk20012001
