tessdata_best issues

finetune tesseract on 2 languages

1

I have a document that contains 2 languages. When I combine language models for prediction (-l ara+eng), the result is not very good. How can I finetune on top of...

mhrihab
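For readers unfamiliar with the combined-language notation in this question, here is a minimal sketch of how the `-l ara+eng` argument is assembled. The helper name `build_tesseract_command` and the file name `page.png` are hypothetical, chosen only for illustration; the snippet builds the argument list and does not run Tesseract or address the fine-tuning part of the question.

```python
def build_tesseract_command(image_path, languages, output_base="stdout"):
    """Return the argv for a multi-language Tesseract run (nothing is executed).

    Hypothetical helper: languages are joined with '+', which is how the
    tesseract CLI expects several models to be combined in the -l option.
    """
    lang_arg = "+".join(languages)
    return ["tesseract", image_path, output_base, "-l", lang_arg]


# 'page.png' is a placeholder file name, not one taken from the issue.
cmd = build_tesseract_command("page.png", ["ara", "eng"])
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run` on a machine where Tesseract and both traineddata files are installed.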

unclear/ambiguous language code notation

10

While working on a mapping of bibliographic language codes (ISO 639-2/B due to the RDA application guidelines of the German National Library https://wiki.dnb.de/download/attachments/127172808/Kapitel_6.pdf?version=2&modificationDate=1505213938000&api=v2) to the corresponding (presumably ISO 639-2/T coded?)...

michaelkubina

enhancement

convert eng training to h5 model

3

How to export a Keras model of the English language? Is it possible to export the corpus to do some neural network training using it? I mean something like the MNIST dataset.

ehrenmann1977

enhancement

Ara language is showing "empty page!!" on one laptop and gives the answer on another

4

Downloaded ara_best language to test on tesseract and it showed "empty page !!". Tesseract version: tesseract 5.0.1 leptonica-1.79.0 libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 2.0.3) : libpng 1.6.37 : libtiff...

larawehbe

Improved Tamil and Sinhala traineddata

4

Improved by training on custom fonts. Tesseract Version: 4.1.1. Trained Fonts: Tamil, Step 01: "TAMu_Kadambri" "TAMu_Kalyani" "TAMu_Maduram" "Lohit Tamil" "Droid Sans Tamil Bold" "Droid Sans Tamil" "Karla Tamil...

Chaarangan

Portuguese trained data fails to recognize @

3

Hi, I'm using tesseract to perform OCR in a document that contains an email address. Using the eng trained data, it recognizes the email address correctly (but naturally fails in...

jgsmarques

bug

Segment error (core dumped)

2

MollyHoo

question

old russian / church slavonic glyphs?

13

Is it possible to add support for the Old Russian / Church Slavonic glyphs, at least for the 'yat' (U+0462, U+0463), 'fita' (U+0472, U+0473), and 'izhitsa' (U+0474, U+0475)?

yurytch

enhancement
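The code points named in this request can be inspected with Python's standard `unicodedata` module. This is only a small sketch for looking up the characters; the variable names are illustrative, and it says nothing about whether any traineddata file covers these glyphs.

```python
import unicodedata

# Code points quoted in the issue: yat, fita, izhitsa (capital and small forms).
CODE_POINTS = [0x0462, 0x0463, 0x0472, 0x0473, 0x0474, 0x0475]

# Map "U+XXXX" labels to (character, official Unicode name).
glyphs = {
    f"U+{cp:04X}": (chr(cp), unicodedata.name(chr(cp)))
    for cp in CODE_POINTS
}

for label, (char, name) in glyphs.items():
    print(label, char, name)
```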

Tesseract can not detect @ sign in Turkish language

1

**Environments** - Tesseract Version: tesseract v4.0.0.20190314 leptonica-1.78.0 libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.3) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.2.0 -...

bagsergen

bug

Performance post Fine-Tuning

Used ara.traineddata from the tessdata_best directory and finetuned it using approximately 300 Arabic numerical data. Though the accuracy on Arabic numerals has increased, the tuned model performs drastically worse than...

rraina97
