tesstrain issues

Maths OCR

Currently Tesseract is not at all good at maths equations. Can Tesseract be trained with the right training data to work for maths equations, the same as Mathpix...

purplecrow2020

fine tuning arabic traineddata to solve extended words issue

2

I want to fine-tune ara.traineddata from the tessdata_best repo to handle extended words like this: ![sample_9](https://github.com/tesseract-ocr/tesstrain/assets/36017867/179090ca-04b4-4402-a484-f2365156a2c1) To do that, I made a list of lines with the...

sifdinNh

Training Tesseract OCR for a specific document

4

I have recently started learning and experimenting with Tesseract OCR. I have trained a new font using tesstrain. Now my use case is that I want...

mumarsyal

Ground truth: spaces before and after text?

10

I've created *.exp0.gt.txt as a base for manual ground truth creation using [Shreeshrii's shell script](https://github.com/tesseract-ocr/tesstrain/issues/7#issuecomment-419714852) and the files contain a space before and after the text (no newlines etc). Example:...

jbarth-ubhd

training fail again and again

9

Kindly help me fix this issue. I don't know what went wrong, please guide me: python3 shuffle.py 0 "data/OCRA/all-lstmf" + head -n 269 data/OCRA/all-lstmf + tail -n 30 data/OCRA/all-lstmf combine_lang_model...

Ham714

bug

urd Language [Bad Results] [Fine Tunning] [Jupyter Notebook]

1

Hi everyone, I'm working on the Urdu language to improve the accuracy of Tesseract. I used the code below to get the output; however, the result was extremely bad....

IrtazaIjaz

[Python] [Pytesseract] [Urdu] [Segmentation fault] [Deserialize header failed]

5

Hi All, I'm having trouble running fine-tuning with this repository. Below is the code I run in my Jupyter notebook: Step 1: !git clone https://github.com/tesseract-ocr/tesstrain.git Step 2: %cd tesstrain...

IrtazaIjaz

Training with python: run training step ?

1

As mentioned by @stefan6419846 in https://github.com/madmaze/pytesseract/issues/508, there is a Python wrapper for training in [tesstrain/src/](https://github.com/tesseract-ocr/tesstrain/tree/main/src), which unfortunately is not documented in the [tesseract](https://github.com/tesseract-ocr/tesseract), [tessdoc](https://tesseract-ocr.github.io/tessdoc/) and [tesstrain](https://github.com/tesseract-ocr/tesstrain/) repositories. From my...

forzagreen

Normalization failed / Invalid start of grapheme sequence Error While training the tesseract model

1

Normalization failed for string 'ଜୀବନକୁ ନିବିଡ଼ ଭାବେ ଏକନ୍ୱିତ କରିଛନ୍ତି' Invalid start of grapheme sequence:D=0xb71 Normalization failed for string 'ପରମ୍ପରାକୁ ଅବଲମ୍ୱନ କରିଛନ୍ତି, ସେତିକି ମଧ୍ୟ' Invalid start of grapheme sequence:M=0xb48 Normalization failed...

Sanketnarkhede-10

Question: Training seems to work fine, but using traineddata file produces garbage

3

[Archive.zip](https://github.com/tesseract-ocr/tesstrain/files/10204085/Archive.zip) I have uploaded my training log and .traineddata files along with a sample image. The log seems to indicate that the model is getting the text correctly, but if I try running...

lzhaxi

stale
