Shreeshrii

Results 75 issues of Shreeshrii

The pdf output is not correct for Devanagari script when using the 3.2.3 experimental version for tesseract 4.0.0alpha. Please see attached zip file with input image, text, hocr and pdf...

@stweil You mentioned at one point that it should be possible to finetune `fast` models. It would be useful to have this feature, as many `fast` models use a...

feature request
training
priority: high

While trying to plot the error rates for training, I came across an anomaly. I use the LOG file generated from the messages output during an lstmtraining run, which also out...

training

```
tesseract -v
tesseract 5.0.0-18-g771c1
 leptonica-1.78.0
  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.2) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
 Found NEON...
```

bug

In Devanagari script, a virama is used to kill the inherent vowel of a consonant. When followed by another consonant, it forms a conjunct form. Depending on the font used,...

training
traineddata
encoding failed
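The virama behaviour described in the issue above can be illustrated at the Unicode level. The following is a minimal sketch using only Python's standard library; the choice of KA and SSA as the example consonants and the helper name `describe` are assumptions made for illustration and do not come from the issue itself. It only shows the code point sequences; how a conjunct is actually drawn depends on the font and the text shaping engine.

```python
import unicodedata

# U+094D is the Devanagari virama (halant), which suppresses the
# inherent vowel of the consonant it follows.
VIRAMA = "\u094D"

def describe(text):
    """Return (code point, Unicode name) pairs for each character in text.

    Hypothetical helper for this example only.
    """
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

ka = "\u0915"   # DEVANAGARI LETTER KA
ssa = "\u0937"  # DEVANAGARI LETTER SSA

# Consonant + virama: a "dead" consonant with no inherent vowel.
dead_ka = ka + VIRAMA

# Consonant + virama + consonant: the sequence a renderer may show as a
# conjunct. It is still three separate code points in the text.
conjunct = ka + VIRAMA + ssa

for label, s in [("dead consonant", dead_ka), ("conjunct sequence", conjunct)]:
    print(label, s, describe(s))
```

Because the conjunct is stored as three code points but is often rendered as a single glyph, any step that maps text to recognition units has to decide how to treat such sequences.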

```
$ training/lstmtraining --model_output ~/tesstutorial/sanskrit2003_from_full/sanskrit2003 \
> --continue_from ~/tesstutorial/sanskrit2003_from_full/san.lstm \
> --train_listfile ~/tesstutorial/santrain/san.training_files.txt \
> --target_error_rate 0.01
Loaded file /home/shree/tesstutorial/sanskrit2003_from_full/sanskrit2003_checkpoint, unpacking...
Successfully restored trainer from /home/shree/tesstutorial/sanskrit2003_from_full/sanskrit2003_checkpoint
Loaded 1746/1746 pages (0-1746)...
```

bug
training
encoding failed

https://github.com/tesseract-ocr/tesseract/issues/648#issuecomment-271987456

> Indic may be troubled by the length of the compressed codes used.

@theraysmith Can you explain a little more about this?

https://groups.google.com/d/msgid/tesseract-ocr/1a3e8773-7151-48f9-92bb-fda888293eab%40googlegroups.com?utm_medium=email&utm_source=footer > While the single "S" is recognized correctly, the text "2S" is recognized as "25". Here is link to the test image: https://03054610326450256607.googlegroups.com/attach/b8b86693ac072/2s.png?part=0.4&view=1

accuracy
ambiguously

For Tesseract 5, what are the best practices regarding OpenMP? Is the following still true: 1. OpenMP is **needed** for training, so build tesseract and the training tools with `--enable-openmp`. 2. For...

question
OpenMP

* See https://github.com/Shreeshrii/imagessan/tree/master/groundtruthimages for the images used for testing. san.traineddata in this repo (4.0 alpha version) gives the following accuracy for the above sample of images: Character/Word Error Rate |...
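For readers unfamiliar with the metrics named above, here is a minimal sketch of how character and word error rates are commonly computed: the Levenshtein edit distance between reference and OCR output, divided by the reference length. This is an assumption about the general definition, not necessarily the tool or exact method used to produce the figures in this issue. The function names are hypothetical, and characters are counted as Unicode code points, which for Devanagari differs from counting visible glyphs.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    # prev[j] holds the distance between the processed prefix of ref
    # and the first j items of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def error_rate(ref_units, hyp_units):
    """Edit distance normalised by the number of reference units."""
    if not ref_units:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_units, hyp_units) / len(ref_units)

def cer(ref, hyp):
    """Character error rate, counting Unicode code points."""
    return error_rate(list(ref), list(hyp))

def wer(ref, hyp):
    """Word error rate, splitting on whitespace."""
    return error_rate(ref.split(), hyp.split())

if __name__ == "__main__":
    print(cer("2S", "25"))                    # one substitution in two characters
    print(wer("the 2S unit", "the 25 unit"))  # one wrong word in three
```

Note that a single wrong character costs one unit of CER but makes the whole containing word wrong for WER, so WER is typically the higher of the two.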