Shreeshrii issues

Results: 75 issues of Shreeshrii

Investigate complex scripts in PoDoFo PDF export

The PDF output is not correct for Devanagari script when using the 3.2.3 experimental version for tesseract 4.0.0alpha. Please see the attached zip file with the input image, text, hOCR and PDF...

Feature Request: Dequantization - conversion of int model to float model

@stweil You had mentioned at one point that it should be possible to finetune `fast` models. It will be useful to have this feature as many `fast` models use a...

feature request

training

priority: high
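To illustrate what dequantization means in the feature request above, here is a minimal sketch of a generic symmetric per-row int8 scheme. It is not taken from the Tesseract source. The function names `quantize_row` and `dequantize_row` are hypothetical, and whether `fast` models store their weights exactly this way is an assumption.

```python
# Generic symmetric int8 quantization sketch (assumed scheme, not Tesseract's code).

def quantize_row(row):
    """Map one row of float weights to int8-range values plus a float scale."""
    max_abs = max(abs(w) for w in row)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Clamp to [-127, 127] so every value fits in a signed 8-bit integer.
    q = [max(-127, min(127, round(w / scale))) for w in row]
    return q, scale


def dequantize_row(q, scale):
    """Recover approximate float weights: float = int * scale."""
    return [v * scale for v in q]


weights = [[0.5, -1.0, 0.25], [0.01, 0.02, -0.03]]  # made-up example values
quantized = [quantize_row(row) for row in weights]
restored = [dequantize_row(q, scale) for q, scale in quantized]

for original, (q, scale), back in zip(weights, quantized, restored):
    max_err = max(abs(a - b) for a, b in zip(original, back))
    print(q, scale, max_err)
```

The restored floats differ from the originals by at most half a quantization step per weight. A dequantized model would therefore start fine-tuning from a close approximation of the original float weights, not an exact copy.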

BCER eval displayed during lstmtraining and that from lstmeval are different

While trying to plot the error rates for training, I have come across an anomaly. I use the LOG file generated from the messages output during an lstmtraining run, which also out...

training

tesseract only processing first page of each tif when given text file with list of multipage tifs as input

```
tesseract -v
tesseract 5.0.0-18-g771c1
 leptonica-1.78.0
  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.2) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
 Found NEON...
```

bug

LSTM: Training - explicit viraama not recognized correctly

In Devanagari script, a virama is used to kill the inherent vowel of a consonant. When followed by another consonant, it forms a conjunct form. Depending on the font used,...

training

traineddata

encoding failed
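The virama behaviour described in the explicit-virama issue above can be made concrete at the code-point level. The sketch below uses only the Python standard library and the Unicode convention that a ZERO WIDTH NON-JOINER after the virama requests a visible (explicit) virama instead of a conjunct. Whether the issue concerns this exact sequence is an assumption, since its text is truncated here.

```python
import unicodedata

KA = "\u0915"      # DEVANAGARI LETTER KA
SSA = "\u0937"     # DEVANAGARI LETTER SSA
VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA, suppresses the inherent vowel
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER

# Consonant + virama + consonant: fonts typically render this as a conjunct.
conjunct = KA + VIRAMA + SSA
# Inserting ZWNJ after the virama asks the renderer to show the virama explicitly.
explicit = KA + VIRAMA + ZWNJ + SSA


def describe(text):
    """List the code points of a string with their Unicode names."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


for label, text in (("conjunct", conjunct), ("explicit virama", explicit)):
    print(label, text, describe(text))
```

The two strings differ by one invisible code point, so ground truth text and a model's character set have to treat them consistently for an explicit virama to be learned or recognized.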

LSTM: Training - Error msg - Encoding of string failed!

```
$ training/lstmtraining --model_output ~/tesstutorial/sanskrit2003_from_full/sanskrit2003 \
>   --continue_from ~/tesstutorial/sanskrit2003_from_full/san.lstm \
>   --train_listfile ~/tesstutorial/santrain/san.training_files.txt \
>   --target_error_rate 0.01
Loaded file /home/shree/tesstutorial/sanskrit2003_from_full/sanskrit2003_checkpoint, unpacking...
Successfully restored trainer from /home/shree/tesstutorial/sanskrit2003_from_full/sanskrit2003_checkpoint
Loaded 1746/1746 pages (0-1746)...
```

bug

training

encoding failed

Q&A: Indic - length of the compressed codes

https://github.com/tesseract-ocr/tesseract/issues/648#issuecomment-271987456

> Indic may be troubled by the length of the compressed codes used.

@theraysmith Can you explain a little more about this?

LSTM: Non-dictionary words with combination of letters and numbers not recognized.

https://groups.google.com/d/msgid/tesseract-ocr/1a3e8773-7151-48f9-92bb-fda888293eab%40googlegroups.com?utm_medium=email&utm_source=footer

> While the single "S" is recognized correctly, the text "2S" is recognized as "25". Here is link to the test image: https://03054610326450256607.googlegroups.com/attach/b8b86693ac072/2s.png?part=0.4&view=1

accuracy

ambiguously

RFC: Best Practices re OPENMP - for training, evaluation and recognition

For Tesseract 5, what are the best practices regarding OPENMP? Is it still true that:
1. OPENMP is **needed** for training, so build tesseract and the training tools with `--enable-openmp`.
2. For...

question

OpenMP

Vedic Sanskrit Traineddata for 4.0

* See https://github.com/Shreeshrii/imagessan/tree/master/groundtruthimages for the images used for testing. san.traineddata in this repo (4.0 alpha version) gives the following accuracy for the above sample of images: Character/Word Error Rate |...