Shreeshrii
The PDF output is incorrect for Devanagari script when using the 3.2.3 experimental version for Tesseract 4.0.0alpha. Please see the attached zip file with the input image, text, hOCR and PDF...
@stweil You had mentioned at one point that it should be possible to fine-tune `fast` models. It would be useful to have this feature as many `fast` models use a...
While trying to plot the error rates for training, I have come across an anomaly. I use the LOG file generated from the messages output during the lstmtraining run, which also out...
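A minimal sketch of pulling the plotted values out of such a LOG file. It assumes progress lines of the form `At iteration 100/100/100, Mean rms=...%, delta=...%, char train=...%, word train=...%, skip ratio=...%`; other Tesseract versions may label the fields differently, so the pattern also accepts `BCER train`/`BWER train` as an assumption. The function name `parse_progress` is hypothetical, and treating the first of the three iteration numbers as the x-axis value is a choice, not something the original states.

```python
import re
from typing import Iterable, List, Tuple

# Assumed progress-line format; adjust the pattern if your lstmtraining prints something else.
PROGRESS_RE = re.compile(
    r"At iteration (\d+)/(\d+)/(\d+),.*?"
    r"(?:char|BCER) train=([\d.]+)%,\s*"
    r"(?:word|BWER) train=([\d.]+)%",
    re.IGNORECASE,
)


def parse_progress(lines: Iterable[str]) -> List[Tuple[int, float, float]]:
    """Return (first_iteration_number, char_error_pct, word_error_pct) per progress line.

    Lines that do not match the assumed format (e.g. "Loaded ... pages") are skipped.
    """
    points = []
    for line in lines:
        m = PROGRESS_RE.search(line)
        if m:
            # Group 1 is the first number in "At iteration A/B/C" (assumed x-axis).
            points.append((int(m.group(1)), float(m.group(4)), float(m.group(5))))
    return points
```

For example, `parse_progress(open("LOG", encoding="utf-8"))` yields a list of points that can be handed to any plotting library. If the plot shows an anomaly, it is worth checking which of the three iteration counters is being used as the x-axis.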
```
tesseract -v
tesseract 5.0.0-18-g771c1
leptonica-1.78.0
libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.2) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
Found NEON...
```
In Devanagari script, a virama is used to kill the inherent vowel of a consonant. When the virama is followed by another consonant, the two consonants form a conjunct. Depending on the font used,...
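To make the code-point level concrete, here is a small illustration using Python's standard `unicodedata` module. The sequence KA + VIRAMA + SSA is normally rendered as the conjunct क्ष, while inserting ZERO WIDTH NON-JOINER after the virama requests the form with a visible virama (as described in the Unicode Standard). The variable names and the `describe` helper are hypothetical. Whether a given font actually renders a conjunct is outside what this snippet can check; it only shows that visually different renderings can correspond to different code-point sequences.

```python
import unicodedata
from typing import List

KA = "\u0915"      # DEVANAGARI LETTER KA
VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA (kills the inherent vowel of KA)
SSA = "\u0937"     # DEVANAGARI LETTER SSA
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER

# Usually shaped as a single conjunct glyph, but still three code points.
conjunct = KA + VIRAMA + SSA
# ZWNJ after the virama asks the renderer to show the virama explicitly instead.
explicit_virama = KA + VIRAMA + ZWNJ + SSA


def describe(text: str) -> List[str]:
    """List the Unicode character names of each code point in text."""
    return [unicodedata.name(ch) for ch in text]


for name in describe(conjunct):
    print(name)
```

Because OCR output and ground truth are compared code point by code point, a ZWNJ present in one and absent in the other counts as an error even if both look acceptable on screen.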
```
$ training/lstmtraining --model_output ~/tesstutorial/sanskrit2003_from_full/sanskrit2003 \
> --continue_from ~/tesstutorial/sanskrit2003_from_full/san.lstm \
> --train_listfile ~/tesstutorial/santrain/san.training_files.txt \
> --target_error_rate 0.01
Loaded file /home/shree/tesstutorial/sanskrit2003_from_full/sanskrit2003_checkpoint, unpacking...
Successfully restored trainer from /home/shree/tesstutorial/sanskrit2003_from_full/sanskrit2003_checkpoint
Loaded 1746/1746 pages (0-1746)...
```
https://github.com/tesseract-ocr/tesseract/issues/648#issuecomment-271987456

> Indic may be troubled by the length of the compressed codes used.

@theraysmith Can you explain a little more about this?
https://groups.google.com/d/msgid/tesseract-ocr/1a3e8773-7151-48f9-92bb-fda888293eab%40googlegroups.com

> While the single "S" is recognized correctly, the text "2S" is recognized as "25". Here is a link to the test image: https://03054610326450256607.googlegroups.com/attach/b8b86693ac072/2s.png?part=0.4&view=1
For Tesseract 5, what are the best practices regarding OpenMP? Is it still true that:

1. OpenMP is **needed** for training, so build Tesseract and the training tools with `--enable-openmp`.
2. For...
* See https://github.com/Shreeshrii/imagessan/tree/master/groundtruthimages for the images used for testing.

`san.traineddata` in this repo (4.0 alpha version) gives the following accuracy for the above sample of images:

Character/Word Error Rate |...
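For reference, a common definition of these metrics is the edit (Levenshtein) distance between ground truth and OCR output, divided by the length of the ground truth, counted in characters for CER and in whitespace-separated words for WER. The sketch below implements that definition; the figures in the table may have been produced by a different tool with its own normalization, so small differences are to be expected. Note that Python strings are iterated by code point, so a Devanagari conjunct counts as several characters here rather than one grapheme.

```python
from typing import Sequence


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,             # deletion
                           cur[j - 1] + 1,          # insertion
                           prev[j - 1] + (x != y))) # substitution (free if equal)
        prev = cur
    return prev[-1]


def error_rate(reference: Sequence, hypothesis: Sequence) -> float:
    """Edit distance normalized by the reference length."""
    if len(reference) == 0:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate, counting Unicode code points."""
    return error_rate(reference, hypothesis)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate over whitespace-separated tokens."""
    return error_rate(reference.split(), hypothesis.split())
```

For example, reading "2S" as "25" is one substitution over two characters, so `cer("2S", "25")` is 0.5.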