
Getting "lstmf file not found" error during OCRB font training

Open anupamja-kore opened this issue 2 years ago • 0 comments

$ ./tesstrain.sh --fonts_dir /home/anupamjain/Documents/workspace/ocr_training/fonts --fontlist 'OCRB' --lang eng --linedata_only --langdata_dir /home/anupamjain/Documents/workspace/ocr_training/langdata_lstm --tessdata_dir /home/anupamjain/Documents/workspace/ocr_training/tesseract/tessdata --save_box_tiff --maxpages 10 --output_dir /home/anupamjain/Documents/workspace/ocr_training/train --exposures "0"

=== Starting training for language 'eng'
[Thursday 28 April 2022 02:46:14 PM IST] /usr/bin/text2image --fonts_dir=/home/anupamjain/Documents/workspace/ocr_training/fonts --ptsize 12 --font=OCRB --outputbase=/tmp/font_tmp.MNTaPRqYPg/sample_text.txt --text=/tmp/font_tmp.MNTaPRqYPg/sample_text.txt --fontconfig_tmpdir=/tmp/font_tmp.MNTaPRqYPg
Stripped 1 unrenderable words
Rendered page 0 to file /tmp/font_tmp.MNTaPRqYPg/sample_text.txt.tif

=== Phase I: Generating training images ===
Rendering using OCRB
[Thursday 28 April 2022 02:46:16 PM IST] /usr/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.MNTaPRqYPg --fonts_dir=/home/anupamjain/Documents/workspace/ocr_training/fonts --strip_unrenderable_words --leading=32 --xsize=3600 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/eng-2022-04-28.z0H/eng.OCRB.exp0 --max_pages=10 --font=OCRB --ptsize 12 --text=/home/anupamjain/Documents/workspace/ocr_training/langdata_lstm/eng/eng.training_text
Stripped 574 unrenderable words
Rendered page 0 to file /tmp/eng-2022-04-28.z0H/eng.OCRB.exp0.tif
Stripped 577 unrenderable words
Rendered page 1 to file /tmp/eng-2022-04-28.z0H/eng.OCRB.exp0.tif
Stripped 580 unrenderable words
Rendered page 2 to file /tmp/eng-2022-04-28.z0H/eng.OCRB.exp0.tif
Stripped 588 unrenderable words
Rendered page 3 to file /tmp/eng-2022-04-28.z0H/eng.OCRB.exp0.tif
Stripped 580 unrenderable words
Rendered page 4 to file /tmp/eng-2022-04-28.z0H/eng.OCRB.exp0.tif
Stripped 590 unrenderable words
Rendered page 5 to file /tmp/eng-2022-04-28.z0H/eng.OCRB.exp0.tif
Stripped 583 unrenderable words
Rendered page 6 to file /tmp/eng-2022-04-28.z0H/eng.OCRB.exp0.tif
Stripped 578 unrenderable words
Rendered page 7 to file /tmp/eng-2022-04-28.z0H/eng.OCRB.exp0.tif
Stripped 586 unrenderable words
Rendered page 8 to file /tmp/eng-2022-04-28.z0H/eng.OCRB.exp0.tif
Stripped 593 unrenderable words
Rendered page 9 to file /tmp/eng-2022-04-28.z0H/eng.OCRB.exp0.tif

=== Phase UP: Generating unicharset and unichar properties files ===
[Thursday 28 April 2022 02:46:19 PM IST] /usr/bin/unicharset_extractor --output_unicharset /tmp/eng-2022-04-28.z0H/eng.unicharset --norm_mode 1 /tmp/eng-2022-04-28.z0H/eng.OCRB.exp0.box /tmp/eng-2022-04-28.z0H/eng.OCRB.exp10.box /tmp/eng-2022-04-28.z0H/eng.OCRB.exp11.box /tmp/eng-2022-04-28.z0H/eng.OCRB.exp12.box /tmp/eng-2022-04-28.z0H/eng.OCRB.exp13.box /tmp/eng-2022-04-28.z0H/eng.OCRB.exp1.box /tmp/eng-2022-04-28.z0H/eng.OCRB.exp3.box /tmp/eng-2022-04-28.z0H/eng.OCRB.exp4.box /tmp/eng-2022-04-28.z0H/eng.OCRB.exp5.box /tmp/eng-2022-04-28.z0H/eng.OCRB.exp6.box /tmp/eng-2022-04-28.z0H/eng.OCRB.exp7.box /tmp/eng-2022-04-28.z0H/eng.OCRB.exp8.box /tmp/eng-2022-04-28.z0H/eng.OCRB.exp9.box
Failed to read data from: /tmp/eng-2022-04-28.z0H/eng.OCRB.exp0.box
Failed to read data from: /tmp/eng-2022-04-28.z0H/eng.OCRB.exp10.box
Failed to read data from: /tmp/eng-2022-04-28.z0H/eng.OCRB.exp11.box
Failed to read data from: /tmp/eng-2022-04-28.z0H/eng.OCRB.exp12.box
Failed to read data from: /tmp/eng-2022-04-28.z0H/eng.OCRB.exp13.box
Failed to read data from: /tmp/eng-2022-04-28.z0H/eng.OCRB.exp1.box
Failed to read data from: /tmp/eng-2022-04-28.z0H/eng.OCRB.exp3.box
Failed to read data from: /tmp/eng-2022-04-28.z0H/eng.OCRB.exp4.box
Failed to read data from: /tmp/eng-2022-04-28.z0H/eng.OCRB.exp5.box
Failed to read data from: /tmp/eng-2022-04-28.z0H/eng.OCRB.exp6.box
Failed to read data from: /tmp/eng-2022-04-28.z0H/eng.OCRB.exp7.box
Failed to read data from: /tmp/eng-2022-04-28.z0H/eng.OCRB.exp8.box
Failed to read data from: /tmp/eng-2022-04-28.z0H/eng.OCRB.exp9.box
Wrote unicharset file /tmp/eng-2022-04-28.z0H/eng.unicharset
[Thursday 28 April 2022 02:46:19 PM IST] /usr/bin/set_unicharset_properties -U /tmp/eng-2022-04-28.z0H/eng.unicharset -O /tmp/eng-2022-04-28.z0H/eng.unicharset -X /tmp/eng-2022-04-28.z0H/eng.xheights --script_dir=/home/anupamjain/Documents/workspace/ocr_training/langdata_lstm
Loaded unicharset of size 3 from file /tmp/eng-2022-04-28.z0H/eng.unicharset
Setting unichar properties
Setting script properties
Writing unicharset to file /tmp/eng-2022-04-28.z0H/eng.unicharset

=== Phase E: Generating lstmf files ===
Using TESSDATA_PREFIX=/home/anupamjain/Documents/workspace/ocr_training/tesseract/tessdata
--------img file---------
run_command tesseract /tmp/eng-2022-04-28.z0H/eng.OCRB.exp0.tif /tmp/eng-2022-04-28.z0H/eng.OCRB.exp0 --psm 6 lstm.train

[Thursday 28 April 2022 02:46:19 PM IST] /usr/bin/tesseract /tmp/eng-2022-04-28.z0H/eng.OCRB.exp0.tif /tmp/eng-2022-04-28.z0H/eng.OCRB.exp0.lstmf --psm 6 lstm.train
Tesseract Open Source OCR Engine v4.1.1 with Leptonica
Page 1
Page 2
Page 3
Page 4
Page 5
Page 6
Page 7
Page 8
Page 9
Page 10
ERROR: /tmp/eng-2022-04-28.z0H/eng.OCRB.exp0.lstmf does not exist or is not readable
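
For anyone triaging a log like this one, the following is a minimal sketch that pulls out the two signals visible before the final error: the box files reported as unreadable and the reported unicharset size. The function name `summarize_tesstrain_log` is hypothetical and not part of Tesseract; the patterns are copied from the message wording in this log and may not match other Tesseract versions.

```python
import re

# Patterns taken from the message wording in the log above (assumption:
# other Tesseract versions may phrase these messages differently).
FAILED_BOX_RE = re.compile(r"Failed to read data from: (\S+)")
UNICHARSET_SIZE_RE = re.compile(r"Loaded\s+unicharset of size (\d+)")


def summarize_tesstrain_log(log_text):
    """Return the unreadable box files and the unicharset size found in a log.

    Hypothetical helper for illustration only; it just scans the text and
    does not touch the filesystem or call any Tesseract tool.
    """
    failed_boxes = FAILED_BOX_RE.findall(log_text)
    size_match = UNICHARSET_SIZE_RE.search(log_text)
    unicharset_size = int(size_match.group(1)) if size_match else None
    return {"failed_boxes": failed_boxes, "unicharset_size": unicharset_size}


if __name__ == "__main__":
    import sys

    # Usage: python summarize_log.py < tesstrain_output.log
    summary = summarize_tesstrain_log(sys.stdin.read())
    print(len(summary["failed_boxes"]), "box files could not be read")
    print("unicharset size:", summary["unicharset_size"])
```

Running it over the saved output of `tesstrain.sh` gives a quick view of whether Phase UP already failed to read its box files before Phase E reported the missing lstmf file.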

anupamja-kore, Apr 28 '22 09:04