text2image Null box at index 0

Open YiWenFY opened this issue 5 years ago • 17 comments

root@ubuntu:/home/administrator/tesseract-master# /usr/local/bin/text2image --text=/home/administrator/langdata/chi_sim/chi_sim.training_text --fontconfig_tmpdir=/tmp/font_tmp.HKWX4LOUh0 --fonts_dir=/usr/share/fonts --strip_unrenderable_words --leading=52 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/chi_sim-2018-09-10.Dpf/chi_sim.FreeSerif_Bold.exp0 --max_pages=0 --font=FreeSerif Bold
Stripped 211 unrenderable words
Rendered page 0 to file /tmp/chi_sim-2018-09-10.Dpf/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 222 unrenderable words
Rendered page 1 to file /tmp/chi_sim-2018-09-10.Dpf/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 218 unrenderable words
Rendered page 2 to file /tmp/chi_sim-2018-09-10.Dpf/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 202 unrenderable words
Rendered page 3 to file /tmp/chi_sim-2018-09-10.Dpf/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 211 unrenderable words
Rendered page 4 to file /tmp/chi_sim-2018-09-10.Dpf/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 224 unrenderable words
Rendered page 5 to file /tmp/chi_sim-2018-09-10.Dpf/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 214 unrenderable words
Rendered page 6 to file /tmp/chi_sim-2018-09-10.Dpf/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 203 unrenderable words
Rendered page 7 to file /tmp/chi_sim-2018-09-10.Dpf/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 203 unrenderable words
Rendered page 8 to file /tmp/chi_sim-2018-09-10.Dpf/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 206 unrenderable words
Rendered page 9 to file /tmp/chi_sim-2018-09-10.Dpf/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 200 unrenderable words
Rendered page 10 to file /tmp/chi_sim-2018-09-10.Dpf/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 213 unrenderable words
Rendered page 11 to file /tmp/chi_sim-2018-09-10.Dpf/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 218 unrenderable words
Rendered page 12 to file /tmp/chi_sim-2018-09-10.Dpf/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 223 unrenderable words
Rendered page 13 to file /tmp/chi_sim-2018-09-10.Dpf/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 198 unrenderable words
Rendered page 14 to file /tmp/chi_sim-2018-09-10.Dpf/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 220 unrenderable words
Rendered page 15 to file /tmp/chi_sim-2018-09-10.Dpf/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 230 unrenderable words
Rendered page 16 to file /tmp/chi_sim-2018-09-10.Dpf/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 225 unrenderable words
Rendered page 17 to file /tmp/chi_sim-2018-09-10.Dpf/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 214 unrenderable words
Rendered page 18 to file /tmp/chi_sim-2018-09-10.Dpf/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 234 unrenderable words
Rendered page 19 to file /tmp/chi_sim-2018-09-10.Dpf/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 219 unrenderable words
Rendered page 20 to file /tmp/chi_sim-2018-09-10.Dpf/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 199 unrenderable words
Rendered page 21 to file /tmp/chi_sim-2018-09-10.Dpf/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 203 unrenderable words
Rendered page 22 to file /tmp/chi_sim-2018-09-10.Dpf/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 215 unrenderable words
Rendered page 23 to file /tmp/chi_sim-2018-09-10.Dpf/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 216 unrenderable words
Rendered page 24 to file /tmp/chi_sim-2018-09-10.Dpf/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 216 unrenderable words
Rendered page 25 to file /tmp/chi_sim-2018-09-10.Dpf/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 211 unrenderable words
Rendered page 26 to file /tmp/chi_sim-2018-09-10.Dpf/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 228 unrenderable words
Rendered page 27 to file /tmp/chi_sim-2018-09-10.Dpf/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 214 unrenderable words
Rendered page 28 to file /tmp/chi_sim-2018-09-10.Dpf/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 210 unrenderable words
Rendered page 29 to file /tmp/chi_sim-2018-09-10.Dpf/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 228 unrenderable words
Rendered page 30 to file /tmp/chi_sim-2018-09-10.Dpf/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 215 unrenderable words
Rendered page 31 to file /tmp/chi_sim-2018-09-10.Dpf/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 222 unrenderable words
Rendered page 32 to file /tmp/chi_sim-2018-09-10.Dpf/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 209 unrenderable words
Rendered page 33 to file /tmp/chi_sim-2018-09-10.Dpf/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 214 unrenderable words
Rendered page 34 to file /tmp/chi_sim-2018-09-10.Dpf/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 204 unrenderable words
Rendered page 35 to file /tmp/chi_sim-2018-09-10.Dpf/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 236 unrenderable words
Rendered page 36 to file /tmp/chi_sim-2018-09-10.Dpf/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 224 unrenderable words
Rendered page 37 to file /tmp/chi_sim-2018-09-10.Dpf/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 196 unrenderable words
Rendered page 38 to file /tmp/chi_sim-2018-09-10.Dpf/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 222 unrenderable words
Rendered page 39 to file /tmp/chi_sim-2018-09-10.Dpf/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 209 unrenderable words
Rendered page 40 to file /tmp/chi_sim-2018-09-10.Dpf/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 218 unrenderable words
Rendered page 41 to file /tmp/chi_sim-2018-09-10.Dpf/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 225 unrenderable words
Rendered page 42 to file /tmp/chi_sim-2018-09-10.Dpf/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 204 unrenderable words
Rendered page 43 to file /tmp/chi_sim-2018-09-10.Dpf/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 208 unrenderable words
Rendered page 44 to file /tmp/chi_sim-2018-09-10.Dpf/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 232 unrenderable words
Rendered page 45 to file /tmp/chi_sim-2018-09-10.Dpf/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 212 unrenderable words
Rendered page 46 to file /tmp/chi_sim-2018-09-10.Dpf/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 215 unrenderable words
Rendered page 47 to file /tmp/chi_sim-2018-09-10.Dpf/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 200 unrenderable words
Rendered page 48 to file /tmp/chi_sim-2018-09-10.Dpf/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 201 unrenderable words
Rendered page 49 to file /tmp/chi_sim-2018-09-10.Dpf/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 200 unrenderable words
Rendered page 50 to file /tmp/chi_sim-2018-09-10.Dpf/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 220 unrenderable words
Rendered page 51 to file /tmp/chi_sim-2018-09-10.Dpf/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 39 unrenderable words
Rendered page 52 to file /tmp/chi_sim-2018-09-10.Dpf/chi_sim.FreeSerif_Bold.exp0.tif
Null box at index 0
Error: Call PrepareToWrite before WriteTesseractBoxFile!!

YiWenFY avatar Sep 10 '18 04:09 YiWenFY

What's the problem? What should I do?

YiWenFY avatar Sep 10 '18 04:09 YiWenFY

I tested this on Debian and got lower counts for the unrenderable words, but no error.

What is the output from /usr/local/bin/text2image --version? Which version of Ubuntu do you use? And do you get the problem with the tesseract-ocr package from Ubuntu, too?

stweil avatar Sep 10 '18 05:09 stweil

@stweil text2image -v
4.0.0-beta.3

It works when I use eng.training_text; the problem only occurs when I use chi_sim.training_text.

YiWenFY avatar Sep 10 '18 05:09 YiWenFY

  1. Please use beta.4/latest code from Github.
  2. What is the version of your FreeSerif font?

Rendered page 0 to file /tmp/chi_sim-2018-09-10.Dpf/chi_sim.FreeSerif_Bold.exp0.tif Stripped 222 unrenderable words

That means the font does not have support for Simplified Chinese, probably only rendering the English letters - check the generated tif files.

Try a different font that you know supports chi_sim.
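One way to look for candidate fonts is to ask fontconfig which installed families claim Simplified Chinese support. The sketch below is only an illustration, not part of the tesseract tooling: it assumes the `fc-list` command from fontconfig is installed, and the function names are hypothetical. The system-wide list it returns may also differ from what text2image sees in the directory passed to `--fonts_dir`.

```python
import subprocess


def parse_fc_list_families(output: str) -> list[str]:
    """Turn the output of `fc-list <pattern> family` into sorted unique names."""
    families = set()
    for line in output.splitlines():
        # One font can report several comma-separated (localized) family names.
        for name in line.split(","):
            name = name.strip()
            if name:
                families.add(name)
    return sorted(families)


def fonts_for_language(lang: str = "zh-cn") -> list[str]:
    """Ask fontconfig which installed fonts declare support for `lang`.

    Assumes `fc-list` is on PATH; raises if the command is missing or fails.
    """
    result = subprocess.run(
        ["fc-list", f":lang={lang}", "family"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_fc_list_families(result.stdout)
```

Calling `fonts_for_language("zh-cn")` and checking whether the font given to `--font` appears in the result is a quick first test; opening the generated tif files remains the definitive check.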

Shreeshrii avatar Sep 10 '18 05:09 Shreeshrii

@Shreeshrii I see, thank you. English works for me, so the font probably does not support Chinese.

YiWenFY avatar Sep 10 '18 05:09 YiWenFY

I can replicate the issue:

ubuntu@tesseract-ocr:~/tesstutorial$    /usr/bin/time ~/tesseract/src/training/tesstrain.sh \
>   --fonts_dir ../.fonts \
>   --lang chi_sim \
>   --linedata_only \
>   --noextract_font_properties \
>   --langdata_dir ../langdata \
>   --tessdata_dir ../tessdata_best \
>   --fontlist "FreeSerif Bold" \
>   --training_text ../langdata/chi_sim/chi_sim.training_text \
>   --workspace_dir ~/tmp \
>   --output_dir ../tesstutorial/chi_sim_test

=== Starting training for language 'chi_sim'
[Tue Sep 11 05:54:59 UTC 2018] /home/ubuntu/tesseract/src/training/text2image --fonts_dir=../.fonts --font=FreeSerif Bold --outputbase=/home/ubuntu/tmp/fc-cache/sample_text.txt --text=/home/ubuntu/tmp/fc-cache/sample_text.txt --fontconfig_tmpdir=/home/ubuntu/tmp/fc-cache
Rendered page 0 to file /home/ubuntu/tmp/fc-cache/sample_text.txt.tif

=== Phase I: Generating training images ===
Rendering using FreeSerif Bold
[Tue Sep 11 05:55:07 UTC 2018] /home/ubuntu/tesseract/src/training/text2image --fontconfig_tmpdir=/home/ubuntu/tmp/fc-cache --fonts_dir=../.fonts --strip_unrenderable_words --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/chi_sim-2018-09-11.T5E/chi_sim.FreeSerif_Bold.exp0 --max_pages=0 --font=FreeSerif Bold --text=../langdata/chi_sim/chi_sim.training_text
Stripped 235 unrenderable words
Rendered page 0 to file /tmp/chi_sim-2018-09-11.T5E/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 252 unrenderable words
Rendered page 1 to file /tmp/chi_sim-2018-09-11.T5E/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 246 unrenderable words
Rendered page 2 to file /tmp/chi_sim-2018-09-11.T5E/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 218 unrenderable words
Rendered page 3 to file /tmp/chi_sim-2018-09-11.T5E/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 238 unrenderable words
Rendered page 4 to file /tmp/chi_sim-2018-09-11.T5E/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 247 unrenderable words
Rendered page 5 to file /tmp/chi_sim-2018-09-11.T5E/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 224 unrenderable words
Rendered page 6 to file /tmp/chi_sim-2018-09-11.T5E/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 231 unrenderable words
Rendered page 7 to file /tmp/chi_sim-2018-09-11.T5E/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 228 unrenderable words
Rendered page 8 to file /tmp/chi_sim-2018-09-11.T5E/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 230 unrenderable words
Rendered page 9 to file /tmp/chi_sim-2018-09-11.T5E/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 242 unrenderable words
Rendered page 10 to file /tmp/chi_sim-2018-09-11.T5E/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 246 unrenderable words
Rendered page 11 to file /tmp/chi_sim-2018-09-11.T5E/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 221 unrenderable words
Rendered page 12 to file /tmp/chi_sim-2018-09-11.T5E/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 244 unrenderable words
Rendered page 13 to file /tmp/chi_sim-2018-09-11.T5E/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 244 unrenderable words
Rendered page 14 to file /tmp/chi_sim-2018-09-11.T5E/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 256 unrenderable words
Rendered page 15 to file /tmp/chi_sim-2018-09-11.T5E/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 241 unrenderable words
Rendered page 16 to file /tmp/chi_sim-2018-09-11.T5E/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 253 unrenderable words
Rendered page 17 to file /tmp/chi_sim-2018-09-11.T5E/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 241 unrenderable words
Rendered page 18 to file /tmp/chi_sim-2018-09-11.T5E/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 220 unrenderable words
Rendered page 19 to file /tmp/chi_sim-2018-09-11.T5E/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 247 unrenderable words
Rendered page 20 to file /tmp/chi_sim-2018-09-11.T5E/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 238 unrenderable words
Rendered page 21 to file /tmp/chi_sim-2018-09-11.T5E/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 235 unrenderable words
Rendered page 22 to file /tmp/chi_sim-2018-09-11.T5E/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 244 unrenderable words
Rendered page 23 to file /tmp/chi_sim-2018-09-11.T5E/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 246 unrenderable words
Rendered page 24 to file /tmp/chi_sim-2018-09-11.T5E/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 243 unrenderable words
Rendered page 25 to file /tmp/chi_sim-2018-09-11.T5E/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 236 unrenderable words
Rendered page 26 to file /tmp/chi_sim-2018-09-11.T5E/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 262 unrenderable words
Rendered page 27 to file /tmp/chi_sim-2018-09-11.T5E/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 230 unrenderable words
Rendered page 28 to file /tmp/chi_sim-2018-09-11.T5E/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 247 unrenderable words
Rendered page 29 to file /tmp/chi_sim-2018-09-11.T5E/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 228 unrenderable words
Rendered page 30 to file /tmp/chi_sim-2018-09-11.T5E/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 246 unrenderable words
Rendered page 31 to file /tmp/chi_sim-2018-09-11.T5E/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 260 unrenderable words
Rendered page 32 to file /tmp/chi_sim-2018-09-11.T5E/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 249 unrenderable words
Rendered page 33 to file /tmp/chi_sim-2018-09-11.T5E/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 222 unrenderable words
Rendered page 34 to file /tmp/chi_sim-2018-09-11.T5E/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 242 unrenderable words
Rendered page 35 to file /tmp/chi_sim-2018-09-11.T5E/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 239 unrenderable words
Rendered page 36 to file /tmp/chi_sim-2018-09-11.T5E/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 236 unrenderable words
Rendered page 37 to file /tmp/chi_sim-2018-09-11.T5E/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 244 unrenderable words
Rendered page 38 to file /tmp/chi_sim-2018-09-11.T5E/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 236 unrenderable words
Rendered page 39 to file /tmp/chi_sim-2018-09-11.T5E/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 241 unrenderable words
Rendered page 40 to file /tmp/chi_sim-2018-09-11.T5E/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 241 unrenderable words
Rendered page 41 to file /tmp/chi_sim-2018-09-11.T5E/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 234 unrenderable words
Rendered page 42 to file /tmp/chi_sim-2018-09-11.T5E/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 218 unrenderable words
Rendered page 43 to file /tmp/chi_sim-2018-09-11.T5E/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 225 unrenderable words
Rendered page 44 to file /tmp/chi_sim-2018-09-11.T5E/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 233 unrenderable words
Rendered page 45 to file /tmp/chi_sim-2018-09-11.T5E/chi_sim.FreeSerif_Bold.exp0.tif
Stripped 194 unrenderable words
Rendered page 46 to file /tmp/chi_sim-2018-09-11.T5E/chi_sim.FreeSerif_Bold.exp0.tif
Null box at index 0
Error: Call PrepareToWrite before WriteTesseractBoxFile!!

=== Phase UP: Generating unicharset and unichar properties files ===
[Tue Sep 11 05:55:41 UTC 2018] /home/ubuntu/tesseract/src/training/unicharset_extractor --output_unicharset /tmp/chi_sim-2018-09-11.T5E/chi_sim.unicharset --norm_mode 1 /tmp/chi_sim-2018-09-11.T5E/chi_sim.FreeSerif_Bold.exp0.box
Failed to read data from: /tmp/chi_sim-2018-09-11.T5E/chi_sim.FreeSerif_Bold.exp0.box
Wrote unicharset file /tmp/chi_sim-2018-09-11.T5E/chi_sim.unicharset

Shreeshrii avatar Sep 11 '18 05:09 Shreeshrii

chi_sim.FreeSerif_Bold.exp0.zip

The zip file contains the generated tif file, which has only English text in it.

@stweil, I have seen this error in other cases also.

ubuntu@tesseract-ocr:~/tesstutorial$ which text2image
/home/ubuntu/tesseract/src/training/text2image

ubuntu@tesseract-ocr:~/tesstutorial$ text2image -v
4.0.0-beta.4-138-g2093

Shreeshrii avatar Sep 11 '18 06:09 Shreeshrii

Why was this closed? Shouldn't it get the bug label? In any case, I don't think that it must be addressed before releasing 4.0.0.

stweil avatar Sep 17 '18 20:09 stweil

I have the same issue on tesseract 4.0.0-beta.4-163-ge124

Rendering using Rounded M+ 1p
[Tue Sep 25 00:05:56 WIB 2018] /usr/local/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.5zVUYVMoc5 --fonts_dir=/home/home/temp --strip_unrenderable_words --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/jpn-2018-09-25.P8e/jpn.Rounded_M+_1p.exp0 --max_pages=0 --font=Rounded M+ 1p --text=/home/home/langdata/jpn/jpn.training_text
Stripped 54 unrenderable words
Rendered page 0 to file /tmp/jpn-2018-09-25.P8e/jpn.Rounded_M+_1p.exp0.tif
Stripped 38 unrenderable words
Rendered page 1 to file /tmp/jpn-2018-09-25.P8e/jpn.Rounded_M+_1p.exp0.tif
Stripped 38 unrenderable words
Rendered page 2 to file /tmp/jpn-2018-09-25.P8e/jpn.Rounded_M+_1p.exp0.tif
Stripped 61 unrenderable words
Rendered page 3 to file /tmp/jpn-2018-09-25.P8e/jpn.Rounded_M+_1p.exp0.tif
Stripped 56 unrenderable words
Rendered page 4 to file /tmp/jpn-2018-09-25.P8e/jpn.Rounded_M+_1p.exp0.tif
Stripped 47 unrenderable words
Rendered page 5 to file /tmp/jpn-2018-09-25.P8e/jpn.Rounded_M+_1p.exp0.tif
Stripped 49 unrenderable words
Rendered page 6 to file /tmp/jpn-2018-09-25.P8e/jpn.Rounded_M+_1p.exp0.tif
Stripped 48 unrenderable words
Rendered page 7 to file /tmp/jpn-2018-09-25.P8e/jpn.Rounded_M+_1p.exp0.tif
Stripped 40 unrenderable words
Rendered page 8 to file /tmp/jpn-2018-09-25.P8e/jpn.Rounded_M+_1p.exp0.tif
Stripped 47 unrenderable words
Rendered page 9 to file /tmp/jpn-2018-09-25.P8e/jpn.Rounded_M+_1p.exp0.tif
Stripped 48 unrenderable words
Rendered page 10 to file /tmp/jpn-2018-09-25.P8e/jpn.Rounded_M+_1p.exp0.tif
Stripped 49 unrenderable words
Rendered page 11 to file /tmp/jpn-2018-09-25.P8e/jpn.Rounded_M+_1p.exp0.tif
Stripped 47 unrenderable words
Rendered page 12 to file /tmp/jpn-2018-09-25.P8e/jpn.Rounded_M+_1p.exp0.tif
Stripped 47 unrenderable words
Rendered page 13 to file /tmp/jpn-2018-09-25.P8e/jpn.Rounded_M+_1p.exp0.tif
Stripped 43 unrenderable words
Rendered page 14 to file /tmp/jpn-2018-09-25.P8e/jpn.Rounded_M+_1p.exp0.tif
Stripped 42 unrenderable words
Rendered page 15 to file /tmp/jpn-2018-09-25.P8e/jpn.Rounded_M+_1p.exp0.tif
Stripped 42 unrenderable words
Rendered page 16 to file /tmp/jpn-2018-09-25.P8e/jpn.Rounded_M+_1p.exp0.tif
Stripped 43 unrenderable words
Rendered page 17 to file /tmp/jpn-2018-09-25.P8e/jpn.Rounded_M+_1p.exp0.tif
Stripped 51 unrenderable words
Rendered page 18 to file /tmp/jpn-2018-09-25.P8e/jpn.Rounded_M+_1p.exp0.tif
Stripped 31 unrenderable words
Rendered page 19 to file /tmp/jpn-2018-09-25.P8e/jpn.Rounded_M+_1p.exp0.tif
Stripped 41 unrenderable words
Rendered page 20 to file /tmp/jpn-2018-09-25.P8e/jpn.Rounded_M+_1p.exp0.tif
Stripped 53 unrenderable words
Rendered page 21 to file /tmp/jpn-2018-09-25.P8e/jpn.Rounded_M+_1p.exp0.tif
Stripped 29 unrenderable words
Rendered page 22 to file /tmp/jpn-2018-09-25.P8e/jpn.Rounded_M+_1p.exp0.tif
Stripped 32 unrenderable words
Rendered page 23 to file /tmp/jpn-2018-09-25.P8e/jpn.Rounded_M+_1p.exp0.tif
Stripped 39 unrenderable words
Rendered page 24 to file /tmp/jpn-2018-09-25.P8e/jpn.Rounded_M+_1p.exp0.tif
Stripped 41 unrenderable words
Rendered page 25 to file /tmp/jpn-2018-09-25.P8e/jpn.Rounded_M+_1p.exp0.tif
Stripped 41 unrenderable words
Rendered page 26 to file /tmp/jpn-2018-09-25.P8e/jpn.Rounded_M+_1p.exp0.tif
Stripped 47 unrenderable words
Rendered page 27 to file /tmp/jpn-2018-09-25.P8e/jpn.Rounded_M+_1p.exp0.tif
Stripped 42 unrenderable words
Rendered page 28 to file /tmp/jpn-2018-09-25.P8e/jpn.Rounded_M+_1p.exp0.tif
Stripped 36 unrenderable words
Rendered page 29 to file /tmp/jpn-2018-09-25.P8e/jpn.Rounded_M+_1p.exp0.tif
Stripped 35 unrenderable words
Rendered page 30 to file /tmp/jpn-2018-09-25.P8e/jpn.Rounded_M+_1p.exp0.tif
Stripped 36 unrenderable words
Rendered page 31 to file /tmp/jpn-2018-09-25.P8e/jpn.Rounded_M+_1p.exp0.tif
Stripped 40 unrenderable words
Rendered page 32 to file /tmp/jpn-2018-09-25.P8e/jpn.Rounded_M+_1p.exp0.tif
Stripped 43 unrenderable words
Rendered page 33 to file /tmp/jpn-2018-09-25.P8e/jpn.Rounded_M+_1p.exp0.tif
Stripped 43 unrenderable words
Rendered page 34 to file /tmp/jpn-2018-09-25.P8e/jpn.Rounded_M+_1p.exp0.tif
Stripped 50 unrenderable words
Rendered page 35 to file /tmp/jpn-2018-09-25.P8e/jpn.Rounded_M+_1p.exp0.tif
Stripped 48 unrenderable words
Rendered page 36 to file /tmp/jpn-2018-09-25.P8e/jpn.Rounded_M+_1p.exp0.tif
Stripped 1 unrenderable words
Rendered page 37 to file /tmp/jpn-2018-09-25.P8e/jpn.Rounded_M+_1p.exp0.tif
Null box at index 0
Error: Call PrepareToWrite before WriteTesseractBoxFile!!

Zip file with tmp output. The .tif file has the correct Japanese text, but the .box file is empty.

Any advice on how to troubleshoot the issue would be much appreciated!

anonynamja avatar Sep 24 '18 17:09 anonynamja

The tif file has 38 pages - 0 to 37. Please take a look at your training text and see if there is anything different in it at the point the tif file stops.
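To find the point where rendering stops without counting log lines by hand, the text2image output can be summarized with a small script. This is a minimal sketch that assumes the log uses the exact message formats shown in this thread; the function name and the returned keys are made up for illustration.

```python
import re

# Patterns match the text2image messages quoted in this thread.
PAGE_RE = re.compile(r"Rendered page (\d+) to file")
STRIPPED_RE = re.compile(r"Stripped (\d+) unrenderable words")


def summarize_text2image_log(log: str) -> dict:
    """Count rendered pages and stripped words in a captured text2image log."""
    pages = [int(n) for n in PAGE_RE.findall(log)]
    stripped = [int(n) for n in STRIPPED_RE.findall(log)]
    return {
        "pages_rendered": len(pages),
        "last_page": pages[-1] if pages else None,
        "stripped_total": sum(stripped),
        "null_box": "Null box at index" in log,
    }
```

The `last_page` value shows how far rendering got before the error, which narrows down the part of the training text to inspect.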

Shreeshrii avatar Sep 24 '18 18:09 Shreeshrii

Additionally, for LSTM training, use

--noextract_font_properties \

as part of the tesstrain.sh command.

Shreeshrii avatar Sep 24 '18 18:09 Shreeshrii

Thanks! I made a mistake above and cloned the wrong langdata repo. Switched to the LSTM one and LSTM training is now running correctly, at least afaict.

anonynamja avatar Sep 25 '18 02:09 anonynamja

I have a similar issue, but I didn't get the error. The training data was created and I was asked to run LSTM training, but the box file I got does not segment the characters the way it should. @Shreeshrii, please help me out.

Naojen avatar May 24 '19 06:05 Naojen

Related Issue https://github.com/tesseract-ocr/tesseract/issues/2654

Shreeshrii avatar Jan 05 '22 04:01 Shreeshrii

In every case I have worked on, this error occurs when more than one space precedes the first non-whitespace character in the text file. When the first non-whitespace character has at most one whitespace character before it, the error does not occur for me.
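A quick way to test a training text for this condition is to count the whitespace at the very start of the file. The sketch below is an illustration only: the helper names are hypothetical, it counts every kind of leading whitespace (spaces, tabs, newlines) rather than spaces alone, and it assumes the file is UTF-8.

```python
from pathlib import Path


def leading_whitespace_count(text: str) -> int:
    """Number of whitespace characters before the first non-whitespace one."""
    return len(text) - len(text.lstrip())


def check_training_text(path: str) -> int:
    """Read a training text file (assumed UTF-8) and report its leading whitespace."""
    text = Path(path).read_text(encoding="utf-8")
    return leading_whitespace_count(text)
```

If the count is greater than 1, writing `text.lstrip()` to a copy of the file and rerunning text2image on that copy is a cheap experiment to see whether the error goes away.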

HeinrichWizardKreuser avatar Apr 10 '22 13:04 HeinrichWizardKreuser

Same here: it works for English but not for Hindi.

Ganesh9100 avatar Apr 03 '23 13:04 Ganesh9100

This problem is bugging me on both the Mac and Ubuntu. I am pretty sure the problem is not with the text. It seems to have something to do with memory or something like that.

It has a rhythm: it processes a couple of lines correctly, then gives the null box error for two lines, then processes normally for a couple of lines, then gives the null box error for two lines again, and so on. I really don't know what to do.

DesBw avatar Sep 17 '23 08:09 DesBw