Segfault in lstmtraining when training the demo data

Open inductiveload opened this issue 3 years ago • 4 comments

Arch Linux:

tesseract 5.0.0-alpha-20210401-158-ge1761
 leptonica-1.81.0
  libgif 5.2.1 : libjpeg 8d (libjpeg-turbo 2.1.0) : libpng 1.6.37 : libtiff 4.3.0 : zlib 1.2.11 : libwebp 1.2.0
 Found AVX2
 Found AVX
 Found FMA
 Found SSE4.1
 Found OpenMP 201511
 Found libarchive 3.5.1 zlib/1.2.11 liblzma/5.2.5 bz2lib/1.0.8 liblz4/1.9.3 libzstd/1.4.5
 Found libcurl/7.77.0 OpenSSL/1.1.1k zlib/1.2.11 zstd/1.5.0 libidn2/2.3.1 libpsl/0.21.1 (+libidn2/2.3.0) libssh2/1.9.0 nghttp2/1.43.0
  • tesstrain 0e8151472ca034ee3366682d6829802ee1d9455e

What I did:

  • Cloned tessdata_best to ~/src
  • unzip ocrd-testset.zip -d data/ocrd-ground-truth
  • make training MODEL_NAME=ocrd START_MODEL=frk TESSDATA=~/src/tessdata_best MAX_ITERATIONS=10000

Output:

lstmtraining \
  --debug_interval 0 \
  --traineddata data/ocrd/ocrd.traineddata \
  --old_traineddata /home/john/src/tessdata_best/frk.traineddata \
  --continue_from data/frk/ocrd.lstm \
  --learning_rate 0.0001 \
  --model_output data/ocrd/checkpoints/ocrd \
  --train_listfile data/ocrd/list.train \
  --eval_listfile data/ocrd/list.eval \
  --max_iterations 10000 \
  --target_error_rate 0.01
Loaded file data/frk/ocrd.lstm, unpacking...
Warning: LSTMTrainer deserialized an LSTMRecognizer!
Code range changed from 99 to 101!
Num (Extended) outputs,weights in Series:
  1,48,0,1:1, 0
Num (Extended) outputs,weights in Series:
  C3,3:9, 0
  Ft16:16, 160
Total weights = 160
  [C3,3Ft16]:16, 160
  Mp3,3:16, 0
  TxyLfys64:64, 20736
  Lfx96:96, 61824
  RxLrx96:96, 74112
  Lfx384:384, 738816
  Fc101:101, 0
Total weights = 895648
Previous null char=98 mapped to 100
Continuing from data/frk/ocrd.lstm
make: *** [Makefile:278: data/ocrd/checkpoints/ocrd_checkpoint] Segmentation fault (core dumped)

GDB of crashed lstmtraining:

0x00007ffff7eaa8c9 in tesseract::NetworkIO::Transpose(tesseract::TransposedArray*) const () from /usr/lib/libtesseract.so.5
(gdb) bt
#0  0x00007ffff7eaa8c9 in tesseract::NetworkIO::Transpose(tesseract::TransposedArray*) const () from /usr/lib/libtesseract.so.5
#1  0x00007ffff7ea0a36 in tesseract::LSTM::Backward(bool, tesseract::NetworkIO const&, tesseract::NetworkScratch*, tesseract::NetworkIO*) () from /usr/lib/libtesseract.so.5
#2  0x00007ffff7ebcb8f in tesseract::Series::Backward(bool, tesseract::NetworkIO const&, tesseract::NetworkScratch*, tesseract::NetworkIO*) () from /usr/lib/libtesseract.so.5
#3  0x000055555556f388 in ?? ()
#4  0x0000555555560f87 in ?? ()
#5  0x00007ffff7429b25 in __libc_start_main () from /usr/lib/libc.so.6
#6  0x00005555555619fe in ?? ()

inductiveload, Jul 21 '21 09:07

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot], Aug 21 '21 03:08

I cannot reproduce the problem. Could you please run a test without a start model, i.e. train from scratch?
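
For reference, a minimal sketch of such a run. In tesstrain, leaving `START_MODEL` (and with it `TESSDATA`) unset trains a new network from scratch. The model name and iteration count are taken from the report above; `scratch_cmd` and `DRY_RUN` are hypothetical helpers of this sketch, so by default the command is only printed, not executed.

```shell
# Sketch only: the invocation from the report, minus START_MODEL and TESSDATA.
MODEL_NAME="${MODEL_NAME:-ocrd}"
MAX_ITERATIONS="${MAX_ITERATIONS:-10000}"
DRY_RUN="${DRY_RUN:-1}"   # hypothetical switch: set to 0 to actually run make

# Build the make command line for a from-scratch training run.
scratch_cmd() {
  printf 'make training MODEL_NAME=%s MAX_ITERATIONS=%s\n' "$1" "$2"
}

echo "+ $(scratch_cmd "$MODEL_NAME" "$MAX_ITERATIONS")"
if [ "$DRY_RUN" = "0" ]; then
  make training MODEL_NAME="$MODEL_NAME" MAX_ITERATIONS="$MAX_ITERATIONS"
fi
```

If this run gets past the first iterations while the run with a start model crashes, the start model itself is the likely culprit.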

wrznr, Aug 27 '21 16:08

Hi there, my problem is quite similar. Training without START_MODEL works, but as soon as I add a start model I get a segmentation fault:

lstmtraining \
  --debug_interval 0 \
  --traineddata data/pdf/pdf.traineddata \
  --old_traineddata /usr/share/tesseract-ocr/4.00/tessdata//eng.traineddata \
  --continue_from data/eng/pdf.lstm \
  --learning_rate 0.0001 \
  --model_output data/pdf/checkpoints/pdf \
  --train_listfile data/pdf/list.train \
  --eval_listfile data/pdf/list.eval \
  --max_iterations 10000 \
  --target_error_rate 0.01
Loaded file data/eng/pdf.lstm, unpacking...
Warning: LSTMTrainer deserialized an LSTMRecognizer!
Code range changed from 111 to 111!
Num (Extended) outputs,weights in Series:
  1,36,0,1:1, 0
Num (Extended) outputs,weights in Series:
  C3,3:9, 0
  Ft16:16, 160
Total weights = 160
  [C3,3Ft16]:16, 160
  Mp3,3:16, 0
  Lfys48:48, 12480
  Lfx96:96, 55680
  Lrx96:96, 74112
  Lfx192:192, 221952
  Fc111:111, 0
Total weights = 364384
Previous null char=110 mapped to 110
Continuing from data/eng/pdf.lstm
Loaded 1/1 lines (1-1) of document data/pdf-ground-truth/00efa1bb61fb5e2acbac526cae15db47_22.lstmf
Loaded 1/1 lines (1-1) of document data/pdf-ground-truth/00ed3d1c5efa45cb1f159b2aea364c06_13.lstmf
Loaded 1/1 lines (1-1) of document data/pdf-ground-truth/00e9abbf6ae0316b26564489043309e7_28.lstmf
Loaded 1/1 lines (1-1) of document data/pdf-ground-truth/00d93774feb260161c699826659335eb_26.lstmf
Loaded 1/1 lines (1-1) of document data/pdf-ground-truth/00d93774feb260161c699826659335eb_31.lstmf
Loaded 1/1 lines (1-1) of document data/pdf-ground-truth/00db3bce204043f8ae6093acb10f3421_15.lstmf
Loaded 1/1 lines (1-1) of document data/pdf-ground-truth/00d93774feb260161c699826659335eb_24.lstmf
Loaded 1/1 lines (1-1) of document data/pdf-ground-truth/00cf516d14934c8cc4aced3892e8023d_9.lstmf
Loaded 1/1 lines (1-1) of document data/pdf-ground-truth/00d9bc8920fad718d800d8e03e5db4a1_26.lstmf
Loaded 1/1 lines (1-1) of document data/pdf-ground-truth/0a0a3b164fb469e52d9532de17a0ca6d_15.lstmf
make: *** [Makefile:278: data/pdf/checkpoints/pdf_checkpoint] Segmentation fault

I use:

tesseract 4.1.1:
tesseract 4.1.1
 leptonica-1.79.0
  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 2.0.3) : libpng 1.6.37 : libtiff 4.1.0 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.1
 Found AVX2
 Found AVX
 Found FMA
 Found SSE
 Found libarchive 3.4.0 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.8 liblz4/1.9.2 libzstd/1.4.4

installed on Ubuntu 20.04.3 with apt install tesseract-ocr tesseract-ocr-eng

and the command: make training MODEL_NAME='pdf' START_MODEL='eng' CORES=8 PSM=6 TESSDATA='/usr/share/tesseract-ocr/4.00/tessdata/'

Same problem with the test set 'foo':

lstmtraining \
  --debug_interval 0 \
  --traineddata data/foo/foo.traineddata \
  --old_traineddata /usr/share/tesseract-ocr/4.00/tessdata//eng.traineddata \
  --continue_from data/eng/foo.lstm \
  --learning_rate 0.0001 \
  --model_output data/foo/checkpoints/foo \
  --train_listfile data/foo/list.train \
  --eval_listfile data/foo/list.eval \
  --max_iterations 10000 \
  --target_error_rate 0.01
Loaded file data/eng/foo.lstm, unpacking...
Warning: LSTMTrainer deserialized an LSTMRecognizer!
Code range changed from 111 to 119!
Num (Extended) outputs,weights in Series:
  1,36,0,1:1, 0
Num (Extended) outputs,weights in Series:
  C3,3:9, 0
  Ft16:16, 160
Total weights = 160
  [C3,3Ft16]:16, 160
  Mp3,3:16, 0
  Lfys48:48, 12480
  Lfx96:96, 55680
  Lrx96:96, 74112
  Lfx192:192, 221952
  Fc119:119, 0
Total weights = 364384
Previous null char=110 mapped to 118
Continuing from data/eng/foo.lstm
Loaded 1/1 lines (1-1) of document data/foo-ground-truth/frapan_bittersuess_1891_0103_007.lstmf
Loaded 1/1 lines (1-1) of document data/foo-ground-truth/clauren_liebe_1827_0105_016.lstmf
Loaded 1/1 lines (1-1) of document data/foo-ground-truth/lenau_gedichte_1832_0225_006.lstmf
Loaded 1/1 lines (1-1) of document data/foo-ground-truth/hoffmann_elixiere01_1815_0173_012.lstmf
Loaded 1/1 lines (1-1) of document data/foo-ground-truth/andreas_fenitschka_1898_0066_007.lstmf
Loaded 1/1 lines (1-1) of document data/foo-ground-truth/poersch_gewerkschaftsbewegung_1897_0032_045.lstmf
Loaded 1/1 lines (1-1) of document data/foo-ground-truth/saar_novellen_1877_0283_020.lstmf
Loaded 1/1 lines (1-1) of document data/foo-ground-truth/raschdorff_hochbau_1880_0025_016.lstmf
Loaded 1/1 lines (1-1) of document data/foo-ground-truth/gutzkow_wally_1835_0154_008.lstmf
Loaded 1/1 lines (1-1) of document data/foo-ground-truth/fiedler_kuenstlerische_1887_0135_015.lstmf
Loaded 1/1 lines (1-1) of document data/foo-ground-truth/poersch_gewerkschaftsbewegung_1897_0020_021.lstmf
make: *** [Makefile:278: data/foo/checkpoints/foo_checkpoint] Segmentation fault

Command: make training START_MODEL='eng' CORE=8 TESSDATA='/usr/share/tesseract-ocr/4.00/tessdata/'

Codethrill-20, Sep 20 '21 01:09

I had the same problem when trying to train with the system-provided start model. After reading https://github.com/tesseract-ocr/tesseract/issues/1573, I downloaded the corresponding tessdata_best model and everything worked fine.
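
A minimal sketch of that workaround, assuming the `eng` model, the model name `foo`, and the target directory `~/src/tessdata_best` (all illustrative; adjust to your setup). The URL follows the layout of the tessdata_best repository on GitHub, with `main` assumed to be the branch name. The `make` variables are the ones used earlier in this thread. `best_model_url`, `run` and `DRY_RUN` are hypothetical helpers of this sketch; by default the commands are only printed.

```shell
#!/bin/sh
# Sketch only: fetch a tessdata_best model and point tesstrain at it.
# LANG_CODE, BEST_DIR and MODEL_NAME are illustrative assumptions.
LANG_CODE="${LANG_CODE:-eng}"
BEST_DIR="${BEST_DIR:-$HOME/src/tessdata_best}"
MODEL_NAME="${MODEL_NAME:-foo}"
DRY_RUN="${DRY_RUN:-1}"   # set to 0 to actually download and train

# Raw-file URL of a model in the tessdata_best repository (branch name assumed).
best_model_url() {
  printf 'https://github.com/tesseract-ocr/tessdata_best/raw/main/%s.traineddata\n' "$1"
}

# Print the command; execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# Each step runs only if the previous one succeeded.
run mkdir -p "$BEST_DIR" &&
run curl -L -o "$BEST_DIR/$LANG_CODE.traineddata" "$(best_model_url "$LANG_CODE")" &&
run make training MODEL_NAME="$MODEL_NAME" START_MODEL="$LANG_CODE" TESSDATA="$BEST_DIR"
```

The key point is that `TESSDATA` points at the directory holding the downloaded tessdata_best file rather than at the distribution's tessdata directory.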

stefan6419846, May 20 '22 09:05