tess5train-fonts

ERROR: Failed to continue from: data/eng/eng.lstm

Open WissamAntoun opened this issue 2 years ago • 0 comments

Hey @Shreeshrii, I'm using your Makefile in a Docker container to train Tesseract 5 on an English font, just to see if my setup works.

I've been encountering this issue for a while now:

Loaded file data/eng/eng.lstm, unpacking...
Failed to continue from: data/eng/eng.lstm

I have tried traineddata from both tessdata_best and tessdata, and I get exactly the same error.

This is the output of combine_tessdata -e data/eng.traineddata data/eng/eng.lstm with the tessdata_best model:

Extracting tessdata components from data/eng.traineddata
Wrote data/eng/eng.lstm
Version:4.00.00alpha:eng:synth20170629:[1,36,0,1Ct3,3,16Mp3,3Lfys64Lfx96Lrx96Lfx512O1c1]
17:lstm:size=11689099, offset=192
18:lstm-punc-dawg:size=4322, offset=11689291
19:lstm-word-dawg:size=3694794, offset=11693613
20:lstm-number-dawg:size=4738, offset=15388407
21:lstm-unicharset:size=6360, offset=15393145
22:lstm-recoder:size=1012, offset=15399505
23:version:size=80, offset=15400517
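
One way to sanity-check the extraction is to compare the size reported for the lstm component in the listing above with the size of the extracted file on disk. The following is only a sketch: lstm_component_size is a hypothetical helper name, and it assumes the listing keeps the "N:lstm:size=..., offset=..." line format shown above.

```shell
# Hypothetical helper (not part of Tesseract): read a combine_tessdata
# listing on stdin and print the byte size of the "lstm" component.
# Assumes lines shaped like "17:lstm:size=11689099, offset=192".
lstm_component_size() {
  sed -n 's/^[0-9][0-9]*:lstm:size=\([0-9][0-9]*\),.*/\1/p'
}
```

For example, the listing could be piped through the helper (redirecting stderr as well, in case the tool prints there) and the result compared with the output of wc -c < data/eng/eng.lstm. If the numbers match, the extraction itself is probably not the cause of the failure.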

The command that fails is the following:

lstmtraining \
  --continue_from data/eng/eng.lstm --old_traineddata data//eng.traineddata \
  --traineddata data/engDejavu/engDejavu-proto.traineddata \
  --train_listfile data/engDejavu/list.train \
  --eval_listfile data/engDejavu/list.eval \
  --max_iterations 100 \
  --debug_interval -1 \
  --learning_rate 0.0001 \
  --target_error_rate 0.01 \
  --model_output data/engDejavu/checkpoints/engDejavu
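
Since the command depends on several input files, a small pre-flight check can rule out missing or empty files before lstmtraining starts. This is a minimal sketch, not part of the Makefile: check_training_inputs is a hypothetical helper name, and the check only covers existence and non-zero size, not whether the model format matches the installed Tesseract build.

```shell
# Hypothetical pre-flight helper: report every path that is missing or empty.
# Returns non-zero if at least one path fails the check.
check_training_inputs() {
  rc=0
  for path in "$@"; do
    # -s is true only if the file exists and has a size greater than zero
    if [ ! -s "$path" ]; then
      echo "missing or empty: $path" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

It could be called with the paths from the command above, for example check_training_inputs data/eng/eng.lstm data/eng.traineddata data/engDejavu/engDejavu-proto.traineddata data/engDejavu/list.train data/engDejavu/list.eval.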

Dockerfile:

# Set docker image
FROM ubuntu:18.04

# Skip the configuration part
ENV DEBIAN_FRONTEND noninteractive

# Update and install dependencies
RUN apt-get update && \
    apt-get install -y wget unzip bc vim python3-pip libleptonica-dev git htop

# Packages to compile Tesseract
RUN apt-get install -y --reinstall make && \
    apt-get install -y g++ autoconf automake libtool pkg-config libpng-dev libjpeg8-dev libtiff5-dev libicu-dev \
    libpango1.0-dev libcairo2-dev autoconf-archive rename ttf-mscorefonts-installer && fc-cache -f

# Set working directory
WORKDIR /app

RUN mkdir /app/src && cd /app/src

# # Set the locale
RUN apt-get install -y locales && locale-gen en_GB.UTF-8
ENV LC_ALL=en_GB.UTF-8
ENV LANG=en_GB.UTF-8
ENV LANGUAGE=en_GB.UTF-8

# # Copy requirements into the container at /app
COPY requirements.txt ./

RUN pip3 install -r requirements.txt

# Compile Tesseract with training options (also feel free to update Tesseract versions and such!)
# mkdir -p avoids a failure when /app/src already exists from the earlier RUN step
RUN mkdir -p /app/src && cd /app/src && \
    git clone https://github.com/tesseract-ocr/tesseract.git && \
    cd /app/src/tesseract && \
    ./autogen.sh && ./configure --disable-graphics && make && make install && ldconfig && \
    make training && make training-install

Any help or guidance is appreciated. Thanks!

WissamAntoun, Jun 09 '22 22:06