tess5train-fonts
ERROR: Failed to continue from: data/eng/eng.lstm
Hey @Shreeshrii, I'm using your Makefile in a Docker container to train Tesseract 5 on an English font, just to see if my setup works.
I've been encountering this issue for a while now:
Loaded file data/eng/eng.lstm, unpacking...
Failed to continue from: data/eng/eng.lstm
I have tried using the traineddata from both tessdata_best and tessdata, and I get the exact same error.
This is the output of combine_tessdata -e data/eng.traineddata data/eng/eng.lstm with the tessdata_best file:
Extracting tessdata components from data/eng.traineddata
Wrote data/eng/eng.lstm
Version:4.00.00alpha:eng:synth20170629:[1,36,0,1Ct3,3,16Mp3,3Lfys64Lfx96Lrx96Lfx512O1c1]
17:lstm:size=11689099, offset=192
18:lstm-punc-dawg:size=4322, offset=11689291
19:lstm-word-dawg:size=3694794, offset=11693613
20:lstm-number-dawg:size=4738, offset=15388407
21:lstm-unicharset:size=6360, offset=15393145
22:lstm-recoder:size=1012, offset=15399505
23:version:size=80, offset=15400517
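As a sanity check on the extraction step, the size of the extracted data/eng/eng.lstm can be compared with the size=11689099 reported for the lstm component in the listing above. This is a minimal sketch: check_size is a hypothetical helper name, and it assumes the extracted file should be byte-for-byte the size of the lstm entry.

```shell
# Hypothetical helper: compare a file's size in bytes with an expected value.
check_size() {
  file=$1
  expected=$2
  if [ ! -f "$file" ]; then
    echo "missing: $file"
    return 1
  fi
  # wc -c may pad its output with spaces on some systems, so strip them.
  actual=$(wc -c < "$file" | tr -d ' ')
  if [ "$actual" -eq "$expected" ]; then
    echo "ok: $file is $actual bytes"
  else
    echo "mismatch: $file is $actual bytes, expected $expected"
    return 1
  fi
}

# Usage, with the size taken from the component listing:
# check_size data/eng/eng.lstm 11689099
```

A mismatch or a missing file would point at the extraction step rather than at lstmtraining itself.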
The command that fails is the following:
lstmtraining \
--continue_from data/eng/eng.lstm --old_traineddata data//eng.traineddata \
--traineddata data/engDejavu/engDejavu-proto.traineddata \
--train_listfile data/engDejavu/list.train \
--eval_listfile data/engDejavu/list.eval \
--max_iterations 100 \
--debug_interval -1 \
--learning_rate 0.0001 \
--target_error_rate 0.01 \
--model_output data/engDejavu/checkpoints/engDejavu
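Before digging further, it may help to confirm that every input file named in the command above exists and is non-empty inside the container. The sketch below is only an illustration: check_inputs is a hypothetical helper, not part of the Makefile or of Tesseract, and a passing check does not rule out other causes such as a version mismatch between the model and the training tools.

```shell
# Hypothetical helper: report every argument that is not a non-empty file.
# Returns non-zero if at least one file is missing or empty.
check_inputs() {
  status=0
  for f in "$@"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f"
      status=1
    fi
  done
  return $status
}

# Usage, with the paths copied from the failing command:
# check_inputs data/eng/eng.lstm data//eng.traineddata \
#   data/engDejavu/engDejavu-proto.traineddata \
#   data/engDejavu/list.train data/engDejavu/list.eval
```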
Dockerfile:
# Set docker image
FROM ubuntu:18.04
# Skip the configuration part
ENV DEBIAN_FRONTEND noninteractive
# Update and install dependencies
RUN apt-get update && \
    apt-get install -y wget unzip bc vim python3-pip libleptonica-dev git htop
# Packages to compile Tesseract
RUN apt-get install -y --reinstall make && \
    apt-get install -y g++ autoconf automake libtool pkg-config libpng-dev libjpeg8-dev libtiff5-dev libicu-dev \
    libpango1.0-dev libcairo2-dev autoconf-archive rename ttf-mscorefonts-installer && fc-cache -f
# Set working directory
WORKDIR /app
RUN mkdir /app/src && cd /app/src
# Set the locale
RUN apt-get install -y locales && locale-gen en_GB.UTF-8
ENV LC_ALL=en_GB.UTF-8
ENV LANG=en_GB.UTF-8
ENV LANGUAGE=en_GB.UTF-8
# Copy requirements into the container at /app
COPY requirements.txt ./
RUN pip3 install -r requirements.txt
# Compile Tesseract with training options (also feel free to update Tesseract versions and such!)
RUN mkdir -p /app/src && cd /app/src && \
    git clone https://github.com/tesseract-ocr/tesseract.git && \
    cd /app/src/tesseract && \
    ./autogen.sh && ./configure --disable-graphics && make && make install && ldconfig && \
    make training && make training-install
Any help or guidance is appreciated. Thanks!