
BCER eval displayed during lstmtraining and that from lstmeval are different

Shreeshrii opened this issue 2 years ago • 11 comments

While trying to plot the error rates for training, I have come across an anomaly.

I use the LOG file generated from the messages output during the lstmtraining run, which also outputs a BCER eval message on completion of each evaluation over the eval list. This message reports the error rates as well as the learning iteration.

I separately run lstmeval on various checkpoint traineddata files to get the error rates.

I have found that the BCER eval displayed during lstmtraining and that from lstmeval are different for the same learning iteration.
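For reference, pulling those BCER eval values out of the lstmtraining log can be sketched as follows. This is a minimal sketch: the exact wording of the log line is an assumption here and may differ between Tesseract versions, so the regex may need adjusting.

```python
import re

# Assumed log line format (may vary by Tesseract version):
#   "At iteration 201356, stage 1, Eval Char error rate=0.794562, Word error rate=2.1"
EVAL_RE = re.compile(r"At iteration (\d+),.*Eval Char error rate=([\d.]+)")

def parse_eval_bcer(log_text):
    """Return a list of (learning_iteration, eval_bcer) tuples from a log."""
    return [(int(m.group(1)), float(m.group(2)))
            for m in EVAL_RE.finditer(log_text)]

sample = "At iteration 201356, stage 1, Eval Char error rate=0.794562, Word error rate=2.1\n"
print(parse_eval_bcer(sample))  # [(201356, 0.794562)]
```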

Shreeshrii avatar Mar 03 '22 12:03 Shreeshrii

Below are plots from a recent training, trying to add superscripts to the English traineddata.

The first is the chart generated from the lstmtraining log file, plotting the BCER at every 100 iterations, at every checkpoint, and at every eval. Since BCER eval only reports the learning iteration value, the main x-axis plots learning iterations.

[Image: engSuper-LOG-2]

The second chart plots the CER values calculated with lstmeval, the ISRI OCR evaluation tools and OCRevaluation for fast traineddata files built from checkpoints with BCER less than 1%, in addition to the BCER from lstmtraining at every 100 iterations and at every checkpoint. It uses training iterations as the main x-axis and hence does not include the evals done during training (which are included in the chart above).

[Image: engSuper-2]
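A chart of this kind can be produced with a short matplotlib script along these lines. The checkpoint series below is invented for illustration; only the eval points are taken from the tsv data in this thread.

```python
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt

# Illustrative placeholder values for the per-checkpoint BCER series:
checkpoint_iters = [197000, 199000, 201356, 203000]
checkpoint_bcer = [0.35, 0.20, 0.11, 0.15]

# BCER eval points (learning iteration -> BCER), from the tsv below:
eval_iters = [197735, 200061, 201356]
eval_bcer = [0.750366, 0.628007, 0.794562]

fig, ax = plt.subplots()
ax.plot(checkpoint_iters, checkpoint_bcer, label="checkpoint BCER")
ax.scatter(eval_iters, eval_bcer, color="red", label="BCER eval")
ax.set_xlabel("learning iterations")
ax.set_ylabel("BCER (%)")
ax.legend()
fig.savefig("bcer_plot.png")
```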

Shreeshrii avatar Mar 03 '22 14:03 Shreeshrii

Here is the tsv file for eval during lstmtraining:

LearningIteration  EvalCER
197735             0.750366
199063             0.687183
200061             0.628007
201356             0.794562
202510             0.916657
204432             0.882159
210088             0.780537
211706             0.798433
212529             0.816408
238558             0.703124
240123             0.788989

Edited to show a subset of the tsv file.

Shreeshrii avatar Mar 03 '22 15:03 Shreeshrii

Here is the tsv file with lstmeval BCER for checkpoints with BCER less than 1%.

LearningIteration  TrainingIteration  EvalCER
145368             766900             1.017711
161436             921200             0.722449
161990             926800             0.744561
161997             926900             0.746143
170477             1012400            0.676673
201343             1356000            0.821650
201356             1356200            0.811972

Edited to show a subset of the tsv file.
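Joining the two tables on LearningIteration makes the discrepancy easy to quantify. A sketch, with the rows hard-coded from the (subset) tsv files in this thread:

```python
# learning_iteration -> BCER eval reported during lstmtraining
training_eval = {201356: 0.794562}
# learning_iteration -> BCER from standalone lstmeval on the checkpoint
lstmeval = {201356: 0.811972}

# Compare wherever both sources report the same learning iteration:
for it in sorted(training_eval.keys() & lstmeval.keys()):
    diff = lstmeval[it] - training_eval[it]
    print(f"iteration {it}: lstmtraining={training_eval[it]:.6f} "
          f"lstmeval={lstmeval[it]:.6f} diff={diff:+.6f}")
# prints: iteration 201356: lstmtraining=0.794562 lstmeval=0.811972 diff=+0.017410
```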

Shreeshrii avatar Mar 03 '22 15:03 Shreeshrii

So, for the checkpoint with minimal BCER during training, the values are:

checkpoint: training iterations = 1356200, learning iterations = 201356, BCER = 0.110

lowest eval: learning iterations = 200061, BCER = 0.628007

eval at the same learning iteration: learning iterations = 201356, BCER = 0.794562

lstmeval: training iterations = 1356200, learning iterations = 201356, BCER = 0.811972

As shown above, there is a difference between the lstmeval BCER (0.811972) and the BCER eval reported during training for the same number of learning iterations (0.794562). In my opinion, both should give the same result.

So, there seems to be some error in the reporting of the learning iteration number for BCER eval. Maybe it would help if both training iterations and learning iterations were reported for the BCER eval done during lstmtraining.

Shreeshrii avatar Mar 03 '22 15:03 Shreeshrii

Do both lstmtraining and lstmeval evaluate exactly the same subset of images+ground truth and in the same order in each evaluation cycle?

amitdo avatar Mar 13 '22 14:03 amitdo

Both evaluate the same subset of images + ground truth, namely the ones listed in list.eval; training is done on list.train. lstmeval run on checkpoints uses list.eval and goes through it sequentially. I do not know how the order of files for eval is decided during lstmtraining.

I also do not know what the learning iteration number reported by eval during lstmtraining refers to, i.e. whether it is the current learning iteration number at the moment the eval is reported, or the saved iteration number from when the eval was started.

Shreeshrii avatar Mar 13 '22 15:03 Shreeshrii

Having the same issue; might this be related to https://github.com/tesseract-ocr/tesstrain/issues/110?

whisere avatar Aug 01 '22 06:08 whisere

I can confirm this is still an issue. In my case, the difference is much worse:

[Images: plot_log and plot_cer for htrbin (fast extraction)]

As you can see, the lstmeval BCER is close to 100%, while the lstmtraining BCER is around 11%.

If I replace fast models with best models for checkpoint extraction in the rules for make plot, then the difference becomes benign:

[Images: plot_log and plot_cer for htrbin (best extraction)]

So for me the observation that convert_to_int is the culprit seems true.
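For reference, the two extraction methods differ only in whether --convert_to_int is passed to lstmtraining when stopping training; the file paths below are placeholders, not the actual tesstrain layout:

```shell
# Extract a float ('best'-style) model from a checkpoint:
lstmtraining --stop_training \
  --continue_from data/foo/checkpoints/foo.checkpoint \
  --traineddata data/foo/foo.traineddata \
  --model_output data/foo.traineddata

# Extract an int ('fast'-style) model from the same checkpoint;
# adding --convert_to_int is the step suspected above:
lstmtraining --stop_training --convert_to_int \
  --continue_from data/foo/checkpoints/foo.checkpoint \
  --traineddata data/foo/foo.traineddata \
  --model_output data/foo.fast.traineddata
```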

Another example pointing in a similar direction (different Tesseract/Tesstrain installation, different data):

[Images: plot_log and plot_cer for herrnhut-kurrent bin-1CNN-Lfx400 (fast, broken)]

And again with best instead of fast extraction:

[Images: plot_log and plot_cer for herrnhut-kurrent bin-1CNN-Lfx400 (best)]

Note that if I apply the models directly with the tesseract CLI, I can reproduce the behaviour shown in the plots – results are gibberish with the fast models, but ok with the best models.

bertsky avatar Mar 30 '24 18:03 bertsky

The net spec for 'best' and 'fast' is not the same.

Every 'fast' model was converted to int from a float model (but not from the released 'best' model). The float models that were the origin of the 'fast' models were never released publicly.
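The precision loss inherent in float-to-int conversion can be illustrated with a toy sketch. This is not Tesseract's actual conversion code, just the general idea of mapping float weights to 8-bit integers with a per-layer scale:

```python
def quantize_int8(weights):
    """Map floats to int8 with a single scale factor; returns (ints, scale)."""
    scale = max(abs(w) for w in weights) / 127.0
    return [round(w / scale) for w in weights], scale

def dequantize(ints, scale):
    """Recover approximate float weights from the quantized values."""
    return [i * scale for i in ints]

w = [0.5, -0.25, 0.1234, -0.9876]
q, s = quantize_int8(w)
restored = dequantize(q, s)

# The round trip loses precision; in a real network such per-weight
# rounding errors accumulate across layers, which is one way a converted
# int model can diverge from its float original.
print(max(abs(a - b) for a, b in zip(w, restored)))
```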

amitdo avatar Mar 31 '24 02:03 amitdo

https://github.com/tesseract-ocr/tessdoc/blob/441f1ea328421e/Data-Files-in-tessdata_best.md#version-string--40000alpha--network-specification-for-tessdata_best

https://github.com/tesseract-ocr/tessdoc/blob/441f1ea328421e/Data-Files-in-tessdata_fast.md

amitdo avatar Mar 31 '24 02:03 amitdo

@amitdo

The net spec for 'best' and 'fast' is not the same.

That's not true. The VGSL spec / net_mode is the same, only the extraction method differs.

Anyway, that's irrelevant here, since the problem appears independent of where the training started (pretrained models in the tessdata repos or from scratch). The relevant difference is the checkpoint extraction method.

BTW, I am not saying fast always behaves like this; it's still somewhat surprising. I guess it depends on the course taken during lstmtraining, perhaps subtrainer feedback or other events. But users should avoid the fast method for now to be on the safe side, IMO.

@Shreeshrii have you by any chance noticed any particular event in the training log which we can use to track this down?

bertsky avatar Mar 31 '24 16:03 bertsky