
Why is the recognition accuracy different from the paper?

Open zobeirraisi opened this issue 4 years ago • 12 comments

I applied the pre-trained model to the ICDAR15 dataset, but the results are different from the ones reported in the paper.

zobeirraisi avatar Mar 21 '20 21:03 zobeirraisi

Hi @zobeirraisi

I am also interested in this work. It'd be greatly appreciated if you can post the results on datasets that you have tried.

Jyouhou avatar Mar 22 '20 00:03 Jyouhou

> Hi @zobeirraisi
>
> I am also interested in this work. It'd be greatly appreciated if you can post the results on datasets that you have tried.

Hi @Jyouhou These are my results for the ICDAR15 dataset: Link

zobeirraisi avatar Mar 22 '20 00:03 zobeirraisi

Thanks @zobeirraisi. So the actual accuracy is ~71%. We can wait for a response from the authors.

Jyouhou avatar Mar 22 '20 00:03 Jyouhou

There is label noise in the IC15 test set, and I have relabeled it.

fengxinjie avatar Mar 22 '20 00:03 fengxinjie

> Hi @zobeirraisi I am also interested in this work. It'd be greatly appreciated if you can post the results on datasets that you have tried.

> Hi @Jyouhou These are my results for the ICDAR15 dataset: Link

I checked my prediction results, and I don't know why our results differ. For example, word_26_00.png##Kappa##Kappa## word_27_00.png##CAUTION##CAUTION## word_50_00.png##l:HOU##:HOU## ... are all correct in my predictions.

fengxinjie avatar Mar 22 '20 01:03 fengxinjie

> Hi @zobeirraisi I am also interested in this work. It'd be greatly appreciated if you can post the results on datasets that you have tried.

> Hi @Jyouhou These are my results for the ICDAR15 dataset: Link

I think you should crop the test images using coords.txt first, then predict.
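For anyone hitting the same mismatch, here is a minimal sketch of that preprocessing step. The coords.txt line format (filename followed by a 4-point polygon) and the helper name are my own assumptions, not something confirmed for this repo:

```python
# Sketch: crop word images out of the IC15 test pages before running predict.py.
# Assumes each line of coords.txt looks like
#   word_26_00.png x1,y1,x2,y2,x3,y3,x4,y4
# which is a guess; the actual format used by this repo may differ.
import os
from PIL import Image

def crop_words(coords_file, image_dir, out_dir):
    os.makedirs(out_dir, exist_ok=True)
    with open(coords_file, encoding="utf-8") as f:
        for line in f:
            name, coords = line.strip().split(maxsplit=1)
            values = [int(v) for v in coords.replace(",", " ").split()]
            xs, ys = values[0::2], values[1::2]
            box = (min(xs), min(ys), max(xs), max(ys))  # axis-aligned bounding box
            word = Image.open(os.path.join(image_dir, name)).crop(box)
            word.save(os.path.join(out_dir, name))
```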

fengxinjie avatar Mar 22 '20 02:03 fengxinjie

@Jyouhou @zobeirraisi Hi, can you tell us more about your pretrained model?

li10141110 avatar Mar 27 '20 10:03 li10141110

According to my guess, the performance of this implementation should be 85% on IIIT-5K.

delveintodetail avatar Mar 31 '20 09:03 delveintodetail

@delveintodetail Have you trained it? The developer did not reply clearly on the matter of training, e.g. whether he crops the ICDAR words first, or what the preprocessing is.

It is not because of the data preprocessing; the evaluation in this code is wrong.

delveintodetail avatar Apr 01 '20 01:04 delveintodetail

@delveintodetail Is there something wrong in the predict.py file?

li10141110 avatar Apr 01 '20 07:04 li10141110

I have been training this model on the ICDAR 2015 Word Recognition dataset (IC15), with no relabeling of the mislabeled data, using the code provided.

In order to recognize all the characters in the dataset, the vocab used was: `vocab = "<=,.+:;-!?$%#&*' ()@éÉ/\[]0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ>"+'"'+"´"+"΅"`
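For reference, this is roughly how I turn that vocab string into the character/index mapping fed to the model. Reserving index 0 for padding and using '<' / '>' as start/end markers is my own convention here, not necessarily what this repo does:

```python
# Sketch of a char-to-index mapping built from the vocab string above.
# Index 0 is reserved for padding; '<' and '>' act as start / end markers
# (an assumption about this repo's convention, not a confirmed detail).
vocab = "<=,.+:;-!?$%#&*' ()@éÉ/\\[]0123456789abcdefghijklmnopqrstuvwxyz" \
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ>" + '"' + "´" + "΅"

char2idx = {c: i + 1 for i, c in enumerate(vocab)}  # 0 = padding
idx2char = {i: c for c, i in char2idx.items()}

def encode(label):
    # Wrap the word with the start ('<') and end ('>') symbols from the vocab.
    return [char2idx["<"]] + [char2idx[c] for c in label] + [char2idx[">"]]
```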

If one keeps training and relies only on the loss on the test dataset to select models, the model will overfit; I have obtained several models with 100% accuracy on the test dataset. This means that even the mislabeled data is reproduced exactly as the human labeled it, errors included. (Note: the model is only trained on the training dataset, never on the test dataset! However, the models that scored best at inference on the test dataset were the ones saved as training progressed.)

Typically, such models may have relatively poor performance on the training data itself:

- On test data: # wrong: 0, # total: 2077 (0.0% wrong)
- On training data: # wrong: 1959, # total: 4468 (43.85% wrong)

Starting from scratch, training and saving only the models that improve inference performance on both the test data and the training data, one can get results like this after 1533 epochs with batch_size = 64:

- On test data: # wrong: 11, # total: 2077 (0.5% wrong)
- On training data: # wrong: 620, # total: 4468 (13.9% wrong)
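The selection rule I used is essentially the following. `train_one_epoch` and `evaluate` are placeholders for the repo's own training step and wrong-word-rate computation, so treat this as a sketch rather than the actual training loop of this repository:

```python
import torch

def train_with_dual_selection(model, train_one_epoch, evaluate,
                              train_loader, test_loader, num_epochs):
    """Keep a checkpoint only if it improves the wrong-word rate on BOTH sets.

    `train_one_epoch` and `evaluate` are placeholders for the repo's own
    training step and evaluation code (this is a sketch, not the repo's loop).
    """
    best_train_err = best_test_err = float("inf")
    for epoch in range(num_epochs):
        train_one_epoch(model, train_loader)
        train_err = evaluate(model, train_loader)  # fraction of wrong words
        test_err = evaluate(model, test_loader)
        if train_err < best_train_err and test_err < best_test_err:
            best_train_err, best_test_err = train_err, test_err
            torch.save(model.state_dict(), f"checkpoint_epoch{epoch}.pth")
```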

Inspection shows that some of these models give the same answer as the human on some of the mislabeled data, at least on the test dataset.

As training progresses and new models are saved, the inference performance particularly improves on the training dataset while more slowly improving on the testing dataset.

Thus this model seems like overkill on the ICDAR 2015 dataset, and the mislabeling makes comparison difficult.


Update: the model continued training and these are the results:

- Loss on test data during training: 0.006546
- Loss on training data during training: 0.027809
- Inference on test data: # wrong: 0, # total: 2077 (0.0% wrong)
- Inference on training data: # wrong: 129, # total: 4468 (2.887% wrong)

Other training and tests with synthetic images suggest that it does not generalize so well.

gussmith avatar Apr 21 '20 19:04 gussmith

The results above were obtained with the code provided as is. Since then, I realized from my results and from reading other issues that there is apparently an error in the code: it essentially keeps training the network while the validation pass is run. The problem is inherited from the original code in the Annotated Transformer that the authors refer to; see issue #7 ("testloss would lead to model update on eval mode").
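For reference, the usual fix in an Annotated-Transformer-style loop is to make sure the validation pass can never step the optimizer. A minimal sketch, assuming the code uses the `run_epoch` / `SimpleLossCompute` helpers and the `model.generator` attribute from the Annotated Transformer (this repo's code may differ in detail):

```python
import torch

# Validation pass that cannot update the model:
# pass opt=None to SimpleLossCompute so it only computes the loss; with an
# optimizer attached it would also call backward() and step(), which is the
# model-update-during-eval bug discussed in issue #7.
model.eval()
with torch.no_grad():
    val_loss = run_epoch(valid_iter, model,
                         SimpleLossCompute(model.generator, criterion, opt=None))
model.train()
```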

gussmith avatar Apr 23 '20 04:04 gussmith