
Poor performance after training on custom data

Open amir2628 opened this issue 1 year ago • 7 comments

Greetings! 👋🏼

I used your repository (and not the deep-text-recognition-benchmark), so I think it is better to ask this here.

I hope this issue does not get lost in the void 😢

  • Can you please give some insight into how the accuracy can be as good as >90% with a really low validation loss (<0.01), yet when the trained model is used in production ( easyocr.Reader ) the extracted text is pure nonsense, not even close to the actual text in the image? :confused:

  • I saw comments on other, similar issues suggesting that a dataset close to your domain would help. I already use similar images for training, validation, and inference, but it made no difference. ❌ 🙅🏼‍♂️

  • Moreover, if you train one model for, say, 30000 iterations, take the best_accuracy.pth, and train it again for another 30000 iterations, would that ultimately make the model better? :suspect:

  • In conclusion, I would like to hear your opinions (especially from the contributors of this repository, since they know best what they developed) on why inference performance is worse than what the training process reports. 🤝

  • If it helps for me to provide anything, let me know. 🗒️

  • Also note that before giving any image to the model at inference time, I do image processing to make sure the image is more readable for the model (a simplified sketch of the kind of steps I mean is below). 😏
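
(A simplified sketch of the kind of preprocessing meant here, using OpenCV; the exact steps and thresholds in the actual pipeline differ, and the file name is just a placeholder.)

    import cv2
    import easyocr

    def preprocess(path):
        # Typical readability-oriented cleanup: grayscale, upscale, Otsu binarization.
        img = cv2.imread(path)
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        gray = cv2.resize(gray, None, fx=2.0, fy=2.0, interpolation=cv2.INTER_CUBIC)
        _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        return binary

    reader = easyocr.Reader(['en'], gpu=False)
    print(reader.readtext(preprocess('sample.jpg'), detail=0))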

Have a good one!

amir2628 avatar May 09 '23 21:05 amir2628

I assume you are training just the recognition model, not the detection model. Just a few points from my side (I think number 2 may be the most pertinent):

  • Please reconfirm that the training, validation, and real-life application images really are the same kind, including the preprocessing you apply to them.
  • Regarding the gibberish/nonsense output from the model during inference, I suspect there may be a code issue: the model either doesn't load the weights, or re-initializes them to random values after loading. A way to confirm that, without looking into the code, is to run the inference code (the same you use for real-life application images) on the train/val images individually and see what the accuracy/output is; a minimal sketch of that check follows this list. I think this is most likely your case.
  • If you train for 30000 iterations and then retrain from the best weights for another 30000 iterations, it won't necessarily mean you end up with better weights.
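
A minimal sketch of that sanity check, assuming a custom recognizer installed as described in EasyOCR's custom_model.md; the network name `custom_example`, the two directories, and the image glob are placeholders for your own setup:

    import glob

    import easyocr

    # Load the custom recognition model exactly the way the production code does.
    # 'custom_example' and the directories are placeholders (see custom_model.md).
    reader = easyocr.Reader(
        ['en'],
        recog_network='custom_example',
        model_storage_directory='model',
        user_network_directory='user_network',
        gpu=False,
    )

    # Run the same inference path on images the model was trained on.
    # If training reported >90% accuracy but this prints nonsense, the weights
    # used here are not the weights that were evaluated during training.
    for path in sorted(glob.glob('all_data/en_train/*.jpg'))[:20]:
        print(path, '->', reader.readtext(path, detail=0))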

MdotO avatar May 10 '23 03:05 MdotO

> I assume you are training just the recognition model, not the detection model. Just a few points from my side (I think number 2 may be the most pertinent):
>
> * Please reconfirm that the training, validation, and real-life application images really are the same kind, including the **preprocessing** you apply to them.
>
> * Regarding the gibberish/nonsense output from the model during inference, I suspect there may be a code issue: the model either doesn't load the weights, or re-initializes them to random values after loading. A way to confirm that, without looking into the code, is to **run the inference code** (the same you use for real-life application images) on the train/val images individually and see what the accuracy/output is. I think this is most likely your case.
>
> * If you train for 30000 iterations and then retrain from the best weights for another 30000 iterations, it won't necessarily mean you end up with better weights.

Thank you very much for taking the time to comment on the issue. 🤝

  • Yes! I can confirm that they are the same.

  • So I went and checked the code for the point you mentioned about the weights:

It turns out there is the following part in EasyOCR's recognition.py:

    if device == 'cpu':
        state_dict = torch.load(model_path, map_location=device)
        new_state_dict = OrderedDict()
        for key, value in state_dict.items():
            # Keys saved from a DataParallel model carry a 'module.' prefix
            # (7 characters); key[7:] strips it so the weights match the plain
            # (non-DataParallel) model used on CPU.
            new_key = key[7:]
            new_state_dict[new_key] = value
        model.load_state_dict(new_state_dict)

I am also running on CPU. So I went and compared the dicts, and it seems that they are the same:

    # Compare state_dict and new_state_dict.
    # Note: new_state_dict's keys have the 'module.' prefix stripped, so the
    # lookup has to use key[7:]; looking up the original key would always
    # return None and silently skip every comparison.
    differing_indices = []
    for idx, (key, state_value) in enumerate(state_dict.items()):
        new_value = new_state_dict.get(key[7:], None)
        if new_value is not None and not torch.equal(state_value, new_value):
            differing_indices.append(idx)

    if len(differing_indices) == 0:
        print("The state_dict and new_state_dict are the same.")
    else:
        print("The state_dict and new_state_dict differ at the following indices:")
        print(differing_indices)

Output: The state_dict and new_state_dict are the same.

So I'm not sure what could be causing this... :suspect:

Any ideas? 💭 Have you tried using your own custom model on a custom dataset (preferably with non-English text)? Did it work OK?

@rkcosmos can you please take a look ?

amir2628 avatar May 10 '23 13:05 amir2628

Yes, it worked fine for me on my custom non-English set (a Thai dataset).

Like I mentioned above, I believe a better way, without looking at the code (the issue may not be in the model-weight loading itself but somewhere after it), is to check whether the same inference code works well on the very images you trained on. Don't use the training code on the train images here, but the inference code. If the model produces nonsense words, it most likely means that the model weights have changed (the training code reports high accuracy but the inference script doesn't, even though the images are the same).

If you want to probe the code directly, then compare the weights at the start, during loading, and just before the actual inference (i.e., right before the model's forward pass); a sketch of such a comparison follows.
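
A minimal sketch of that kind of probe, assuming the underlying PyTorch recognition module is reachable as `reader.recognizer` (check that attribute against your EasyOCR version) and a custom network named `custom_example` is installed:

    import hashlib

    import easyocr

    def weights_fingerprint(module):
        # Hash every parameter tensor so two snapshots of the same model can be
        # compared cheaply.
        blob = b''.join(p.detach().cpu().numpy().tobytes() for p in module.parameters())
        return hashlib.md5(blob).hexdigest()

    reader = easyocr.Reader(['en'], recog_network='custom_example', gpu=False)

    # Fingerprint right after the weights are loaded ...
    before = weights_fingerprint(reader.recognizer)

    # ... run the normal inference path ...
    reader.readtext('some_train_image.jpg', detail=0)

    # ... and fingerprint again. If the two differ, something re-initialized or
    # overwrote the weights between loading and inference.
    after = weights_fingerprint(reader.recognizer)
    print('weights unchanged:', before == after)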

MdotO avatar May 10 '23 16:05 MdotO

Hello, I have the same problem. I trained a model on a dataset containing 37500 images, and after training (for 30000 iterations) I got three .pth files: 'custom_example.pth', 'best_accuracy.pth', and 'best_norm_ED.pth'. My images only contain some specific characters (see the attached screenshots). With every .pth file I got, I did the same thing as described here: https://github.com/JaidedAI/EasyOCR/blob/master/custom_model.md#how-to-use-your-custom-model BUT the results were nonsense.

I would appreciate it if you could give me some advice.

IAchraf avatar May 23 '23 02:05 IAchraf

I think a typical issue when training on custom data is that the dataset is not representative enough, meaning the model gets overfitted to your specific dataset. There are some ways this can be addressed, for example fine-tuning less (a lower learning rate / fewer training iterations) or using a more diverse custom dataset. I wrote a story on creating a synthesized dataset on TowardsAI here: https://pub.towardsai.net/how-to-make-a-synthesized-dataset-to-fine-tune-your-ocr-3573f1a7e08b , which could potentially be part of a solution to your problem (a rough sketch of the idea is below). Additionally, if your model is giving complete nonsense responses, I would guess it is an issue in the code. I hope this helps :)
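
A rough sketch of that idea using Pillow (not the exact approach from the linked story; the font path, image size, and label-file layout here are assumptions you would adapt to your trainer's expected format):

    import csv
    import os
    import random
    import string

    from PIL import Image, ImageDraw, ImageFont

    OUT_DIR = 'synthetic_train'    # assumed layout: image files plus a labels.csv
    FONT_PATH = 'DejaVuSans.ttf'   # any .ttf font available on your system
    os.makedirs(OUT_DIR, exist_ok=True)

    font = ImageFont.truetype(FONT_PATH, 32)
    rows = []
    for i in range(1000):
        # Random text drawn from the character set you actually train on.
        text = ''.join(random.choices(string.ascii_letters + string.digits,
                                      k=random.randint(3, 10)))
        img = Image.new('L', (280, 48), color=255)   # grayscale, white background
        ImageDraw.Draw(img).text((8, 8), text, fill=0, font=font)
        name = f'{i:05d}.png'
        img.save(os.path.join(OUT_DIR, name))
        rows.append((name, text))

    # Write the label file; verify the expected header and separator against
    # your trainer's config before using it.
    with open(os.path.join(OUT_DIR, 'labels.csv'), 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['filename', 'words'])
        writer.writerows(rows)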

EivindKjosbakken avatar Jan 25 '24 18:01 EivindKjosbakken

Has anyone been able to resolve this issue?

KAKkkdkkakkda avatar Jul 09 '24 22:07 KAKkkdkkakkda

I actually think I found the cause. You need to place the characters and symbols in your custom_example.yaml in the same order as in the training .yaml config you used for fine-tuning. It should look like this: numbers, symbols, letters (that is the original order of characters inside the training config; by adding a new language to custom_example.yaml you mess up the order). A quick way to check this is sketched below.
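
A quick consistency check between the two configs might look like the following sketch; the key names (`number`, `symbol`, `lang_char` in the trainer config and `character_list` in the inference yaml) and the file paths are assumptions based on the example configs, so adjust them to match your actual files:

    import yaml  # PyYAML

    # Paths and key names are placeholders; adjust them to your own config files.
    with open('trainer/config_files/my_training_config.yaml') as f:
        train_cfg = yaml.safe_load(f)
    with open('user_network/custom_example.yaml') as f:
        infer_cfg = yaml.safe_load(f)

    # The trainer builds its character set as numbers + symbols + language
    # characters, and the inference yaml must list them in exactly that order.
    train_chars = train_cfg['number'] + train_cfg['symbol'] + train_cfg['lang_char']
    infer_chars = infer_cfg['character_list']

    if list(train_chars) == list(infer_chars):
        print('Character order matches.')
    else:
        print('Mismatch: reorder custom_example.yaml to numbers, symbols, letters, '
              'exactly as in the training config.')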

KAKkkdkkakkda avatar Jul 10 '24 17:07 KAKkkdkkakkda