EasyOCR: Config file for fine-tuning the Arabic model

Hi, I want to fine-tune the Arabic model on my dataset. What should the config file look like? I tried this before with:

Transformation: 'None'
FeatureExtraction: 'ResNet'
SequenceModeling: 'BiLSTM'
Prediction: 'CTC'
num_fiducial: 20
input_channel: 1
output_channel: 512
hidden_size: 512
decode: 'greedy'
new_prediction: False
freeze_FeatureFxtraction: False
freeze_SequenceModeling: False

Jul 12 '22 10:07 uniquefan

Has anyone had experience fine-tuning the Arabic model?

Jul 19 '22 07:07 uniquefan

If you get better results than the model in this repo, please let me know.

Jul 19 '22 14:07 Mahmuod1

same here, also interested

Nov 10 '22 07:11 FarisHijazi

interested

Nov 25 '22 06:11 MohieEldinMuhammad

I noticed that there are two types of models: generation 1 and generation 2.

The Arabic model falls under generation 1, which has the following config:

base_model: '../trainer/saved_models/arabic.pth'
FeatureExtraction: 'ResNet'
input_channel: 1
output_channel: 512
hidden_size: 512

So maybe change the title of this issue to "Config file for fine-tune generation 1 model".

Nov 25 '22 07:11 FarisHijazi

I'm still figuring out the right way to do things, but I noticed that the validation loader gets bad results, while normal inference using reader gives good results. So instead of wasting time training, we should first make sure the validation accuracy matches the accuracy of inference from reader.

Nov 25 '22 07:11 FarisHijazi

Also, to fine-tune, I noticed you have to make sure you have the correct vocabulary (characters, symbol, number, lang_char).

So for Arabic generation 1:

character: "0123456789!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~ abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ\u0660\
  \u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669\xAB\xBB\u061F\u060C\u061B\
  \u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0627\u064B\u0628\u0629\u062A\u062B\u062C\
  \u062D\u062E\u062F\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063A\
  \u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064A\u064B\u064C\u064D\u064E\
  \u064F\u0650\u0651\u0652\u0653\u0654\u0670\u0671\u0679\u067E\u0686\u0688\u0691\u0698\
  \u06A9\u06AD\u06AF\u06BA\u06BE\u06C0\u06C1\u06C2\u06C3\u06C6\u06C7\u06C8\u06CB\u06CC\
  \u06D0\u06D2\u06D3\u06D5"
base_model: '../trainer/saved_models/arabic.pth'
FeatureExtraction: 'ResNet'
input_channel: 1
output_channel: 512
hidden_size: 512

I took this from easyocr\config.py. One catch, though: the key character is never used by the trainer, which actually builds the character set from symbol, number and lang_char.

I don't know how to split my long string into these, so instead I changed the code to just use character, like so:

Change the following lines in trainer/trainer.ipynb from this


    else:
        opt.character = opt.number + opt.symbol + opt.lang_char

to this

    # only build the charset when the config does not already define `character`
    elif 'character' not in opt:
        opt.character = opt.number + opt.symbol + opt.lang_char

Now this will use opt.character and your model should train. Make sure that when loading the model there are no warnings that some model weights were not loaded.
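
As an alternative to patching the notebook, the long string could probably be split by position into the three keys that the trainer concatenates. The sketch below is an assumption, not something taken from the EasyOCR code: `split_charset` is a hypothetical helper, and it relies on the string starting with ASCII digits followed by a run of symbols (ending with the space) before the first letter. Because the pieces are cut by position, `number + symbol + lang_char` reproduces the original string exactly, including its order and any duplicate entries, which I assume matters for matching the pretrained output layer.

```python
ASCII_DIGITS = "0123456789"


def split_charset(character: str):
    """Split a charset string into (number, symbol, lang_char) by position.

    Hypothetical helper: assumes the layout digits -> symbols -> everything else.
    """
    i = 0
    # leading run of ASCII digits becomes `number`
    while i < len(character) and character[i] in ASCII_DIGITS:
        i += 1
    j = i
    # following run of non-alphanumeric characters (punctuation, space) becomes `symbol`
    while j < len(character) and not character[j].isalnum():
        j += 1
    # the remainder (Latin letters, Arabic digits and letters, ...) becomes `lang_char`
    return character[:i], character[i:j], character[j:]


# Shortened sample for illustration only; use the full string from the config.
sample = "0123456789!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~ abcXYZ\u0660\u0661\xAB\xBB\u0627\u0627"
number, symbol, lang_char = split_charset(sample)
```

The three values could then be written to the number, symbol and lang_char keys of the training config, leaving the notebook unchanged.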

Nov 25 '22 23:11 FarisHijazi

@FarisHijazi After training, when I use the trained model instead of the original one, I get bad results, not the predictions I got while training, even on images the model was trained on and was getting 100% accuracy on. Did you figure out how to get the same predictions as in the training process?

Nov 26 '22 04:11 MohieEldinMuhammad

I didn't get to the point where I go from training to prediction yet.

However, make sure that there are no warnings when loading the model.

Also, put your test data in the validation loader and just see the accuracy it gets. If it's also wrong like in prediction, then the model is the problem, not the code.
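
One way to make the "no warnings when loading" check concrete is to compare parameter names and shapes between the model and the checkpoint before loading. This is a minimal sketch under assumptions: `diff_state_keys` is a hypothetical helper, the parameter names and shapes in the example are made up, and the `module.` prefix handling assumes the checkpoint was saved from a model wrapped in `torch.nn.DataParallel`. With PyTorch, the shape dictionaries could be built as `{k: tuple(v.shape) for k, v in state_dict.items()}`.

```python
def diff_state_keys(model_shapes, ckpt_shapes, prefix="module."):
    """Report parameters that would not load cleanly from a checkpoint.

    Both arguments map parameter name -> shape tuple. An optional prefix
    (assumed to come from DataParallel) is stripped from checkpoint keys.
    """
    ckpt = {
        (k[len(prefix):] if k.startswith(prefix) else k): tuple(v)
        for k, v in ckpt_shapes.items()
    }
    model = {k: tuple(v) for k, v in model_shapes.items()}
    missing = sorted(set(model) - set(ckpt))        # in model, absent from checkpoint
    unexpected = sorted(set(ckpt) - set(model))     # in checkpoint, absent from model
    mismatched = sorted(k for k in set(model) & set(ckpt) if model[k] != ckpt[k])
    return missing, unexpected, mismatched


# Made-up names and shapes, for illustration only.
model_shapes = {"Prediction.weight": (100, 512), "Prediction.bias": (100,)}
ckpt_shapes = {"module.Prediction.weight": (200, 512), "module.Prediction.bias": (100,)}
missing, unexpected, mismatched = diff_state_keys(model_shapes, ckpt_shapes)
```

If any of the three lists is non-empty, some weights would be skipped or would fail to load. A shape mismatch in the final prediction layer would suggest that the character set differs from the one the checkpoint was trained with.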

Nov 26 '22 06:11 FarisHijazi

@FarisHijazi I will try the validation loader. I think you need to overfit on, say, 2 images, get 100% accuracy, and then test the model as in production. If the bug I'm mentioning is not solved, there is no point in training.

Nov 26 '22 06:11 MohieEldinMuhammad

Once again, really make sure that the code runs the model the same way in deployment as in testing: disable the detector, run on exactly the same images, etc.
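
To check this, the predictions from the validation loader and from reader could be collected for the same images and compared side by side. A minimal sketch, assuming each path's output has already been gathered into a dict mapping image name to predicted text; `compare_paths` and the sample data are hypothetical.

```python
def compare_paths(labels, val_preds, infer_preds):
    """Compare two prediction paths against the same ground truth.

    All arguments map image name -> text. Images missing from a
    prediction dict count as wrong for that path.
    """
    names = sorted(labels)
    total = len(names)
    val_ok = sum(val_preds.get(n) == labels[n] for n in names)
    infer_ok = sum(infer_preds.get(n) == labels[n] for n in names)
    # images where the two paths give different text are the ones to debug first
    disagree = [n for n in names if val_preds.get(n) != infer_preds.get(n)]
    return {
        "val_acc": val_ok / total if total else 0.0,
        "infer_acc": infer_ok / total if total else 0.0,
        "disagree": disagree,
    }


# Made-up sample data, for illustration only.
labels = {"a.png": "\u0633\u0644\u0627\u0645", "b.png": "123"}
val_preds = {"a.png": "\u0633\u0644\u0627\u0645", "b.png": "123"}
infer_preds = {"a.png": "\u0633\u0644\u0645", "b.png": "123"}
report = compare_paths(labels, val_preds, infer_preds)
```

A non-empty `disagree` list on identical input images would point at a difference in preprocessing, character set, or weight loading between the two paths rather than at the training itself.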

Nov 26 '22 06:11 FarisHijazi

@FarisHijazi After training, when I use the trained model instead of the original one, I get bad results, not the predictions I got while training, even on images the model was trained on and was getting 100% accuracy on. Did you figure out how to get the same predictions as in the training process?

Hello, I also encountered this problem. When I trained my model on a personal dataset and used its PTH file for testing, I could not get good results even on the training images. Have you solved this problem? @MohieEldinMuhammad

Dec 28 '22 09:12 ftmasadi
