
Fine-tuned EasyOCR model with thai_g1.pth

Open kwankoravich opened this issue 3 years ago • 2 comments

I'm working with the EasyOCR model and would like to fine-tune it. I'm looking at en_filtered_config.yaml, but I'm not sure how to change the 'lang_char' parameter in order to fine-tune on a Thai dataset.

The default is lang_char: 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'. I assumed it should be lang_char: 'กขคฆงจฉชซฌญฎฏฐฑฒณดตถทธนบปผฝพฟภมยรฤลวศษสหฬอฮฯะ ัา ำ ิ ี ึ ื ุ ู ฺเแโใไๆ ็ ่ ้ ๊ ๋ ์ ํ๑๒๓๔๕๖๗๘๙'.

However, I got an error when I loaded the model. Could you please suggest how to adjust en_filtered_config.yaml?

kwankoravich avatar Jun 24 '22 04:06 kwankoravich

Hi @kwankoravich! The error probably occurs because the model expects lang_char to contain 52 characters, while you are passing a 93-character string (it could be something else, though; I would need to see the error). I suggest going ahead and fine-tuning on the Thai dataset; you can always revert if something goes wrong.
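The size mismatch described above can be checked before loading any weights. This is a minimal sketch, assuming the trainer builds its character set by concatenating the `number`, `symbol`, and `lang_char` fields of the YAML file and that a CTC prediction head adds one blank class; both are assumptions about the trainer, and `count_classes` is a hypothetical helper, not an EasyOCR function.

```python
def count_classes(number: str, symbol: str, lang_char: str, ctc: bool = True) -> int:
    """Return the number of output classes implied by a character configuration."""
    # Assumption: the character set is the plain concatenation of the three fields.
    characters = number + symbol + lang_char
    # len() counts Unicode code points, so each Thai combining mark counts once,
    # and any space typed between marks counts as a character too.
    return len(characters) + (1 if ctc else 0)


if __name__ == "__main__":
    english = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
    print(count_classes("", "", english))
```

If the number printed for your configuration differs from the size of the checkpoint's final layer, loading the weights is likely to fail with a shape mismatch.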

s39674 avatar Jun 28 '22 12:06 s39674

Hi @kwankoravich, based on what I found in the source code, I use lang_char = abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZกขคฆงจฉชซฌญฎฏฐฑฒณดตถทธนบปผฝพฟภมยรลวศษสหฬอฮฤเแโใไะาุูิีืึั่้๊๋็์ำํฺฯๆ and it works for me for fine-tuning.

SarmSKunatham avatar Jun 29 '22 09:06 SarmSKunatham

@SarmSKunatham how did you find that lang_char value in the source code?

Edit: lang_char can be found in easyocr/config.py. Just find your language and you can see its characters.
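The lookup can also be done programmatically instead of by reading the file. This is a sketch under assumptions: it supposes that easyocr/config.py exposes a nested dict named `recognition_models` whose entries carry a 'characters' key, and that the Thai model is keyed as "thai_g1". `find_characters` is a hypothetical helper written for this example.

```python
from typing import Optional


def find_characters(models: dict, model_name: str) -> Optional[str]:
    """Search a nested dict for `model_name` and return its 'characters' entry."""
    for key, value in models.items():
        if not isinstance(value, dict):
            continue
        if key == model_name and "characters" in value:
            return value["characters"]
        # Recurse into nested groups of models.
        found = find_characters(value, model_name)
        if found is not None:
            return found
    return None


if __name__ == "__main__":
    try:
        # Assumption: this name exists in easyocr/config.py.
        from easyocr.config import recognition_models
    except ImportError:
        recognition_models = {}
    chars = find_characters(recognition_models, "thai_g1")
    print(chars if chars else "thai_g1 not found")
```

The returned string can then be compared against the value you plan to put in the YAML config.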

darwinharianto avatar Mar 24 '23 02:03 darwinharianto

Dear @kwankoravich, did you find a solution to your problem? I am also fine-tuning on Korean.

Dear @darwinharianto, did you successfully fine-tune the model?

khawar-islam avatar Apr 11 '23 05:04 khawar-islam

yes, you just need to prepare the data and do it based on https://github.com/JaidedAI/EasyOCR/blob/master/trainer/trainer.ipynb

darwinharianto avatar Apr 11 '23 07:04 darwinharianto

@darwinharianto I am confused about the pre-trained weights. Where do I specify the link to the pre-trained weights, and where can I download the pre-trained weights for Korean and English recognition? The config file linked below does not include any pre-trained weights.

https://github.com/JaidedAI/EasyOCR/blob/master/trainer/config_files/en_filtered_config.yaml

khawar-islam avatar Apr 11 '23 07:04 khawar-islam

you just need to use the default one

import easyocr
easyocr.Reader(['ko'])  # EasyOCR's language code for Korean is 'ko', not 'kr'

This will automatically download the pretrained model, which is saved in ~/.EasyOCR/model if you are using Ubuntu. Then change this setting in the yaml file:

saved_model: 'path to the downloaded pretrained(the one inside easyocr's model folder)'
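To find the exact path to paste into `saved_model`, you can list the weight files that EasyOCR has already downloaded. This is a minimal sketch that assumes the default directory mentioned above (~/.EasyOCR/model) and a .pth file extension; `list_pretrained` is a hypothetical helper, not part of EasyOCR.

```python
from pathlib import Path
from typing import List, Optional


def list_pretrained(model_dir: Optional[Path] = None) -> List[Path]:
    """Return the .pth files in the EasyOCR model directory, sorted by name."""
    if model_dir is None:
        # Default location taken from the comment above; it may differ on other systems.
        model_dir = Path.home() / ".EasyOCR" / "model"
    if not model_dir.is_dir():
        return []
    return sorted(model_dir.glob("*.pth"))


if __name__ == "__main__":
    for path in list_pretrained():
        # Copy the relevant path into the saved_model field of the yaml file.
        print(path)
```

The list also includes the detection model if one was downloaded, so pick the recognition file for your language.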

darwinharianto avatar Apr 11 '23 07:04 darwinharianto

@darwinharianto yes, I know, but the KR model is not robust for Korean handwriting recognition. Therefore, I want to fine-tune the KR model on 2M Korean handwritten samples.

khawar-islam avatar Apr 11 '23 07:04 khawar-islam

I don't think I understand what you are trying to achieve. When I fine-tune a model, I load the previous model and then train it on my custom dataset. The resulting model is my fine-tuned model.

In your case, you don't want to use the previous model, but want to fine tune it?

darwinharianto avatar Apr 11 '23 07:04 darwinharianto

@darwinharianto just let me know where we can download and pass the link of the previous model to fine-tune on new data.

khawar-islam avatar Apr 12 '23 01:04 khawar-islam

just let me know where we can download and pass the link of the previous model to fine-tune on new data.

I don't know where to download it manually, because I let easyocr download it for me. The downloaded model is in ~/.EasyOCR/model

darwinharianto avatar Apr 12 '23 01:04 darwinharianto

Yes, I found the EasyOCR model, but where do I specify the model path for fine-tuning?

khawar-islam avatar Apr 12 '23 01:04 khawar-islam

https://github.com/JaidedAI/EasyOCR/issues/762#issuecomment-1502819379

As you can see from my previous comment,

Then change that yaml file settings

saved_model: 'path to the downloaded pretrained(the one inside easyocr's model folder)'

darwinharianto avatar Apr 12 '23 02:04 darwinharianto