EasyOCR
Information about format of CRAFT training dataset
I would like to train the detection model further, beyond the pretrained weights provided, to see if I can improve results on a specific dataset.
I would like to include images from some of the datasets I have in the training process.
The format of the labels in the training dataset for the craft detection model is:
a bounding box in the format x1,y1,x2,y2,x3,y3,x4,y4, followed by the text in the box.
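For reference, a minimal sketch of parsing that annotation format (assuming ICDAR2015-style ground-truth files with one box per line; the sample line and helper name are illustrative, not part of EasyOCR):

```python
def parse_gt_line(line):
    """Parse one annotation line: x1,y1,x2,y2,x3,y3,x4,y4,text.

    Returns (box, text, is_dont_care), where box is a list of four
    (x, y) corner points and is_dont_care is True for '###' regions.
    """
    parts = line.strip().split(",", 8)  # the text itself may contain commas
    coords = list(map(int, parts[:8]))
    text = parts[8] if len(parts) > 8 else ""
    box = [(coords[i], coords[i + 1]) for i in range(0, 8, 2)]
    return box, text, text == "###"


line = "377,117,463,117,465,130,378,130,Genaxis"
box, text, dont_care = parse_gt_line(line)
# box -> [(377, 117), (463, 117), (465, 130), (378, 130)]
```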
I have a few questions hopefully someone can answer:
- Is it necessary to have the text? Since I am not training the recognition model, I don't suppose it is being used.
- Should the data be segmented by word, or is it OK to have a bounding box around a full sentence?
- What is the meaning of the '###' "don't care" tag?
Thanks!
For 1, I do not think it is necessary to have a string; as you correctly state, the text is not being used. A text field is however expected in the format, so you should at least include an empty string there.

For 2, assuming you are using a pretrained model (and not training one from scratch), the model is made to segment by word, but if you have enough data you could potentially train it to box a full sentence, though it might be better to train a model from scratch in that case.

If you are interested, I wrote an article on fine-tuning the CRAFT model on TowardsAI https://medium.com/towards-artificial-intelligence/how-to-fine-tune-the-craft-model-in-easyocr-f9fa0ac5cc9d, which could hopefully be of help. I unfortunately do not know the answer to 3.
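To illustrate the empty-string point: if you only have boxes and no transcriptions, you can still emit valid label lines by leaving the text field empty. A sketch (the writer function is my own, not part of the EasyOCR trainer); one caveat worth noting is that the fine-tuning config later in this thread lists `''` under `do_not_care_label`, so with that setting an empty text may be treated as a don't-care region, and a dummy placeholder character might be safer:

```python
def box_to_gt_line(box, text=""):
    """Serialize a quadrilateral box as 'x1,y1,...,x4,y4,text'.

    An empty text string keeps the expected nine-field format even when
    transcriptions are unavailable (they are unused by the detector).
    """
    coords = ",".join(f"{x},{y}" for x, y in box)
    return f"{coords},{text}"


print(box_to_gt_line([(10, 10), (90, 10), (90, 30), (10, 30)]))
# -> 10,10,90,10,90,30,10,30,
```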
Thanks @EivindKjosbakken - I'll take a look at your article.
Ah, paywall :*(
@EivindKjosbakken I'm trying to fine-tune the CRAFT model on my custom data, but I get low accuracy, {'precision': 0, 'recall': 0.0, 'hmean': 0}, on the validation data even though the training loss settles around 0.15. I thought my data might have some wrong annotations, so I tried the ICDAR dataset instead, but I still get the same {'precision': 0, 'recall': 0.0, 'hmean': 0}.

I used this command: `python3 train.py --yaml=custom_data_train`

The yaml file content:

```yaml
wandb_opt: False

results_dir: "./exp/"
vis_test_dir: "./vis_result/"

data_root_dir: "E:/train lic plate/CRAFT_DATA"
score_gt_dir: None # "/data/ICDAR2015_official_supervision"
mode: "weak_supervision"

train:
  backbone: vgg
  use_synthtext: False # If you want to combine SynthText in train time as CRAFT did, you can turn on this option
  synth_data_dir: "/data/SynthText/"
  synth_ratio: 5
  real_dataset: custom
  ckpt_path: "./pretrained_model/CRAFT_clr_amp_29500.pth"
  eval_interval: 1000
  batch_size: 5
  st_iter: 0
  end_iter: 25000
  lr: 0.0001
  lr_decay: 7500
  gamma: 0.2
  weight_decay: 0.00001
  num_workers: 0 # On single gpu, train.py execution only works when num worker = 0 / On multi-gpu, you can set num_worker > 0 to speed up
  amp: True
  loss: 2
  neg_rto: 0.3
  n_min_neg: 5000
  data:
    vis_opt: False
    pseudo_vis_opt: False
    output_size: 768
    do_not_care_label: ['###', '']
    mean: [0.485, 0.456, 0.406]
    variance: [0.229, 0.224, 0.225]
    enlarge_region: [0.5, 0.5] # x axis, y axis
    enlarge_affinity: [0.5, 0.5]
    gauss_init_size: 200
    gauss_sigma: 40
    watershed:
      version: "skimage"
      sure_fg_th: 0.75
      sure_bg_th: 0.05
    syn_sample: -1
    custom_sample: -1
    syn_aug:
      random_scale:
        range: [1.0, 1.5, 2.0]
        option: False
      random_rotate:
        max_angle: 20
        option: False
      random_crop:
        version: "random_resize_crop_synth"
        option: True
      random_horizontal_flip:
        option: False
      random_colorjitter:
        brightness: 0.2
        contrast: 0.2
        saturation: 0.2
        hue: 0.2
        option: True
    custom_aug:
      random_scale:
        range: [1.0, 1.5, 2.0]
        option: False
      random_rotate:
        max_angle: 20
        option: True
      random_crop:
        version: "random_resize_crop"
        scale: [0.03, 0.4]
        ratio: [0.75, 1.33]
        rnd_threshold: 1.0
        option: True
      random_horizontal_flip:
        option: True
      random_colorjitter:
        brightness: 0.2
        contrast: 0.2
        saturation: 0.2
        hue: 0.2
        option: True

test:
  trained_model: null
  custom_data:
    test_set_size: 500
    test_data_dir: "E:/train lic plate/CRAFT_DATA"
    text_threshold: 0.75
    low_text: 0.5
    link_threshold: 0.2
    canvas_size: 2240
    mag_ratio: 1.75
    poly: False
    cuda: True
    vis_opt: False
```

Please help me figure out where and what I'm doing wrong.
Sorry for the late reply; I have been busy with my thesis and have been away over the summer. It is difficult to debug your problem from this information alone, so instead I will give you some pointers to where the issue might be.

First, I would test the model you are trying to fine-tune (./pretrained_model/CRAFT_clr_amp_29500.pth) and see if it performs okay on the validation dataset without any fine-tuning. If you also get 0 precision and recall here, I would redownload the model and the fine-tuning code. If, however, this model performs okay on the validation dataset, I would fine-tune it for a few steps and test the slightly fine-tuned model on the validation data again. If the model now performs worse than with no fine-tuning, there is likely an issue with the data you are fine-tuning on.

Please update this thread with your debugging progress, and I will make sure to give a quicker response than this time!
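As a side note for interpreting those numbers: hmean is the harmonic mean of precision and recall (i.e. the F1 score), so an all-zero result means essentially no predicted box matched any ground-truth box. A minimal sketch of the computation (my own helper, mirroring ICDAR-style detection evaluation):

```python
def detection_scores(true_pos, num_pred, num_gt):
    """Precision/recall/hmean as in ICDAR-style detection evaluation.

    true_pos: number of predicted boxes matched to ground truth
    num_pred: total predicted boxes; num_gt: total ground-truth boxes
    """
    precision = true_pos / num_pred if num_pred else 0.0
    recall = true_pos / num_gt if num_gt else 0.0
    hmean = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {"precision": precision, "recall": recall, "hmean": hmean}


print(detection_scores(true_pos=0, num_pred=42, num_gt=500))
# -> {'precision': 0.0, 'recall': 0.0, 'hmean': 0.0}
```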