
Training PP-OCRv3 on one-word images

Open · bely66 opened this issue 2 years ago · 2 comments

Hi everyone, I'm training the PP-OCRv3 recognition model on a dataset of 9M images with various fonts and augmentations.

What I'm looking for is faster training and higher accuracy, and my dataset consists only of one-word images rather than sentences.

So what I'm thinking is (see the config sketch after this list):

  1. Reducing the input image size to (32, 160) instead of (48, 320)
  2. Removing RecConAug, since it concatenates samples into longer text lines, or reducing the max text length to about 16 characters
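Something like the following is what I have in mind. This is only a sketch: it assumes the standard en_PP-OCRv3_rec.yml transform layout (RecConAug, RecAug, MultiLabelEncode, RecResizeImg), so the key names may differ slightly in your copy of the config:

```yaml
Global:
  # one-word images rarely need the default 25-character budget
  max_text_length: 16

Train:
  dataset:
    transforms:
      - DecodeImage:
          img_mode: BGR
          channel_first: false
      # RecConAug (which concatenates crops into longer text lines) is simply
      # omitted from this list
      - RecAug:
      - MultiLabelEncode:
      - RecResizeImg:
          # smaller input resolution for short, single-word crops
          image_shape: [3, 32, 160]
      - KeepKeys:
          keep_keys: ['image', 'label_ctc', 'label_sar', 'length', 'valid_ratio']
```

If the input shape is shrunk here, the Eval transforms (and any later export/inference settings) would need to use the same shape.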

My accuracy is currently about 83% after 16 epochs, which took almost 5 days on 2x V100 GPUs. The dataset is pretty straightforward and easily gets above 95% accuracy with models like vanilla CRNN and SAR.

So would this strategy affect accuracy negatively? And is there any other way to cut the training time? Training for 200 epochs over roughly 50 days is very costly.

bely66 avatar Oct 30 '22 09:10 bely66

Sorry for the late reply. You can download the PP-OCRv3 pretrained model and fine-tune from it; please refer to this document.
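Roughly, fine-tuning just means pointing the config at the downloaded weights before starting training; the path below is only a placeholder for wherever you extract the pretrained model:

```yaml
Global:
  # placeholder path to the extracted PP-OCRv3 recognition pretrained weights
  pretrained_model: ./pretrain_models/en_PP-OCRv3_rec_train/best_accuracy
  checkpoints: null   # leave unset so training starts from the pretrained weights
```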

an1018 avatar Nov 01 '22 14:11 an1018

@an1018 Hi, thanks for your reply. I won't be able to fine-tune, because the language and dictionary I'm using are custom and new.

My issue is having to train from scratch for such a long time.

bely66 avatar Nov 01 '22 14:11 bely66

Yes, if your training data is very simple, the v3 strategy may have a negative impact. You can try removing the GTC strategy (see https://github.com/PaddlePaddle/PaddleOCR/issues/6355#issuecomment-1134356635) and removing RecConAug at the same time, if you do not need the additional data augmentation.
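Roughly, removing GTC means collapsing the multi-head setup back to a plain CTC pipeline. The sketch below assumes the standard v3 config with MultiHead, MultiLoss and MultiLabelEncode; the linked issue describes the exact edits:

```yaml
Architecture:
  Head:
    name: CTCHead        # instead of MultiHead (CTCHead + SARHead)
    fc_decay: 1.0e-05

Loss:
  name: CTCLoss          # instead of MultiLoss (CTCLoss + SARLoss)

PostProcess:
  name: CTCLabelDecode

Train:
  dataset:
    transforms:
      # CTCLabelEncode replaces MultiLabelEncode, RecConAug is dropped, and
      # KeepKeys then only needs ['image', 'label', 'length']
      - CTCLabelEncode:
```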

tink2123 avatar Nov 30 '22 07:11 tink2123

@bely66 Did you get a solution to this? I'm also trying to train for a new language with a new dict (I added characters to an English dict). The ultimate goal is to have a bilingual model. I've trained on 75k one-word images for 2 days, and this is what my training and eval accuracy looks like:

[screenshot: training and eval accuracy curves]
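(For the dictionary side of this, I'm only pointing the config at the merged character file via character_dict_path; the path below is just an illustrative placeholder, my actual file is the bn_dict.txt shown in the config further down.)

```yaml
Global:
  # illustrative merged dict: English characters plus the new language's
  # characters, one character per line
  character_dict_path: ./ppocr/utils/dict/my_bilingual_dict.txt
  use_space_char: true
```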

My questions are:

  1. Is the model actually learning? It seems like it is.
  2. Is this a training-length issue, or do I not have enough data? Getting more data will be very tough.
  3. What would you suggest for reaching 95%+ accuracy?

My current config:

```yaml
Global:
  use_gpu: true
  epoch_num: 10
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/rec_ben
  save_epoch_step: 3
  eval_batch_step:
    - 0
    - 20
  cal_metric_during_train: true
  pretrained_model: null
  checkpoints: null
  save_inference_dir: /content/inference
  use_visualdl: false
  infer_img:
  character_dict_path: /content/bn_dict.txt
  max_text_length: 25
  infer_mode: false
  use_space_char: true
Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.001
  regularizer:
    name: L2
    factor: 1.0e-05
Architecture:
  model_type: rec
  algorithm: CRNN
  Transform: null
  Backbone:
    name: MobileNetV3
    scale: 0.5
    model_name: small
    small_stride:
      - 1
      - 2
      - 2
      - 2
  Neck:
    name: SequenceEncoder
    encoder_type: rnn
    hidden_size: 48
  Head:
    name: CTCHead
    fc_decay: 1.0e-05
Loss:
  name: CTCLoss
PostProcess:
  name: CTCLabelDecode
Metric:
  name: RecMetric
  main_indicator: acc
Train:
  dataset:
    name: SimpleDataSet
    data_dir: /content/content/final_dataset/rec/train/train
    label_file_list:
      - /content/content/final_dataset/rec/train/rec_gt_train.txt
    transforms:
      - DecodeImage:
          img_mode: BGR
          channel_first: false
      - RecAug: null
      - CTCLabelEncode: null
      - RecResizeImg:
          image_shape:
            - 3
            - 32
            - 320
      - KeepKeys:
          keep_keys:
            - image
            - label
            - length
  loader:
    shuffle: true
    batch_size_per_card: 256
    drop_last: true
    num_workers: 8
Eval:
  dataset:
    name: SimpleDataSet
    data_dir: /content/content/final_dataset/rec/test/test
    label_file_list:
      - /content/content/final_dataset/rec/test/rec_gt_test.txt
    transforms:
      - DecodeImage:
          img_mode: BGR
          channel_first: false
      - CTCLabelEncode: null
      - RecResizeImg:
          image_shape:
            - 3
            - 32
            - 320
      - KeepKeys:
          keep_keys:
            - image
            - label
            - length
  loader:
    shuffle: false
    drop_last: false
    batch_size_per_card: 256
    num_workers: 8
wandb:
  project: paddle-ocr-long
```

rm-asif-amin avatar Mar 25 '24 14:03 rm-asif-amin