Training PP-OCRv3 on one-word images
Hi everyone, I'm training the PP-OCRv3 recognition model on a 9M-image dataset with various fonts and augmentations.
What I'm looking for is faster training and higher accuracy. My dataset consists only of one-word images rather than sentences.
So what I'm thinking is:
- Reducing the input image size to (32, 160) instead of (48, 320)
- Removing RecConAug, since it adds longer samples, or reducing the max text length to about 16 characters
My accuracy is currently about 83% after 16 epochs, which took almost 5 days on 2x V100s. The dataset is pretty straightforward and easily gets above 95% accuracy with models like vanilla CRNN and SAR.
Would this strategy affect accuracy negatively? And is there any other way to reduce training time? It's very costly to train 200 epochs over 50 days.
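In config terms, the two changes above would look roughly like this (a rough sketch against the stock PP-OCRv3 rec YAML as I understand it, not tested; the exact transform names may differ in your config file):

Global:
  max_text_length: 16            # down from the default 25, since my labels are single words
Train:
  dataset:
    transforms:
    # - RecConAug:               # dropped entirely (second point above)
    - RecResizeImg:
        image_shape:
        - 3
        - 32                     # height 32 instead of 48
        - 160                    # width 160 instead of 320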
Sorry for the late reply. You can download the PP-OCRv3 pretrained model and fine-tune from it; please refer to this document.
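For example, you can point Global.pretrained_model at the downloaded weights before starting training (the path below is only a placeholder for wherever you unpack the pretrained model):

Global:
  pretrained_model: ./pretrain_models/en_PP-OCRv3_rec_train/best_accuracy   # placeholder path

The same value can also be passed on the command line with -o Global.pretrained_model=... when launching tools/train.py.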
@an1018 Hi, thanks for your reply. I won't be able to fine-tune because the language and dictionary I'm using are custom and new.
My issue is that training from scratch takes a very long time.
Yes, if your training data is very simple, the v3 strategies may have a negative impact. You can try removing the GTC strategy (refer to https://github.com/PaddlePaddle/PaddleOCR/issues/6355#issuecomment-1134356635) and removing RecConAug at the same time, if you do not need the additional data augmentation.
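Roughly speaking, dropping the GTC branch means replacing the MultiHead/MultiLoss setup in the v3 config with a plain CTC pipeline, along these lines (a sketch only; see the linked issue for the exact config):

Architecture:
  Head:
    name: CTCHead              # instead of MultiHead with the extra SARHead (GTC) branch
Loss:
  name: CTCLoss                # instead of MultiLoss
PostProcess:
  name: CTCLabelDecode
Train:
  dataset:
    transforms:
    - CTCLabelEncode:          # instead of MultiLabelEncode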
@bely66 Did you find a solution to this? I'm also trying to train for a new language with a new dict (I added characters to an English dict). The ultimate goal is to have a bilingual model. I've trained on 75k one-word images for 2 days, and this is what my training and eval accuracy look like.
My questions are:
- Is the model actually learning? It seems like it is.
- Is this a training length issue? Or do I not have enough data? Getting more data will be very tough.
- What would you suggest for 95%+ accuracy?
My current config:
Global:
  use_gpu: true
  epoch_num: 10
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/rec_ben
  save_epoch_step: 3
  eval_batch_step:
  - 0
  - 20
  cal_metric_during_train: true
  pretrained_model: null
  checkpoints: null
  save_inference_dir: /content/inference
  use_visualdl: false
  infer_img:
  character_dict_path: /content/bn_dict.txt
  max_text_length: 25
  infer_mode: false
  use_space_char: true
Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.001
  regularizer:
    name: L2
    factor: 1.0e-05
Architecture:
  model_type: rec
  algorithm: CRNN
  Transform: null
  Backbone:
    name: MobileNetV3
    scale: 0.5
    model_name: small
    small_stride:
    - 1
    - 2
    - 2
    - 2
  Neck:
    name: SequenceEncoder
    encoder_type: rnn
    hidden_size: 48
  Head:
    name: CTCHead
    fc_decay: 1.0e-05
Loss:
  name: CTCLoss
PostProcess:
  name: CTCLabelDecode
Metric:
  name: RecMetric
  main_indicator: acc
Train:
  dataset:
    name: SimpleDataSet
    data_dir: /content/content/final_dataset/rec/train/train
    label_file_list:
    - /content/content/final_dataset/rec/train/rec_gt_train.txt
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - RecAug: null
    - CTCLabelEncode: null
    - RecResizeImg:
        image_shape:
        - 3
        - 32
        - 320
    - KeepKeys:
        keep_keys:
        - image
        - label
        - length
  loader:
    shuffle: true
    batch_size_per_card: 256
    drop_last: true
    num_workers: 8
Eval:
  dataset:
    name: SimpleDataSet
    data_dir: /content/content/final_dataset/rec/test/test
    label_file_list:
    - /content/content/final_dataset/rec/test/rec_gt_test.txt
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - CTCLabelEncode: null
    - RecResizeImg:
        image_shape:
        - 3
        - 32
        - 320
    - KeepKeys:
        keep_keys:
        - image
        - label
        - length
  loader:
    shuffle: false
    drop_last: false
    batch_size_per_card: 256
    num_workers: 8
wandb:
  project: paddle-ocr-long