Training PP-OCRv3 on one-word images
Hi everyone, I'm training the PP-OCRv3 recognition model on a 9M-image dataset with various fonts and augmentations.
What I'm looking for is faster training and higher accuracy. My dataset consists only of one-word images rather than sentences.
So what I'm thinking is:
- Reducing the input image size to (32, 160) instead of (48, 320)
- Removing RecConAug, since it adds longer samples, or reducing the max text length to about 16 characters
My accuracy is currently about 83% after 16 epochs, which took almost 5 days on 2x V100s. The dataset is pretty straightforward and easily gets above 95% accuracy with models like vanilla CRNN and SAR.
Would this strategy affect accuracy negatively? And is there any other way to reduce training time? It's very costly to train 200 epochs over 50 days.
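In config terms, the two changes above would look roughly like this (a rough sketch against the stock PP-OCRv3 rec YAML as I understand it, not tested; the exact transform names may differ in your config file):

Global:
  max_text_length: 16            # down from the default 25, since my labels are single words
Train:
  dataset:
    transforms:
    # - RecConAug:               # dropped entirely (second point above)
    - RecResizeImg:
        image_shape:
        - 3
        - 32                     # height 32 instead of 48
        - 160                    # width 160 instead of 320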
Sorry for the late reply. You can download the PP-OCRv3 pretrained model and fine-tune from it; please refer to this document.
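For example, you can point Global.pretrained_model at the downloaded weights before starting training (the path below is only a placeholder for wherever you unpack the pretrained model):

Global:
  pretrained_model: ./pretrain_models/en_PP-OCRv3_rec_train/best_accuracy   # placeholder path

The same value can also be passed on the command line with -o Global.pretrained_model=... when launching tools/train.py.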
@an1018 Hi, thanks for your reply. I won't be able to fine-tune because the language and dictionary I'm using are custom and new.
My issue is that training from scratch takes a very long time.
Yes, if your training data is very simple, the v3 strategies may have a negative impact. You can try removing the GTC strategy (refer to https://github.com/PaddlePaddle/PaddleOCR/issues/6355#issuecomment-1134356635) and removing RecConAug at the same time, if you do not need the additional data augmentation.
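Roughly speaking, dropping the GTC branch means replacing the MultiHead/MultiLoss setup in the v3 config with a plain CTC pipeline, along these lines (a sketch only; see the linked issue for the exact config):

Architecture:
  Head:
    name: CTCHead              # instead of MultiHead with the extra SARHead (GTC) branch
Loss:
  name: CTCLoss                # instead of MultiLoss
PostProcess:
  name: CTCLabelDecode
Train:
  dataset:
    transforms:
    - CTCLabelEncode:          # instead of MultiLabelEncode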
@bely66 Did you find a solution to this? I'm also trying to train for a new language with a new dict (I added characters to an English dict). The ultimate goal is to have a bilingual model. I've trained on 75k one-word images for 2 days, and this is what my training and eval accuracy look like.
My questions are:
- Is the model actually learning? It seems like it is.
- Is this a training length issue? Or do I not have enough data? Getting more data will be very tough.
- What would you suggest for 95%+ accuracy?
My current config:
Global:
  use_gpu: true
  epoch_num: 10
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/rec_ben
  save_epoch_step: 3
  eval_batch_step:
  - 0
  - 20
  cal_metric_during_train: true
  pretrained_model: null
  checkpoints: null
  save_inference_dir: /content/inference
  use_visualdl: false
  infer_img:
  character_dict_path: /content/bn_dict.txt
  max_text_length: 25
  infer_mode: false
  use_space_char: true
Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.001
  regularizer:
    name: L2
    factor: 1.0e-05
Architecture:
  model_type: rec
  algorithm: CRNN
  Transform: null
  Backbone:
    name: MobileNetV3
    scale: 0.5
    model_name: small
    small_stride:
    - 1
    - 2
    - 2
    - 2
  Neck:
    name: SequenceEncoder
    encoder_type: rnn
    hidden_size: 48
  Head:
    name: CTCHead
    fc_decay: 1.0e-05
Loss:
  name: CTCLoss
PostProcess:
  name: CTCLabelDecode
Metric:
  name: RecMetric
  main_indicator: acc
Train:
  dataset:
    name: SimpleDataSet
    data_dir: /content/content/final_dataset/rec/train/train
    label_file_list:
    - /content/content/final_dataset/rec/train/rec_gt_train.txt
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - RecAug: null
    - CTCLabelEncode: null
    - RecResizeImg:
        image_shape:
        - 3
        - 32
        - 320
    - KeepKeys:
        keep_keys:
        - image
        - label
        - length
  loader:
    shuffle: true
    batch_size_per_card: 256
    drop_last: true
    num_workers: 8
Eval:
  dataset:
    name: SimpleDataSet
    data_dir: /content/content/final_dataset/rec/test/test
    label_file_list:
    - /content/content/final_dataset/rec/test/rec_gt_test.txt
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - CTCLabelEncode: null
    - RecResizeImg:
        image_shape:
        - 3
        - 32
        - 320
    - KeepKeys:
        keep_keys:
        - image
        - label
        - length
  loader:
    shuffle: false
    drop_last: false
    batch_size_per_card: 256
    num_workers: 8
wandb:
  project: paddle-ocr-long