PaddleOCR
en_PP-OCRv3 recognition fine-tuning: infer_rec.py results inconsistent with eval.py
I fine-tuned the recognition model on a self-built dataset (~3000 images). Since the labels contain no spaces, I set use_space_char to false, and I adjusted the learning rate.
Training converged normally. After training:
ppocr INFO: best metric, acc: 0.9619047390022681, norm_edit_dis: 0.9958383091069015, fps: 2630.607981984834, best_epoch: 373
However, when running infer_rec.py, the recognition results contain a large number of extra characters.
Is something misconfigured here, or is there a parameter mismatch somewhere? Also, roughly how much data is needed for fine-tuning the recognition model to work well?
- System Environment: Linux
- Version: PaddleOCR release/2.5 (Paddle version not specified)
- Training config file:

```yaml
Global:
  debug: false
  use_gpu: true
  epoch_num: 500
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/ppocr_v3_rec
  save_epoch_step: 100
  eval_batch_step: [0, 100]
  cal_metric_during_train: true
  pretrained_model: ./models/en_PP-OCRv3_rec_train/best_accuracy
  checkpoints:
  save_inference_dir:
  use_visualdl: True
  infer_img: doc/imgs_words/ch/word_1.jpg
  character_dict_path: ppocr/utils/en_dict.txt
  max_text_length: &max_text_length 25
  infer_mode: false
  use_space_char: false
  distributed: true
  save_res_path: ./output/rec/predicts_ppocrv3_finetune.txt

Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Const
    learning_rate: 0.00005
    warmup_epoch: 0
  regularizer:
    name: L2
    factor: 3.0e-05

Architecture:
  model_type: rec
  algorithm: SVTR
  Transform:
  Backbone:
    name: MobileNetV1Enhance
    scale: 0.5
    last_conv_stride: [1, 2]
    last_pool_type: avg
  Head:
    name: MultiHead
    head_list:
      - CTCHead:
          Neck:
            name: svtr
            dims: 64
            depth: 2
            hidden_dims: 120
            use_guide: True
          Head:
            fc_decay: 0.00001
      - SARHead:
          enc_dim: 512
          max_text_length: *max_text_length

Loss:
  name: MultiLoss
  loss_config_list:
    - CTCLoss:
    - SARLoss:

PostProcess:
  name: CTCLabelDecode

Metric:
  name: RecMetric
  main_indicator: acc
  ignore_space: False

Train:
  dataset:
    name: SimpleDataSet
    data_dir: ./data/
    ext_op_transform_idx: 1
    label_file_list:
      - ./data/rec/rec_train.txt
    ratio_list: [1.0]
    transforms:
      - DecodeImage:
          img_mode: BGR
          channel_first: false
      - RecConAug:
          prob: 0.5
          ext_data_num: 2
          image_shape: [48, 320, 3]
      - RecAug:
      - MultiLabelEncode:
      - RecResizeImg:
          image_shape: [3, 48, 320]
      - KeepKeys:
          keep_keys:
            - image
            - label_ctc
            - label_sar
            - length
            - valid_ratio
  loader:
    shuffle: true
    batch_size_per_card: 36
    drop_last: true
    num_workers: 4

Eval:
  dataset:
    name: SimpleDataSet
    data_dir: ./data/
    label_file_list:
      - ./data/rec/rec_test.txt
    ratio_list: [1.0]
    transforms:
      - DecodeImage:
          img_mode: BGR
          channel_first: false
      - MultiLabelEncode:
      - RecResizeImg:
          image_shape: [3, 48, 320]
      - KeepKeys:
          keep_keys:
            - image
            - label_ctc
            - label_sar
            - length
            - valid_ratio
  loader:
    shuffle: false
    drop_last: false
    batch_size_per_card: 36
    num_workers: 4
```
Comparing the best_accuracy model's predictions from eval.py and infer_rec.py, the difference is large: the eval.py output looks normal, but the infer_rec.py output is a mess. Which parameter is not aligned between the two?
The difference comes from the RecResizeImg op invoked by infer_rec.py going through resize_norm_img_chinese; if it goes through resize_norm_img instead, the two behave consistently. How should this be expressed in the config?
https://github.com/PaddlePaddle/PaddleOCR/blob/bde8cad09668939a94fde066ffb556a703d7f7ca/ppocr/data/imaug/rec_img_aug.py#L143-L149
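For anyone comparing the two code paths, the branch at the linked lines boils down to roughly the following. This is a paraphrased sketch against release/2.5 (the linked file is authoritative); `resize_norm_img` and `resize_norm_img_chinese` are the real functions in that module, while the wrapper function here exists only for illustration.

```python
# Paraphrase of the dispatch inside RecResizeImg.__call__
# (ppocr/data/imaug/rec_img_aug.py, release/2.5); see the link above.
from ppocr.data.imaug.rec_img_aug import resize_norm_img, resize_norm_img_chinese

def rec_resize(img, image_shape, infer_mode, character_dict_path, padding=True):
    if infer_mode and character_dict_path is not None:
        # Path taken by tools/infer_rec.py: the target width is widened to the
        # crop's own aspect ratio, so long images are NOT squeezed to 320 px.
        return resize_norm_img_chinese(img, image_shape)
    # Path taken during training and by tools/eval.py: the crop is fitted
    # (and, if it is too long, compressed) into the fixed image_shape.
    return resize_norm_img(img, image_shape, padding)
```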
This cannot be changed through the config at the moment; by default, the infer path does not compress the image along its width. The likely cause is that most of your training images are rather long, so they were noticeably deformed when resized to the fixed 48x320 training shape, and the model has learned those compressed features.
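A minimal sketch of that width behaviour, assuming this config's 3x48x320 input shape (this is just the width arithmetic the two resize routines perform, not PaddleOCR code):

```python
import math

def fixed_width(h, w, img_h=48, img_w=320):
    """Training/eval style (resize_norm_img with padding): scale the crop to
    height img_h and cap the width at img_w, so long crops get squeezed."""
    return min(img_w, int(math.ceil(img_h * w / float(h))))

def ratio_preserving_width(h, w, img_h=48, img_w=320):
    """infer_rec.py style (resize_norm_img_chinese): widen the target canvas
    to the crop's own aspect ratio first, so there is no horizontal squeeze."""
    target_w = int(img_h * max(img_w / float(img_h), w / float(h)))
    return min(target_w, int(math.ceil(img_h * w / float(h))))

# A long text line, e.g. 40 px high and 600 px wide (w/h = 15 > 320/48 ~= 6.7):
print(fixed_width(40, 600))             # 320 -> compressed to ~45% of its ratio
print(ratio_preserving_width(40, 600))  # 720 -> aspect ratio preserved
```

A model fine-tuned mostly on such compressed 320-wide crops sees visibly different inputs under the aspect-preserving path, which matches the extra characters observed above.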
If the current model already meets your needs, you can modify the logic in PaddleOCR/ppocr/data/imaug/rec_img_aug.py referenced above so that inference also goes through resize_norm_img.
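A sketch of what that local edit could look like, assuming the release/2.5 layout of `RecResizeImg` (the attribute names `self.image_shape` / `self.padding` and the `(image, valid_ratio)` return of `resize_norm_img` come from that class and module; only the `__call__` body changes):

```python
# ppocr/data/imaug/rec_img_aug.py (local patch, not an upstream change)
class RecResizeImg(object):
    # ... __init__ and the rest of the class unchanged ...

    def __call__(self, data):
        img = data['image']
        # Previously: infer_mode plus a character_dict_path selected
        # resize_norm_img_chinese (aspect-preserving). Always use the
        # fixed-shape resize instead, so tools/infer_rec.py preprocesses
        # images exactly like training and tools/eval.py did.
        norm_img, valid_ratio = resize_norm_img(img, self.image_shape,
                                                self.padding)
        data['image'] = norm_img
        data['valid_ratio'] = valid_ratio
        return data
```

After this change both tools run the resize_norm_img path, which is exactly the alignment reported above to make the two outputs match.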