PaddleOCR
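Before being converted to an inference model, the SVTR recognition model tested very well on the dataset. After converting it to an inference model with export_model and running predict_rec.py for recognition, the recognition results are much worse, and I don't know why. The paddle version is 2.4, the CUDA version is 11.3, and the cuDNN version is 8.2.
P.S. Appending --use_tensorrt=True to the predict_rec command, i.e. using TensorRT to accelerate inference, makes the model output even stranger and mostly incorrect, whereas other models, such as the recognition model in PPOCR v3, work fine.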
Global:
  use_gpu: True
  epoch_num: 5000
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/rec/rec_svtr_large_en/
  save_epoch_step: 10
  eval_batch_step: [20000, 2000]
  cal_metric_during_train: True
  pretrained_model: ./pretrain_models/rec_svtr_large_none_ctc_en_train/best_accuracy
  checkpoints:
  save_inference_dir:
  use_visualdl: False
  infer_img: doc/imgs_words_en/word_10.png
  character_dict_path: ./ppocr/utils/my_dict.txt
  character_type: en
  max_text_length: 25
  infer_mode: False
  use_space_char: False
  save_res_path: ./output/rec/predicts_svtr_large.txt

Optimizer:
  name: AdamW
  beta1: 0.9
  beta2: 0.99
  epsilon: 0.00000008
  weight_decay: 0.05
  no_weight_decay_name: norm pos_embed
  one_dim_param_no_weight_decay: true
  lr:
    name: Cosine
    learning_rate: 0.000065
    warmup_epoch: 100

Architecture:
  model_type: rec
  algorithm: SVTR
  Transform:
    name: STN_ON
    tps_inputsize: [32, 64]
    tps_outputsize: [48, 160]
    num_control_points: 20
    tps_margins: [0.05,0.05]
    stn_activation: none
  Backbone:
    name: SVTRNet
    img_size: [48, 160]
    out_char_num: 40
    out_channels: 384
    patch_merging: 'Conv'
    embed_dim: [192, 256, 512]
    depth: [3, 9, 9]
    num_heads: [6, 8, 16]
    mixer: ['Local','Local','Local','Local','Local','Local','Local','Local','Local','Local','Global','Global','Global','Global','Global','Global','Global','Global','Global','Global','Global']
    local_mixer: [[7, 11], [7, 11], [7, 11]]
    prenorm: false
  Neck:
    name: SequenceEncoder
    encoder_type: reshape
  Head:
    name: CTCHead

Loss:
  name: CTCLoss

PostProcess:
  name: CTCLabelDecode # SVTRLabelDecode is used for eval after train, please change to CTCLabelDecode when training

Metric:
  name: RecMetric
  main_indicator: acc

Train:
  dataset:
    name: LMDBDataSet
    data_dir: ./OCR_dataset/lmdb/train
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - RecAug:
      - CTCLabelEncode: # Class handling label
      - RecResizeImg:
          character_dict_path:
          image_shape: [3, 64, 256]
          padding: False
      - KeepKeys:
          keep_keys: ['image', 'label', 'length'] # dataloader will return list in this order
  loader:
    shuffle: True
    batch_size_per_card: 64
    drop_last: True
    num_workers: 4

Eval:
  dataset:
    name: LMDBDataSet
    data_dir: ./OCR_dataset_test_lmdb
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - CTCLabelEncode: # Class handling label
      - SVTRRecResizeImg: # SVTRRecResizeImg is used for eval after train, please change to RecResizeImg when training
          character_dict_path:
          image_shape: [3, 64, 256]
          padding: False
      - KeepKeys:
          keep_keys: ['image', 'label', 'length'] # dataloader will return list in this order
  loader:
    shuffle: False
    drop_last: False
    batch_size_per_card: 64
    num_workers: 2
python tools/infer/ --image_dir="./OCR_dataset_test/crop_img/" --rec_model_dir="./inference/rec_svtr_large_stn_en/" --use_angle_cls=false --rec_char_dict_path="./ppocr/utils/my_dict.txt" --rec_image_shape="3,64,256"
I'm experiencing this exact problem as well, using this docker container paddlepaddle/paddle:2.3.2-gpu-cuda11.2-cudnn8
The output of tools/ when running my trained model on a test set is almost perfect.
The output of tools/infer/ when running the exported inference model on a test set is slightly off (it outputs wrong but similar-looking letters, or randomly repeats a character at the start or end of a string). Further testing confirms it's almost half the accuracy of the trained model.
The input image size is set the same for both scripts, as is the char dict.
@dual19 Yeah, I have the same question as you. Do you use the SVTR rec algorithm? And do you use TensorRT to speed up inference? I find that it gives worse results.
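After specifying the recognition algorithm at prediction time, i.e. adding --rec_algorithm="SVTR", the model's inference results match what I got before the export.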
python tools/infer/ --image_dir="./OCR_dataset_test/crop_img/" --rec_model_dir="./inference/rec_svtr_large_stn_en/" --use_angle_cls=false --rec_char_dict_path="./ppocr/utils/tgb_dict.txt" --rec_image_shape="3,64,256" --rec_algorithm='SVTR' --use_tensorrt=True
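One way to avoid this kind of mismatch is to derive the prediction flags from the training config instead of typing them by hand. The sketch below is only an illustration: `build_rec_flags` is a hypothetical helper, not part of PaddleOCR, and it assumes the config has already been loaded into a dict with the Global, Architecture and Eval sections posted earlier in this thread. It emits only flags that appear in the commands above.

```python
def build_rec_flags(config, model_dir, image_dir):
    """Derive prediction flags from a training config dict (hypothetical helper)."""
    algorithm = config["Architecture"]["algorithm"]
    dict_path = config["Global"]["character_dict_path"]

    # Look for the resize transform in the Eval pipeline and reuse its image_shape.
    image_shape = None
    for op in config["Eval"]["dataset"]["transforms"]:
        for name, params in op.items():
            if name.endswith("ResizeImg") and params:
                image_shape = params["image_shape"]
    if image_shape is None:
        raise ValueError("no resize transform with image_shape found")

    shape_str = ",".join(str(v) for v in image_shape)
    return [
        f"--image_dir={image_dir}",
        f"--rec_model_dir={model_dir}",
        "--use_angle_cls=false",
        f"--rec_char_dict_path={dict_path}",
        f"--rec_image_shape={shape_str}",
        f"--rec_algorithm={algorithm}",
    ]


# Minimal stand-in for the parsed config; values are taken from this thread.
config = {
    "Global": {"character_dict_path": "./ppocr/utils/my_dict.txt"},
    "Architecture": {"algorithm": "SVTR"},
    "Eval": {"dataset": {"transforms": [
        {"DecodeImage": {"img_mode": "BGR", "channel_first": False}},
        {"CTCLabelEncode": None},
        {"SVTRRecResizeImg": {"image_shape": [3, 64, 256], "padding": False}},
        {"KeepKeys": {"keep_keys": ["image", "label", "length"]}},
    ]}},
}
flags = build_rec_flags(
    config,
    model_dir="./inference/rec_svtr_large_stn_en/",
    image_dir="./OCR_dataset_test/crop_img/",
)
print(" ".join(flags))
```

With this approach the algorithm name, dictionary path and image shape cannot silently drift away from what the model was trained and evaluated with.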
The TensorRT version is 8.0.3.4, and Paddle Inference was downloaded and installed from the link below:
@dual19 I face the same issue when converting my training model to an inference model for PPOCRv3. Accuracy drops for some reason when using the exported model.
Hi, I actually face the same issue too. I trained an SVTR model and tested the same dataset with tools/ and tools/infer/, and the latter showed a huge accuracy decrease. Debugging the code, I found that the resize_norm_img_svtr function used at inference differs from the resize_norm_img function I used during training: the former resizes the image directly to 3,32,320, while the latter keeps the width-to-height ratio and pads the image out to 3,32,320. After I changed this function, both methods gave the same performance. I hope this helps you~
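To make the difference concrete, here is a rough sketch of the width arithmetic behind the two strategies described above. The function names `direct_resize_width` and `ratio_preserving_width` are made up for illustration and this is not PaddleOCR's code; the 32 x 320 target comes from the comment above, and the sample image sizes are arbitrary.

```python
import math


def direct_resize_width(img_h, img_w, target_h=32, target_w=320):
    """Stretch straight to the target width, ignoring the aspect ratio."""
    # Returns (resized width, number of zero-padded columns).
    return target_w, 0


def ratio_preserving_width(img_h, img_w, target_h=32, target_w=320):
    """Scale to target_h, keep the aspect ratio, then pad up to target_w."""
    ratio = img_w / float(img_h)
    resized_w = min(target_w, int(math.ceil(target_h * ratio)))
    return resized_w, target_w - resized_w


# Arbitrary (height, width) crops: short word, medium word, very long line.
for h, w in [(32, 100), (48, 96), (20, 400)]:
    print((h, w), direct_resize_width(h, w), ratio_preserving_width(h, w))
```

For a short word crop, the direct resize stretches the characters to more than three times their original width, while the padded variant leaves them undistorted. A model trained on one kind of input and fed the other sees very different glyph shapes, which would be consistent with the accuracy gap reported here.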
@chengchenng Do you use tensorrt to speed up the rec algorithm? I got some bad results when using it.
@chengchenng I noticed a difference between the resize function from tools/ and the eval script.
I had to set the infer_mode = False during the ops creation to match the accuracy for the training checkpoint between those two scripts.
I will check out the change you recommended for the inference model to see if that resolves this issue.
And no I haven't tested the tensorrt performance either
Output: ('210', 0.22712290287017822)
Result without TensorRT acceleration: ('G39', 0.9999995231628418)
Output: ('918', 0.15482832491397858)
Result without TensorRT acceleration: ('G14', 0.9999997615814209)
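A quick way to quantify this kind of divergence over a whole test set is to run the same images with and without TensorRT and flag every sample where the text differs or the confidence collapses. The helper below is a hypothetical sketch, not a PaddleOCR utility; the 0.5 score threshold is an arbitrary assumption, and the sample values are the ones reported above.

```python
def compare_runs(trt_results, baseline_results, min_score=0.5):
    """Return indices where the TensorRT text differs or its score is low."""
    suspicious = []
    pairs = zip(trt_results, baseline_results)
    for i, ((trt_text, trt_score), (base_text, _base_score)) in enumerate(pairs):
        if trt_text != base_text or trt_score < min_score:
            suspicious.append(i)
    return suspicious


# (text, score) pairs copied from the outputs above.
trt = [("210", 0.22712290287017822), ("918", 0.15482832491397858)]
baseline = [("G39", 0.9999995231628418), ("G14", 0.9999997615814209)]

print(compare_runs(trt, baseline))
```

Both samples are flagged here, since the text differs and the TensorRT scores are far below the baseline scores.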
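Looking at this, something is definitely wrong somewhere. I haven't converted my model yet, though. If I convert it later and it works without problems, I'll let you know.
@Topdu Do you have any example code?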
Global:
  use_gpu: True
  epoch_num: 20
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/rec/svtr/
  save_epoch_step: 1
  # evaluation is run every 2000 iterations after the 0th iteration
  eval_batch_step: [0, 2000]
  cal_metric_during_train: True
  use_visualdl: False
  infer_img: doc/imgs_words_en/word_10.png
  # for data or label process
  character_type: en
  max_text_length: 25
  infer_mode: False
  use_space_char: False
  save_res_path: ./output/rec/predicts_svtr_tiny.txt

Optimizer:
  name: AdamW
  beta1: 0.9
  beta2: 0.99
  epsilon: 8.e-8
  weight_decay: 0.05
  no_weight_decay_name: norm pos_embed
  one_dim_param_no_weight_decay: true
  lr:
    name: Cosine
    learning_rate: 0.0005
    warmup_epoch: 2

Architecture:
  model_type: rec
  algorithm: SVTR
  Backbone:
    name: SVTRNet
    img_size: [32, 100]
    out_char_num: 25
    out_channels: 192
    patch_merging: 'Conv'
    embed_dim: [64, 128, 256]
    depth: [3, 6, 3]
    num_heads: [2, 4, 8]
    mixer: ['Local','Local','Local','Local','Local','Local','Global','Global','Global','Global','Global','Global']
    local_mixer: [[7, 11], [7, 11], [7, 11]]
    last_stage: True
    prenorm: false
  Neck:
    name: SequenceEncoder
    encoder_type: reshape
  Head:
    name: CTCHead

Loss:
  name: CTCLoss

PostProcess:
  name: CTCLabelDecode

Metric:
  name: RecMetric
  main_indicator: acc

Train:
  dataset:
    name: LMDBDataSet
    data_dir: ./train_data/data_lmdb_release/training/
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - CTCLabelEncode: # Class handling label
      - SVTRRecResizeImg:
          image_shape: [3, 32, 100]
          padding: False
      - KeepKeys:
          keep_keys: ['image', 'label', 'length'] # dataloader will return list in this order
  loader:
    shuffle: True
    batch_size_per_card: 512
    drop_last: True
    num_workers: 4

Eval:
  dataset:
    name: LMDBDataSet
    data_dir: ./train_data/data_lmdb_release/evaluation/
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - CTCLabelEncode: # Class handling label
      - SVTRRecResizeImg:
          image_shape: [3, 32, 100]
          padding: False
      - KeepKeys:
          keep_keys: ['image', 'label', 'length'] # dataloader will return list in this order
  loader:
    shuffle: False
    drop_last: False
    batch_size_per_card: 256
    num_workers: 2
To follow up on @chengchenng's comment, I've noticed two problems in the current ppocr code.
- The call method checks whether the algorithm equals "SVTR" to determine the resizing function. My rec_algorithm was set to "SVTR_LCNet", so changing this line to
  elif "SVTR" in self.rec_algorithm
  should resolve this issue.
- After resolving the previous issue, accuracy improved but was still 75%, compared to the 87% I was seeing in the script. To match it, I had to use this resizing function. After copying it in place of the resize_norm_img_svtr function, accuracy went up to 87%.
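The first point comes down to exact string equality versus substring matching. The sketch below reproduces only that dispatch decision; `pick_resize_exact` and `pick_resize_substring` are hypothetical names, and the two-way branch is a simplification of whatever the real call method does.

```python
def pick_resize_exact(rec_algorithm):
    # Exact match: "SVTR_LCNet" does not equal "SVTR", so it falls through.
    if rec_algorithm == "SVTR":
        return "resize_norm_img_svtr"
    return "resize_norm_img"


def pick_resize_substring(rec_algorithm):
    # Substring match: any algorithm name containing "SVTR" takes this branch.
    if "SVTR" in rec_algorithm:
        return "resize_norm_img_svtr"
    return "resize_norm_img"


for name in ["SVTR", "SVTR_LCNet", "CRNN"]:
    print(name, pick_resize_exact(name), pick_resize_substring(name))
```

The substring check also matches any future algorithm whose name happens to contain "SVTR", so it is worth confirming that every such variant really expects the same resizing.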
@Topdu Would you like me to create an MR for these changes? Also, why does the choice of resizing function depend on infer_mode, when it appears that one function is better suited for Chinese characters?