
Evaluating the model provided for the PGNet algorithm does not reach the published metrics

Open fengxiaoru opened this issue 2 years ago • 3 comments

Environment information:

image

Published metrics:

Link: https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.5/doc/doc_ch/algorithm_e2e_pgnet.md#%E6%A8%A1%E5%9E%8B%E8%AE%AD%E7%BB%83%E3%80%81%E8%AF%84%E4%BC%B0%E3%80%81%E6%8E%A8%E7%90%86 image

I evaluated the model en_server_pgnetA from the download link in the image above. The printed log is as follows:

[2022/09/14 14:39:37] ppocr INFO: Architecture :
[2022/09/14 14:39:37] ppocr INFO: Backbone :
[2022/09/14 14:39:37] ppocr INFO: layers : 50
[2022/09/14 14:39:37] ppocr INFO: name : ResNet
[2022/09/14 14:39:37] ppocr INFO: Head :
[2022/09/14 14:39:37] ppocr INFO: name : PGHead
[2022/09/14 14:39:37] ppocr INFO: Neck :
[2022/09/14 14:39:37] ppocr INFO: name : PGFPN
[2022/09/14 14:39:37] ppocr INFO: Transform : None
[2022/09/14 14:39:37] ppocr INFO: algorithm : PGNet
[2022/09/14 14:39:37] ppocr INFO: model_type : e2e
[2022/09/14 14:39:37] ppocr INFO: Eval :
[2022/09/14 14:39:37] ppocr INFO: dataset :
[2022/09/14 14:39:37] ppocr INFO: data_dir : ./train_data/total_text/test
[2022/09/14 14:39:37] ppocr INFO: label_file_list : ['./train_data/total_text/test/test.txt']
[2022/09/14 14:39:37] ppocr INFO: name : PGDataSet
[2022/09/14 14:39:37] ppocr INFO: transforms :
[2022/09/14 14:39:37] ppocr INFO: DecodeImage :
[2022/09/14 14:39:37] ppocr INFO: channel_first : False
[2022/09/14 14:39:37] ppocr INFO: img_mode : BGR
[2022/09/14 14:39:37] ppocr INFO: E2ELabelEncodeTest : None
[2022/09/14 14:39:37] ppocr INFO: E2EResizeForTest :
[2022/09/14 14:39:37] ppocr INFO: max_side_len : 768
[2022/09/14 14:39:37] ppocr INFO: NormalizeImage :
[2022/09/14 14:39:37] ppocr INFO: mean : [0.485, 0.456, 0.406]
[2022/09/14 14:39:37] ppocr INFO: order : hwc
[2022/09/14 14:39:37] ppocr INFO: scale : 1./255.
[2022/09/14 14:39:37] ppocr INFO: std : [0.229, 0.224, 0.225]
[2022/09/14 14:39:37] ppocr INFO: ToCHWImage : None
[2022/09/14 14:39:37] ppocr INFO: KeepKeys :
[2022/09/14 14:39:37] ppocr INFO: keep_keys : ['image', 'shape', 'polys', 'texts', 'ignore_tags', 'img_id']
[2022/09/14 14:39:37] ppocr INFO: loader :
[2022/09/14 14:39:37] ppocr INFO: batch_size_per_card : 1
[2022/09/14 14:39:37] ppocr INFO: drop_last : False
[2022/09/14 14:39:37] ppocr INFO: num_workers : 2
[2022/09/14 14:39:37] ppocr INFO: shuffle : False
[2022/09/14 14:39:37] ppocr INFO: Global :
[2022/09/14 14:39:37] ppocr INFO: cal_metric_during_train : False
[2022/09/14 14:39:37] ppocr INFO: character_dict_path : ppocr/utils/ic15_dict.txt
[2022/09/14 14:39:37] ppocr INFO: character_type : EN
[2022/09/14 14:39:37] ppocr INFO: checkpoints : pretrain_models/en_server_pgnetA/best_accuracy
[2022/09/14 14:39:37] ppocr INFO: distributed : False
[2022/09/14 14:39:37] ppocr INFO: epoch_num : 600
[2022/09/14 14:39:37] ppocr INFO: eval_batch_step : [1000, 1000]
[2022/09/14 14:39:37] ppocr INFO: infer_img : None
[2022/09/14 14:39:37] ppocr INFO: log_smooth_window : 20
[2022/09/14 14:39:37] ppocr INFO: max_text_length : 50
[2022/09/14 14:39:37] ppocr INFO: max_text_nums : 30
[2022/09/14 14:39:37] ppocr INFO: pretrained_model : None
[2022/09/14 14:39:37] ppocr INFO: print_batch_step : 10
[2022/09/14 14:39:37] ppocr INFO: save_epoch_step : 20
[2022/09/14 14:39:37] ppocr INFO: save_inference_dir : None
[2022/09/14 14:39:37] ppocr INFO: save_model_dir : ./output/pgnet_r50_vd_totaltext/
[2022/09/14 14:39:37] ppocr INFO: save_res_path : ./output/pgnet_r50_vd_totaltext/predicts_pgnet.txt
[2022/09/14 14:39:37] ppocr INFO: tcl_len : 64
[2022/09/14 14:39:37] ppocr INFO: use_gpu : False
[2022/09/14 14:39:37] ppocr INFO: use_visualdl : True
[2022/09/14 14:39:37] ppocr INFO: valid_set : totaltext
[2022/09/14 14:39:37] ppocr INFO: Loss :
[2022/09/14 14:39:37] ppocr INFO: max_text_length : 50
[2022/09/14 14:39:37] ppocr INFO: max_text_nums : 30
[2022/09/14 14:39:37] ppocr INFO: name : PGLoss
[2022/09/14 14:39:37] ppocr INFO: pad_num : 36
[2022/09/14 14:39:37] ppocr INFO: tcl_bs : 64
[2022/09/14 14:39:37] ppocr INFO: Metric :
[2022/09/14 14:39:37] ppocr INFO: character_dict_path : ppocr/utils/ic15_dict.txt
[2022/09/14 14:39:37] ppocr INFO: gt_mat_dir : ./train_data/total_text/gt
[2022/09/14 14:39:37] ppocr INFO: main_indicator : f_score_e2e
[2022/09/14 14:39:37] ppocr INFO: mode : A
[2022/09/14 14:39:37] ppocr INFO: name : E2EMetric
[2022/09/14 14:39:37] ppocr INFO: Optimizer :
[2022/09/14 14:39:37] ppocr INFO: beta1 : 0.9
[2022/09/14 14:39:37] ppocr INFO: beta2 : 0.999
[2022/09/14 14:39:37] ppocr INFO: lr :
[2022/09/14 14:39:37] ppocr INFO: learning_rate : 0.001
[2022/09/14 14:39:37] ppocr INFO: name : Adam
[2022/09/14 14:39:37] ppocr INFO: regularizer :
[2022/09/14 14:39:37] ppocr INFO: factor : 0
[2022/09/14 14:39:37] ppocr INFO: name : L2
[2022/09/14 14:39:37] ppocr INFO: PostProcess :
[2022/09/14 14:39:37] ppocr INFO: mode : slow
[2022/09/14 14:39:37] ppocr INFO: name : PGPostProcess
[2022/09/14 14:39:37] ppocr INFO: score_thresh : 0.5
[2022/09/14 14:39:37] ppocr INFO: Train :
[2022/09/14 14:39:37] ppocr INFO: dataset :
[2022/09/14 14:39:37] ppocr INFO: data_dir : ./train_data/total_text/train
[2022/09/14 14:39:37] ppocr INFO: label_file_list : ['./train_data/total_text/train/train.txt']
[2022/09/14 14:39:37] ppocr INFO: name : PGDataSet
[2022/09/14 14:39:37] ppocr INFO: ratio_list : [1.0]
[2022/09/14 14:39:37] ppocr INFO: transforms :
[2022/09/14 14:39:37] ppocr INFO: DecodeImage :
[2022/09/14 14:39:37] ppocr INFO: channel_first : False
[2022/09/14 14:39:37] ppocr INFO: img_mode : BGR
[2022/09/14 14:39:37] ppocr INFO: E2ELabelEncodeTrain : None
[2022/09/14 14:39:37] ppocr INFO: PGProcessTrain :
[2022/09/14 14:39:37] ppocr INFO: batch_size : 4
[2022/09/14 14:39:37] ppocr INFO: max_text_size : 512
[2022/09/14 14:39:37] ppocr INFO: min_crop_size : 24
[2022/09/14 14:39:37] ppocr INFO: min_text_size : 4
[2022/09/14 14:39:37] ppocr INFO: KeepKeys :
[2022/09/14 14:39:37] ppocr INFO: keep_keys : ['images', 'tcl_maps', 'tcl_label_maps', 'border_maps', 'direction_maps', 'training_masks', 'label_list', 'pos_list', 'pos_mask']
[2022/09/14 14:39:37] ppocr INFO: loader :
[2022/09/14 14:39:37] ppocr INFO: batch_size_per_card : 4
[2022/09/14 14:39:37] ppocr INFO: drop_last : True
[2022/09/14 14:39:37] ppocr INFO: num_workers : 2
[2022/09/14 14:39:37] ppocr INFO: shuffle : True
[2022/09/14 14:39:37] ppocr INFO: profiler_options : None
[2022/09/14 14:39:37] ppocr INFO: train with paddle 2.3.2 and device Place(cpu)
[2022/09/14 14:39:37] ppocr INFO: Initialize indexs of datasets:['./train_data/total_text/test/test.txt']
[2022/09/14 14:39:40] ppocr INFO: resume from pretrain_models/en_server_pgnetA/best_accuracy
[2022/09/14 14:39:40] ppocr INFO: metric in ckpt ***************
[2022/09/14 14:39:40] ppocr INFO: f_score:0.7829733997188428
[2022/09/14 14:39:40] ppocr INFO: total_num_gt:2543
[2022/09/14 14:39:40] ppocr INFO: seqerr:0.3176906452608811
[2022/09/14 14:39:40] ppocr INFO: recall_e2e:0.521431380259536
[2022/09/14 14:39:40] ppocr INFO: f_score_e2e:0.5293413173652695
[2022/09/14 14:39:40] ppocr INFO: total_num_det:2467
[2022/09/14 14:39:40] ppocr INFO: precision:0.8026753141467351
[2022/09/14 14:39:40] ppocr INFO: recall:0.7642154935115983
[2022/09/14 14:39:40] ppocr INFO: global_accumulative_recall:1943.3999999999946
[2022/09/14 14:39:40] ppocr INFO: fps:10.138212364362793
[2022/09/14 14:39:40] ppocr INFO: precision_e2e:0.5374949331171464
[2022/09/14 14:39:40] ppocr INFO: best_epoch:448
[2022/09/14 14:39:40] ppocr INFO: hit_str_count:1326
[2022/09/14 14:39:40] ppocr INFO: start_epoch:451
[2022/09/14 14:39:40] ppocr INFO: is_float16:False
eval model:: 100%|████████████████████████████| 300/300 [30:45<00:00, 5.64s/it]
[2022/09/14 15:10:26] ppocr INFO: metric eval ***************
[2022/09/14 15:10:26] ppocr INFO: total_num_gt:2543
[2022/09/14 15:10:26] ppocr INFO: total_num_det:2473
[2022/09/14 15:10:26] ppocr INFO: global_accumulative_recall:1960.5999999999951
[2022/09/14 15:10:26] ppocr INFO: hit_str_count:1342
[2022/09/14 15:10:26] ppocr INFO: recall:0.7709791584742411
[2022/09/14 15:10:26] ppocr INFO: precision:0.8086534573392629
[2022/09/14 15:10:26] ppocr INFO: f_score:0.7893670411656245
[2022/09/14 15:10:26] ppocr INFO: seqerr:0.31551565847189467
[2022/09/14 15:10:26] ppocr INFO: recall_e2e:0.5277231616201337
[2022/09/14 15:10:26] ppocr INFO: precision_e2e:0.542660735948241
[2022/09/14 15:10:26] ppocr INFO: f_score_e2e:0.5350877192982456
[2022/09/14 15:10:26] ppocr INFO: fps:0.1657851570315855

What is the reason for this?

fengxiaoru avatar Sep 14 '22 08:09 fengxiaoru

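The paper metrics published in the README have to be computed with mode B of the metric. Mode A uses labels in the same format as PPOCR, but gives somewhat lower results.

To evaluate accuracy in mode B:

Download the ground truth:
wget https://paddleocr.bj.bcebos.com/dataset/Groundtruth.tar
Modify the parameters in the Metric section: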
Metric:
  name: E2EMetric
  mode: B   # two ways for eval, A: label from txt,  B: label from gt_mat
  gt_mat_dir:  ./train_data/Groundtruth/  # the dir of gt_mat
  character_dict_path: ppocr/utils/ic15_dict.txt
  main_indicator: f_score_e2e
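
As an alternative to editing the YAML file, the same two Metric settings can probably be passed as `-o` overrides when calling `tools/eval.py`. The sketch below only assembles the command line and does not run it. The config path `configs/e2e/e2e_r50_vd_pg.yml` and the helper name `build_mode_b_eval_command` are assumptions made for illustration; they are not stated in this thread, so check them against your checkout.

```python
import shlex


def build_mode_b_eval_command(config_path, checkpoint, gt_mat_dir):
    """Assemble an argv list for tools/eval.py with Metric switched to mode B.

    Hypothetical helper: it only builds the argument list and runs nothing.
    """
    return [
        "python3", "tools/eval.py",
        "-c", config_path,
        "-o",
        f"Global.checkpoints={checkpoint}",
        "Metric.mode=B",
        f"Metric.gt_mat_dir={gt_mat_dir}",
    ]


cmd = build_mode_b_eval_command(
    "configs/e2e/e2e_r50_vd_pg.yml",       # assumed config path
    "./en_server_pgnetA/best_accuracy",    # checkpoint path as it appears in the mode B log
    "./train_data/Groundtruth/",           # gt_mat_dir from the Metric section above
)
print(shlex.join(cmd))
```

Building the arguments as a list keeps each override a single token, so the result can be handed to `subprocess.run(cmd)` without shell quoting concerns.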

The resulting evaluation metrics:

[2022/09/15 02:35:36] ppocr INFO: load pretrain successful from ./en_server_pgnetA/best_accuracy
eval model:: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 300/300 [01:15<00:00,  3.97it/s]
[2022/09/15 02:36:52] ppocr INFO: metric eval ***************
[2022/09/15 02:36:52] ppocr INFO: total_num_gt:2204
[2022/09/15 02:36:52] ppocr INFO: total_num_det:2070
[2022/09/15 02:36:52] ppocr INFO: global_accumulative_recall:1818.3999999999967
[2022/09/15 02:36:52] ppocr INFO: hit_str_count:1267
[2022/09/15 02:36:52] ppocr INFO: recall:0.8250453720508152
[2022/09/15 02:36:52] ppocr INFO: precision:0.8749758454106266
[2022/09/15 02:36:52] ppocr INFO: f_score:0.8492773672439888
[2022/09/15 02:36:52] ppocr INFO: seqerr:0.30323361196656273
[2022/09/15 02:36:52] ppocr INFO: recall_e2e:0.5748638838475499
[2022/09/15 02:36:52] ppocr INFO: precision_e2e:0.6120772946859904
[2022/09/15 02:36:52] ppocr INFO: f_score_e2e:0.5928872250818905
[2022/09/15 02:36:52] ppocr INFO: fps:20.85154714822483

LDOUBLEV avatar Sep 15 '22 02:09 LDOUBLEV
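
The two evaluation logs above can be cross-checked with a little arithmetic. The sketch below assumes that the end-to-end figures are simple ratios of the logged counts; this relationship is inferred from the logged numbers, not taken from the PaddleOCR source. The function name `e2e_metrics` is hypothetical. Detection precision is left out because it does not follow from the logged counts alone.

```python
def e2e_metrics(hit_str_count, total_num_gt, total_num_det, global_accumulative_recall):
    """Recompute the logged metrics from the logged counts (inferred relationship)."""
    recall_e2e = hit_str_count / total_num_gt
    precision_e2e = hit_str_count / total_num_det
    f_score_e2e = 2 * recall_e2e * precision_e2e / (recall_e2e + precision_e2e)
    return {
        "recall": global_accumulative_recall / total_num_gt,
        "seqerr": 1 - hit_str_count / global_accumulative_recall,
        "recall_e2e": recall_e2e,
        "precision_e2e": precision_e2e,
        "f_score_e2e": f_score_e2e,
    }


# Counts copied from the "metric eval" sections of the two logs.
mode_a = e2e_metrics(1342, 2543, 2473, 1960.5999999999951)
mode_b = e2e_metrics(1267, 2204, 2070, 1818.3999999999967)

for name, metrics in (("A", mode_a), ("B", mode_b)):
    print(name, {key: round(value, 4) for key, value in metrics.items()})
```

Note that the two modes count different numbers of ground-truth instances (2543 in mode A, 2204 in mode B), so the scores are not computed against the same denominator.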

OK, thank you. One more question: which pretraining dataset does PGNet use? image

fengxiaoru avatar Sep 18 '22 12:09 fengxiaoru

The second-stage training data is synthtexk150k_irregular, synthtexk150k_curved, ArTV2, and Total-tex, with data ratios of [0.0023, 0.0070, 0.1653, 0.8254] respectively.

LDOUBLEV avatar Sep 19 '22 01:09 LDOUBLEV

The second-stage training data is synthtexk150k_irregular, synthtexk150k_curved, ArTV2, and Total-tex, with data ratios of [0.0023, 0.0070, 0.1653, 0.8254] respectively.

What training data was used in the first stage?

Whittaker0323 avatar May 23 '23 10:05 Whittaker0323

Hello, has this problem been resolved? I also used mode B and still cannot reach the published accuracy.

lulei0926 avatar Sep 18 '23 08:09 lulei0926