OCRv3_det_cml training reports "Found inf or nan, current scale is: 1024.0, decrease to: 1024.0*0.5"

Open Seylena opened this issue 2 years ago • 0 comments

Config file: configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml

Printed configuration:

[2022/10/21 22:33:35] ppocr INFO: Architecture :
[2022/10/21 22:33:35] ppocr INFO: Models :
[2022/10/21 22:33:35] ppocr INFO: Student :
[2022/10/21 22:33:35] ppocr INFO: Backbone :
[2022/10/21 22:33:35] ppocr INFO: disable_se : True
[2022/10/21 22:33:35] ppocr INFO: model_name : large
[2022/10/21 22:33:35] ppocr INFO: name : MobileNetV3
[2022/10/21 22:33:35] ppocr INFO: scale : 0.5
[2022/10/21 22:33:35] ppocr INFO: Head :
[2022/10/21 22:33:35] ppocr INFO: k : 50
[2022/10/21 22:33:35] ppocr INFO: name : DBHead
[2022/10/21 22:33:35] ppocr INFO: Neck :
[2022/10/21 22:33:35] ppocr INFO: name : RSEFPN
[2022/10/21 22:33:35] ppocr INFO: out_channels : 96
[2022/10/21 22:33:35] ppocr INFO: shortcut : True
[2022/10/21 22:33:35] ppocr INFO: Transform : None
[2022/10/21 22:33:35] ppocr INFO: algorithm : DB
[2022/10/21 22:33:35] ppocr INFO: model_type : det
[2022/10/21 22:33:35] ppocr INFO: pretrained : None
[2022/10/21 22:33:35] ppocr INFO: Student2 :
[2022/10/21 22:33:35] ppocr INFO: Backbone :
[2022/10/21 22:33:35] ppocr INFO: disable_se : True
[2022/10/21 22:33:35] ppocr INFO: model_name : large
[2022/10/21 22:33:35] ppocr INFO: name : MobileNetV3
[2022/10/21 22:33:35] ppocr INFO: scale : 0.5
[2022/10/21 22:33:35] ppocr INFO: Head :
[2022/10/21 22:33:35] ppocr INFO: k : 50
[2022/10/21 22:33:35] ppocr INFO: name : DBHead
[2022/10/21 22:33:35] ppocr INFO: Neck :
[2022/10/21 22:33:35] ppocr INFO: name : RSEFPN
[2022/10/21 22:33:35] ppocr INFO: out_channels : 96
[2022/10/21 22:33:35] ppocr INFO: shortcut : True
[2022/10/21 22:33:35] ppocr INFO: Transform : None
[2022/10/21 22:33:35] ppocr INFO: algorithm : DB
[2022/10/21 22:33:35] ppocr INFO: model_type : det
[2022/10/21 22:33:35] ppocr INFO: pretrained : None
[2022/10/21 22:33:35] ppocr INFO: Teacher :
[2022/10/21 22:33:35] ppocr INFO: Backbone :
[2022/10/21 22:33:35] ppocr INFO: in_channels : 3
[2022/10/21 22:33:35] ppocr INFO: layers : 50
[2022/10/21 22:33:35] ppocr INFO: name : ResNet
[2022/10/21 22:33:35] ppocr INFO: Head :
[2022/10/21 22:33:35] ppocr INFO: k : 50
[2022/10/21 22:33:35] ppocr INFO: kernel_list : [7, 2, 2]
[2022/10/21 22:33:35] ppocr INFO: name : DBHead
[2022/10/21 22:33:35] ppocr INFO: Neck :
[2022/10/21 22:33:35] ppocr INFO: name : LKPAN
[2022/10/21 22:33:35] ppocr INFO: out_channels : 256
[2022/10/21 22:33:35] ppocr INFO: algorithm : DB
[2022/10/21 22:33:35] ppocr INFO: freeze_params : True
[2022/10/21 22:33:35] ppocr INFO: model_type : det
[2022/10/21 22:33:35] ppocr INFO: pretrained : None
[2022/10/21 22:33:35] ppocr INFO: return_all_feats : False
[2022/10/21 22:33:35] ppocr INFO: algorithm : Distillation
[2022/10/21 22:33:35] ppocr INFO: model_type : det
[2022/10/21 22:33:35] ppocr INFO: name : DistillationModel
[2022/10/21 22:33:35] ppocr INFO: Eval :
[2022/10/21 22:33:35] ppocr INFO: dataset :
[2022/10/21 22:33:35] ppocr INFO: data_dir : ./data/
[2022/10/21 22:33:35] ppocr INFO: label_file_list : ['./data/val.txt']
[2022/10/21 22:33:35] ppocr INFO: name : SimpleDataSet
[2022/10/21 22:33:35] ppocr INFO: transforms :
[2022/10/21 22:33:35] ppocr INFO: DecodeImage :
[2022/10/21 22:33:35] ppocr INFO: channel_first : False
[2022/10/21 22:33:35] ppocr INFO: img_mode : BGR
[2022/10/21 22:33:35] ppocr INFO: DetLabelEncode : None
[2022/10/21 22:33:35] ppocr INFO: DetResizeForTest :
[2022/10/21 22:33:35] ppocr INFO: resize_long : 960
[2022/10/21 22:33:35] ppocr INFO: NormalizeImage :
[2022/10/21 22:33:35] ppocr INFO: mean : [0.485, 0.456, 0.406]
[2022/10/21 22:33:35] ppocr INFO: order : hwc
[2022/10/21 22:33:35] ppocr INFO: scale : 1./255.
[2022/10/21 22:33:35] ppocr INFO: std : [0.229, 0.224, 0.225]
[2022/10/21 22:33:35] ppocr INFO: ToCHWImage : None
[2022/10/21 22:33:35] ppocr INFO: KeepKeys :
[2022/10/21 22:33:35] ppocr INFO: keep_keys : ['image', 'shape', 'polys', 'ignore_tags']
[2022/10/21 22:33:35] ppocr INFO: loader :
[2022/10/21 22:33:35] ppocr INFO: batch_size_per_card : 1
[2022/10/21 22:33:35] ppocr INFO: drop_last : False
[2022/10/21 22:33:35] ppocr INFO: num_workers : 2
[2022/10/21 22:33:35] ppocr INFO: shuffle : False
[2022/10/21 22:33:35] ppocr INFO: Global :
[2022/10/21 22:33:35] ppocr INFO: cal_metric_during_train : False
[2022/10/21 22:33:35] ppocr INFO: checkpoints : None
[2022/10/21 22:33:35] ppocr INFO: debug : False
[2022/10/21 22:33:35] ppocr INFO: distributed : True
[2022/10/21 22:33:35] ppocr INFO: epoch_num : 150
[2022/10/21 22:33:35] ppocr INFO: eval_batch_step : [0, 400]
[2022/10/21 22:33:35] ppocr INFO: infer_img : doc/imgs_en/img_10.jpg
[2022/10/21 22:33:35] ppocr INFO: log_smooth_window : 20
[2022/10/21 22:33:35] ppocr INFO: pretrained_model : ./pretrain_models/en_PP-OCRv3_det_distill_train/best_accuracy
[2022/10/21 22:33:35] ppocr INFO: print_batch_step : 10
[2022/10/21 22:33:35] ppocr INFO: save_epoch_step : 30
[2022/10/21 22:33:35] ppocr INFO: save_inference_dir : None
[2022/10/21 22:33:35] ppocr INFO: save_model_dir : ./output/det_cml_1021/
[2022/10/21 22:33:35] ppocr INFO: save_res_path : ./checkpoints/det_db/predicts_db.txt
[2022/10/21 22:33:35] ppocr INFO: scale_loss : 1024.0
[2022/10/21 22:33:35] ppocr INFO: use_amp : True
[2022/10/21 22:33:35] ppocr INFO: use_dynamic_loss_scaling : True
[2022/10/21 22:33:35] ppocr INFO: use_gpu : True
[2022/10/21 22:33:35] ppocr INFO: use_visualdl : False
[2022/10/21 22:33:35] ppocr INFO: Loss :
[2022/10/21 22:33:35] ppocr INFO: loss_config_list :
[2022/10/21 22:33:35] ppocr INFO: DistillationDilaDBLoss :
[2022/10/21 22:33:35] ppocr INFO: alpha : 5
[2022/10/21 22:33:35] ppocr INFO: balance_loss : True
[2022/10/21 22:33:35] ppocr INFO: beta : 10
[2022/10/21 22:33:35] ppocr INFO: key : maps
[2022/10/21 22:33:35] ppocr INFO: main_loss_type : DiceLoss
[2022/10/21 22:33:35] ppocr INFO: model_name_pairs : [['Student', 'Teacher'], ['Student2', 'Teacher']]
[2022/10/21 22:33:35] ppocr INFO: ohem_ratio : 3
[2022/10/21 22:33:35] ppocr INFO: weight : 1.0
[2022/10/21 22:33:35] ppocr INFO: DistillationDMLLoss :
[2022/10/21 22:33:35] ppocr INFO: key : maps
[2022/10/21 22:33:35] ppocr INFO: maps_name : thrink_maps
[2022/10/21 22:33:35] ppocr INFO: model_name_pairs : ['Student', 'Student2']
[2022/10/21 22:33:35] ppocr INFO: weight : 1.0
[2022/10/21 22:33:35] ppocr INFO: DistillationDBLoss :
[2022/10/21 22:33:35] ppocr INFO: alpha : 5
[2022/10/21 22:33:35] ppocr INFO: balance_loss : True
[2022/10/21 22:33:35] ppocr INFO: beta : 10
[2022/10/21 22:33:35] ppocr INFO: main_loss_type : DiceLoss
[2022/10/21 22:33:35] ppocr INFO: model_name_list : ['Student', 'Student2']
[2022/10/21 22:33:35] ppocr INFO: ohem_ratio : 3
[2022/10/21 22:33:35] ppocr INFO: weight : 1.0
[2022/10/21 22:33:35] ppocr INFO: name : CombinedLoss
[2022/10/21 22:33:35] ppocr INFO: Metric :
[2022/10/21 22:33:35] ppocr INFO: base_metric_name : DetMetric
[2022/10/21 22:33:35] ppocr INFO: key : Student
[2022/10/21 22:33:35] ppocr INFO: main_indicator : hmean
[2022/10/21 22:33:35] ppocr INFO: name : DistillationMetric
[2022/10/21 22:33:35] ppocr INFO: Optimizer :
[2022/10/21 22:33:35] ppocr INFO: beta1 : 0.9
[2022/10/21 22:33:35] ppocr INFO: beta2 : 0.999
[2022/10/21 22:33:35] ppocr INFO: lr :
[2022/10/21 22:33:35] ppocr INFO: learning_rate : 0.0005
[2022/10/21 22:33:35] ppocr INFO: name : Cosine
[2022/10/21 22:33:35] ppocr INFO: warmup_epoch : 2
[2022/10/21 22:33:35] ppocr INFO: name : Adam
[2022/10/21 22:33:35] ppocr INFO: regularizer :
[2022/10/21 22:33:35] ppocr INFO: factor : 5e-05
[2022/10/21 22:33:35] ppocr INFO: name : L2
[2022/10/21 22:33:35] ppocr INFO: PostProcess :
[2022/10/21 22:33:35] ppocr INFO: box_thresh : 0.6
[2022/10/21 22:33:35] ppocr INFO: key : head_out
[2022/10/21 22:33:35] ppocr INFO: max_candidates : 1000
[2022/10/21 22:33:35] ppocr INFO: model_name : ['Student']
[2022/10/21 22:33:35] ppocr INFO: name : DistillationDBPostProcess
[2022/10/21 22:33:35] ppocr INFO: thresh : 0.3
[2022/10/21 22:33:35] ppocr INFO: unclip_ratio : 1.5
[2022/10/21 22:33:35] ppocr INFO: Train :
[2022/10/21 22:33:35] ppocr INFO: dataset :
[2022/10/21 22:33:35] ppocr INFO: data_dir : ./data/
[2022/10/21 22:33:35] ppocr INFO: label_file_list : ['./data/train.txt']
[2022/10/21 22:33:35] ppocr INFO: name : SimpleDataSet
[2022/10/21 22:33:35] ppocr INFO: ratio_list : [1.0]
[2022/10/21 22:33:35] ppocr INFO: transforms :
[2022/10/21 22:33:35] ppocr INFO: DecodeImage :
[2022/10/21 22:33:35] ppocr INFO: channel_first : False
[2022/10/21 22:33:35] ppocr INFO: img_mode : BGR
[2022/10/21 22:33:35] ppocr INFO: DetLabelEncode : None
[2022/10/21 22:33:35] ppocr INFO: CopyPaste : None
[2022/10/21 22:33:35] ppocr INFO: IaaAugment :
[2022/10/21 22:33:35] ppocr INFO: augmenter_args :
[2022/10/21 22:33:35] ppocr INFO: args :
[2022/10/21 22:33:35] ppocr INFO: p : 0.5
[2022/10/21 22:33:35] ppocr INFO: type : Fliplr
[2022/10/21 22:33:35] ppocr INFO: args :
[2022/10/21 22:33:35] ppocr INFO: rotate : [-10, 10]
[2022/10/21 22:33:35] ppocr INFO: type : Affine
[2022/10/21 22:33:35] ppocr INFO: args :
[2022/10/21 22:33:35] ppocr INFO: size : [0.5, 3]
[2022/10/21 22:33:35] ppocr INFO: type : Resize
[2022/10/21 22:33:35] ppocr INFO: EastRandomCropData :
[2022/10/21 22:33:35] ppocr INFO: keep_ratio : True
[2022/10/21 22:33:35] ppocr INFO: max_tries : 50
[2022/10/21 22:33:35] ppocr INFO: size : [960, 960]
[2022/10/21 22:33:35] ppocr INFO: MakeBorderMap :
[2022/10/21 22:33:35] ppocr INFO: shrink_ratio : 0.4
[2022/10/21 22:33:35] ppocr INFO: thresh_max : 0.7
[2022/10/21 22:33:35] ppocr INFO: thresh_min : 0.3
[2022/10/21 22:33:35] ppocr INFO: MakeShrinkMap :
[2022/10/21 22:33:35] ppocr INFO: min_text_size : 8
[2022/10/21 22:33:35] ppocr INFO: shrink_ratio : 0.4
[2022/10/21 22:33:35] ppocr INFO: NormalizeImage :
[2022/10/21 22:33:35] ppocr INFO: mean : [0.485, 0.456, 0.406]
[2022/10/21 22:33:35] ppocr INFO: order : hwc
[2022/10/21 22:33:35] ppocr INFO: scale : 1./255.
[2022/10/21 22:33:35] ppocr INFO: std : [0.229, 0.224, 0.225]
[2022/10/21 22:33:35] ppocr INFO: ToCHWImage : None
[2022/10/21 22:33:35] ppocr INFO: KeepKeys :
[2022/10/21 22:33:35] ppocr INFO: keep_keys : ['image', 'threshold_map', 'threshold_mask', 'shrink_map', 'shrink_mask']
[2022/10/21 22:33:35] ppocr INFO: loader :
[2022/10/21 22:33:35] ppocr INFO: batch_size_per_card : 4
[2022/10/21 22:33:35] ppocr INFO: drop_last : False
[2022/10/21 22:33:35] ppocr INFO: num_workers : 4
[2022/10/21 22:33:35] ppocr INFO: shuffle : True
[2022/10/21 22:33:35] ppocr INFO: profiler_options : None
[2022/10/21 22:33:35] ppocr INFO: train with paddle 2.3.1 and device Place(gpu:0)
[2022/10/21 22:33:40] ppocr INFO: Initialize indexs of datasets:['./data/train.txt']
[2022/10/21 22:33:40] ppocr INFO: Initialize indexs of datasets:['./data/val.txt']
[2022/10/21 22:33:41] ppocr INFO: load pretrain successful from ./pretrain_models/en_PP-OCRv3_det_distill_train/best_accuracy

Complete Error Message:

...
[2022/10/22 11:26:01] ppocr INFO: load pretrain successful from ./pretrain_models/en_PP-OCRv3_det_distill_train/best_accuracy
[2022/10/22 11:26:01] ppocr INFO: train dataloader has 3138 iters
[2022/10/22 11:26:01] ppocr INFO: valid dataloader has 16735 iters
[2022/10/22 11:26:01] ppocr INFO: During the training process, after the 0th iteration, an evaluation is run every 400 iterations
W1022 11:26:05.131489 47429 gpu_resources.cc:201] WARNING: device: . The installed Paddle is compiled with CUDNN 7.6, but CUDNN version in your machine is 7.5, which may cause serious incompatible bug. Please recompile or reinstall Paddle with compatible CUDNN version.
Found inf or nan, current scale is: 1024.0, decrease to: 1024.0*0.5
Found inf or nan, current scale is: 512.0, decrease to: 512.0*0.5
Found inf or nan, current scale is: 256.0, decrease to: 256.0*0.5
Found inf or nan, current scale is: 128.0, decrease to: 128.0*0.5
Found inf or nan, current scale is: 64.0, decrease to: 64.0*0.5
[2022/10/22 11:26:24] ppocr INFO: epoch: [1/150], global_step: 10, lr: 0.000000, dila_dbloss_Student_Teacher: 5.963279, dila_dbloss_Student2_Teacher: 5.963279, loss: 25.867521, dml_thrink_maps_0: 0.000000, db_Student_loss_shrink_maps: 4.991880, db_Student_loss_threshold_maps: 0.968767, db_Student_loss_binary_maps: 0.998221, db_Student2_loss_shrink_maps: 4.991880, db_Student2_loss_threshold_maps: 0.968767, db_Student2_loss_binary_maps: 0.998221, avg_reader_cost: 0.37981 s, avg_batch_cost: 2.33164 s, avg_samples: 4.0, ips: 1.71553 samples/s, eta: 12 days, 16:51:21
Found inf or nan, current scale is: 32.0, decrease to: 32.0*0.5
Found inf or nan, current scale is: 16.0, decrease to: 16.0*0.5
...
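
For readers unfamiliar with the message, the repeated "Found inf or nan" lines are what dynamic loss scaling prints when mixed-precision training (use_amp : True, scale_loss : 1024.0, use_dynamic_loss_scaling : True in the config above) sees non-finite gradients: the step is skipped and the scale is multiplied by 0.5. The sketch below only imitates that halving pattern in plain Python. `ToyLossScaler` and its arguments are hypothetical names made up for illustration, and it is not Paddle's implementation, which may use different thresholds before it decreases the scale.

```python
import math


class ToyLossScaler:
    """Hypothetical, simplified dynamic loss scaler (NOT Paddle's implementation)."""

    def __init__(self, init_scale=1024.0, decr_ratio=0.5):
        self.scale = init_scale        # assumed to mirror scale_loss in the config
        self.decr_ratio = decr_ratio   # assumed halving factor, as seen in the log

    def step(self, grad_values):
        """Return True if the update would be applied, False if it is skipped."""
        if any(not math.isfinite(g) for g in grad_values):
            print(
                f"Found inf or nan, current scale is: {self.scale}, "
                f"decrease to: {self.scale}*{self.decr_ratio}"
            )
            self.scale *= self.decr_ratio  # shrink the scale and skip this step
            return False
        return True


scaler = ToyLossScaler(init_scale=1024.0)

# Five consecutive steps containing a non-finite gradient,
# matching the first five messages in the log.
history = []
for _ in range(5):
    scaler.step([0.1, float("inf")])
    history.append(scaler.scale)

print(history)
```

In this toy model the scale only stops shrinking once the gradients become finite, so a scale that keeps halving step after step, as in the log, suggests the non-finite values are not caused by the scale factor alone.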

Seylena, Oct 24 '22 03:10