PaddleOCR

Newcomer asking for help: model training hangs after printing the log line 'During the training process, after the 0th iteration, an evaluation is run every 400 iterations'

Open Lijian500 opened this issue 1 year ago • 3 comments

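Problem: When training a model, after the log line 'During the training process, after the 0th iteration, an evaluation is run every 400 iterations' is printed, nothing further happens (no other log output for more than half an hour). How should I troubleshoot this?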

  • System Environment: Windows 11
  • Version: Paddle: 2.6.1, PaddleOCR: 2.7
  • Related components:
  • Command Code: python tools/train.py -c pretrain_models/ch_PP-OCRv3_det_cml.yml
  • Complete Error Message:

[2024/05/09 14:55:16] ppocr INFO: Architecture :
[2024/05/09 14:55:16] ppocr INFO: Models :
[2024/05/09 14:55:16] ppocr INFO: Student :
[2024/05/09 14:55:16] ppocr INFO: Backbone :
[2024/05/09 14:55:16] ppocr INFO: disable_se : True
[2024/05/09 14:55:16] ppocr INFO: model_name : large
[2024/05/09 14:55:16] ppocr INFO: name : MobileNetV3
[2024/05/09 14:55:16] ppocr INFO: scale : 0.5
[2024/05/09 14:55:16] ppocr INFO: Head :
[2024/05/09 14:55:16] ppocr INFO: k : 50
[2024/05/09 14:55:16] ppocr INFO: name : DBHead
[2024/05/09 14:55:16] ppocr INFO: Neck :
[2024/05/09 14:55:16] ppocr INFO: name : RSEFPN
[2024/05/09 14:55:16] ppocr INFO: out_channels : 96
[2024/05/09 14:55:16] ppocr INFO: shortcut : True
[2024/05/09 14:55:16] ppocr INFO: Transform : None
[2024/05/09 14:55:16] ppocr INFO: algorithm : DB
[2024/05/09 14:55:16] ppocr INFO: model_type : det
[2024/05/09 14:55:16] ppocr INFO: pretrained : None
[2024/05/09 14:55:16] ppocr INFO: Student2 :
[2024/05/09 14:55:16] ppocr INFO: Backbone :
[2024/05/09 14:55:16] ppocr INFO: disable_se : True
[2024/05/09 14:55:16] ppocr INFO: model_name : large
[2024/05/09 14:55:16] ppocr INFO: name : MobileNetV3
[2024/05/09 14:55:16] ppocr INFO: scale : 0.5
[2024/05/09 14:55:16] ppocr INFO: Head :
[2024/05/09 14:55:16] ppocr INFO: k : 50
[2024/05/09 14:55:16] ppocr INFO: name : DBHead
[2024/05/09 14:55:16] ppocr INFO: Neck :
[2024/05/09 14:55:16] ppocr INFO: name : RSEFPN
[2024/05/09 14:55:16] ppocr INFO: out_channels : 96
[2024/05/09 14:55:16] ppocr INFO: shortcut : True
[2024/05/09 14:55:16] ppocr INFO: Transform : None
[2024/05/09 14:55:16] ppocr INFO: algorithm : DB
[2024/05/09 14:55:16] ppocr INFO: model_type : det
[2024/05/09 14:55:16] ppocr INFO: pretrained : None
[2024/05/09 14:55:16] ppocr INFO: Teacher :
[2024/05/09 14:55:16] ppocr INFO: Backbone :
[2024/05/09 14:55:16] ppocr INFO: in_channels : 3
[2024/05/09 14:55:16] ppocr INFO: layers : 50
[2024/05/09 14:55:16] ppocr INFO: name : ResNet_vd
[2024/05/09 14:55:16] ppocr INFO: Head :
[2024/05/09 14:55:16] ppocr INFO: k : 50
[2024/05/09 14:55:16] ppocr INFO: kernel_list : [7, 2, 2]
[2024/05/09 14:55:16] ppocr INFO: name : DBHead
[2024/05/09 14:55:16] ppocr INFO: Neck :
[2024/05/09 14:55:16] ppocr INFO: name : LKPAN
[2024/05/09 14:55:16] ppocr INFO: out_channels : 256
[2024/05/09 14:55:16] ppocr INFO: algorithm : DB
[2024/05/09 14:55:16] ppocr INFO: freeze_params : True
[2024/05/09 14:55:16] ppocr INFO: model_type : det
[2024/05/09 14:55:16] ppocr INFO: return_all_feats : False
[2024/05/09 14:55:16] ppocr INFO: algorithm : Distillation
[2024/05/09 14:55:16] ppocr INFO: model_type : det
[2024/05/09 14:55:16] ppocr INFO: name : DistillationModel
[2024/05/09 14:55:16] ppocr INFO: Eval :
[2024/05/09 14:55:16] ppocr INFO: dataset :
[2024/05/09 14:55:16] ppocr INFO: data_dir : ./train_data/
[2024/05/09 14:55:16] ppocr INFO: label_file_list : ['./train_data/det/val.txt']
[2024/05/09 14:55:16] ppocr INFO: name : SimpleDataSet
[2024/05/09 14:55:16] ppocr INFO: transforms :
[2024/05/09 14:55:16] ppocr INFO: DecodeImage :
[2024/05/09 14:55:16] ppocr INFO: channel_first : False
[2024/05/09 14:55:16] ppocr INFO: img_mode : BGR
[2024/05/09 14:55:16] ppocr INFO: DetLabelEncode : None
[2024/05/09 14:55:16] ppocr INFO: DetResizeForTest : None
[2024/05/09 14:55:16] ppocr INFO: NormalizeImage :
[2024/05/09 14:55:16] ppocr INFO: mean : [0.485, 0.456, 0.406]
[2024/05/09 14:55:16] ppocr INFO: order : hwc
[2024/05/09 14:55:16] ppocr INFO: scale : 1./255.
[2024/05/09 14:55:16] ppocr INFO: std : [0.229, 0.224, 0.225]
[2024/05/09 14:55:16] ppocr INFO: ToCHWImage : None
[2024/05/09 14:55:16] ppocr INFO: KeepKeys :
[2024/05/09 14:55:16] ppocr INFO: keep_keys : ['image', 'shape', 'polys', 'ignore_tags']
[2024/05/09 14:55:16] ppocr INFO: loader :
[2024/05/09 14:55:16] ppocr INFO: batch_size_per_card : 1
[2024/05/09 14:55:16] ppocr INFO: drop_last : False
[2024/05/09 14:55:16] ppocr INFO: num_workers : 2
[2024/05/09 14:55:16] ppocr INFO: shuffle : False
[2024/05/09 14:55:16] ppocr INFO: Global :
[2024/05/09 14:55:16] ppocr INFO: amp_dtype : bfloat16
[2024/05/09 14:55:16] ppocr INFO: cal_metric_during_train : False
[2024/05/09 14:55:16] ppocr INFO: checkpoints : None
[2024/05/09 14:55:16] ppocr INFO: d2s_train_image_shape : [3, -1, -1]
[2024/05/09 14:55:16] ppocr INFO: debug : False
[2024/05/09 14:55:16] ppocr INFO: distributed : False
[2024/05/09 14:55:16] ppocr INFO: epoch_num : 500
[2024/05/09 14:55:16] ppocr INFO: eval_batch_step : [0, 400]
[2024/05/09 14:55:16] ppocr INFO: infer_img : doc/imgs_en/img_10.jpg
[2024/05/09 14:55:16] ppocr INFO: log_smooth_window : 20
[2024/05/09 14:55:16] ppocr INFO: pretrained_model : ./pretrain_models/ch_PP-OCRv3_det_distill_train/ch_PP-OCRv3_det_distill_train/best_accuracy
[2024/05/09 14:55:16] ppocr INFO: print_batch_step : 10
[2024/05/09 14:55:16] ppocr INFO: save_epoch_step : 100
[2024/05/09 14:55:16] ppocr INFO: save_inference_dir : None
[2024/05/09 14:55:16] ppocr INFO: save_model_dir : ./output/ch_PP-OCR_v3_det/
[2024/05/09 14:55:16] ppocr INFO: save_res_path : ./checkpoints/det_db/predicts_db.txt
[2024/05/09 14:55:16] ppocr INFO: use_gpu : False
[2024/05/09 14:55:16] ppocr INFO: use_visualdl : False
[2024/05/09 14:55:16] ppocr INFO: Loss :
[2024/05/09 14:55:16] ppocr INFO: loss_config_list :
[2024/05/09 14:55:16] ppocr INFO: DistillationDilaDBLoss :
[2024/05/09 14:55:16] ppocr INFO: alpha : 5
[2024/05/09 14:55:16] ppocr INFO: balance_loss : True
[2024/05/09 14:55:16] ppocr INFO: beta : 10
[2024/05/09 14:55:16] ppocr INFO: key : maps
[2024/05/09 14:55:16] ppocr INFO: main_loss_type : DiceLoss
[2024/05/09 14:55:16] ppocr INFO: model_name_pairs : [['Student', 'Teacher'], ['Student2', 'Teacher']]
[2024/05/09 14:55:16] ppocr INFO: ohem_ratio : 3
[2024/05/09 14:55:16] ppocr INFO: weight : 1.0
[2024/05/09 14:55:16] ppocr INFO: DistillationDMLLoss :
[2024/05/09 14:55:16] ppocr INFO: key : maps
[2024/05/09 14:55:16] ppocr INFO: maps_name : thrink_maps
[2024/05/09 14:55:16] ppocr INFO: model_name_pairs : ['Student', 'Student2']
[2024/05/09 14:55:16] ppocr INFO: weight : 1.0
[2024/05/09 14:55:16] ppocr INFO: DistillationDBLoss :
[2024/05/09 14:55:16] ppocr INFO: alpha : 5
[2024/05/09 14:55:16] ppocr INFO: balance_loss : True
[2024/05/09 14:55:16] ppocr INFO: beta : 10
[2024/05/09 14:55:16] ppocr INFO: main_loss_type : DiceLoss
[2024/05/09 14:55:16] ppocr INFO: model_name_list : ['Student', 'Student2']
[2024/05/09 14:55:16] ppocr INFO: ohem_ratio : 3
[2024/05/09 14:55:16] ppocr INFO: weight : 1.0
[2024/05/09 14:55:16] ppocr INFO: name : CombinedLoss
[2024/05/09 14:55:16] ppocr INFO: Metric :
[2024/05/09 14:55:16] ppocr INFO: base_metric_name : DetMetric
[2024/05/09 14:55:16] ppocr INFO: key : Student
[2024/05/09 14:55:16] ppocr INFO: main_indicator : hmean
[2024/05/09 14:55:16] ppocr INFO: name : DistillationMetric
[2024/05/09 14:55:16] ppocr INFO: Optimizer :
[2024/05/09 14:55:16] ppocr INFO: beta1 : 0.9
[2024/05/09 14:55:16] ppocr INFO: beta2 : 0.999
[2024/05/09 14:55:16] ppocr INFO: lr :
[2024/05/09 14:55:16] ppocr INFO: learning_rate : 0.001
[2024/05/09 14:55:16] ppocr INFO: name : Cosine
[2024/05/09 14:55:16] ppocr INFO: warmup_epoch : 2
[2024/05/09 14:55:16] ppocr INFO: name : Adam
[2024/05/09 14:55:16] ppocr INFO: regularizer :
[2024/05/09 14:55:16] ppocr INFO: factor : 5e-05
[2024/05/09 14:55:16] ppocr INFO: name : L2
[2024/05/09 14:55:16] ppocr INFO: PostProcess :
[2024/05/09 14:55:16] ppocr INFO: box_thresh : 0.6
[2024/05/09 14:55:16] ppocr INFO: key : head_out
[2024/05/09 14:55:16] ppocr INFO: max_candidates : 1000
[2024/05/09 14:55:16] ppocr INFO: model_name : ['Student']
[2024/05/09 14:55:16] ppocr INFO: name : DistillationDBPostProcess
[2024/05/09 14:55:16] ppocr INFO: thresh : 0.3
[2024/05/09 14:55:16] ppocr INFO: unclip_ratio : 1.5
[2024/05/09 14:55:16] ppocr INFO: Train :
[2024/05/09 14:55:16] ppocr INFO: dataset :
[2024/05/09 14:55:16] ppocr INFO: data_dir : ./train_data/
[2024/05/09 14:55:16] ppocr INFO: label_file_list : ['./train_data/det/train.txt']
[2024/05/09 14:55:16] ppocr INFO: name : SimpleDataSet
[2024/05/09 14:55:16] ppocr INFO: ratio_list : [1.0]
[2024/05/09 14:55:16] ppocr INFO: transforms :
[2024/05/09 14:55:16] ppocr INFO: DecodeImage :
[2024/05/09 14:55:16] ppocr INFO: channel_first : False
[2024/05/09 14:55:16] ppocr INFO: img_mode : BGR
[2024/05/09 14:55:16] ppocr INFO: DetLabelEncode : None
[2024/05/09 14:55:16] ppocr INFO: CopyPaste : None
[2024/05/09 14:55:16] ppocr INFO: IaaAugment :
[2024/05/09 14:55:16] ppocr INFO: augmenter_args :
[2024/05/09 14:55:16] ppocr INFO: args :
[2024/05/09 14:55:16] ppocr INFO: p : 0.5
[2024/05/09 14:55:16] ppocr INFO: type : Fliplr
[2024/05/09 14:55:16] ppocr INFO: args :
[2024/05/09 14:55:16] ppocr INFO: rotate : [-10, 10]
[2024/05/09 14:55:16] ppocr INFO: type : Affine
[2024/05/09 14:55:16] ppocr INFO: args :
[2024/05/09 14:55:16] ppocr INFO: size : [0.5, 3]
[2024/05/09 14:55:16] ppocr INFO: type : Resize
[2024/05/09 14:55:16] ppocr INFO: EastRandomCropData :
[2024/05/09 14:55:16] ppocr INFO: keep_ratio : True
[2024/05/09 14:55:16] ppocr INFO: max_tries : 50
[2024/05/09 14:55:16] ppocr INFO: size : [960, 960]
[2024/05/09 14:55:16] ppocr INFO: MakeBorderMap :
[2024/05/09 14:55:16] ppocr INFO: shrink_ratio : 0.4
[2024/05/09 14:55:16] ppocr INFO: thresh_max : 0.7
[2024/05/09 14:55:16] ppocr INFO: thresh_min : 0.3
[2024/05/09 14:55:16] ppocr INFO: MakeShrinkMap :
[2024/05/09 14:55:16] ppocr INFO: min_text_size : 8
[2024/05/09 14:55:16] ppocr INFO: shrink_ratio : 0.4
[2024/05/09 14:55:16] ppocr INFO: NormalizeImage :
[2024/05/09 14:55:16] ppocr INFO: mean : [0.485, 0.456, 0.406]
[2024/05/09 14:55:16] ppocr INFO: order : hwc
[2024/05/09 14:55:16] ppocr INFO: scale : 1./255.
[2024/05/09 14:55:16] ppocr INFO: std : [0.229, 0.224, 0.225]
[2024/05/09 14:55:16] ppocr INFO: ToCHWImage : None
[2024/05/09 14:55:16] ppocr INFO: KeepKeys :
[2024/05/09 14:55:16] ppocr INFO: keep_keys : ['image', 'threshold_map', 'threshold_mask', 'shrink_map', 'shrink_mask']
[2024/05/09 14:55:16] ppocr INFO: loader :
[2024/05/09 14:55:16] ppocr INFO: batch_size_per_card : 8
[2024/05/09 14:55:16] ppocr INFO: drop_last : False
[2024/05/09 14:55:16] ppocr INFO: num_workers : 4
[2024/05/09 14:55:16] ppocr INFO: shuffle : True
[2024/05/09 14:55:16] ppocr INFO: profiler_options : None
[2024/05/09 14:55:16] ppocr INFO: train with paddle 2.6.1 and device Place(cpu)
[2024/05/09 14:55:16] ppocr INFO: Initialize indexs of datasets:['./train_data/det/train.txt']
[2024/05/09 14:55:16] ppocr INFO: Initialize indexs of datasets:['./train_data/det/val.txt']
[2024/05/09 14:55:18] ppocr INFO: train dataloader has 3 iters
[2024/05/09 14:55:18] ppocr INFO: valid dataloader has 6 iters
[2024/05/09 14:55:18] ppocr INFO: load pretrain successful from ./pretrain_models/ch_PP-OCRv3_det_distill_train/ch_PP-OCRv3_det_distill_train/best_accuracy
[2024/05/09 14:55:18] ppocr INFO: During the training process, after the 0th iteration, an evaluation is run every 400 iterations

Lijian500 • May 09 '24 07:05
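For anyone retracing this, the few values that bear on how often training logs appear can be pulled out of a saved log with a short script. This is a minimal sketch, assuming the log text is available as a string; the helper names `read_int_setting` and `read_dataloader_iters` are made up for illustration and are not part of PaddleOCR. The excerpt copies entries from the log in this issue.

```python
import re

# Entries copied from the log posted in this issue. A real run would read
# the saved log file into a string instead of using this excerpt.
LOG_EXCERPT = (
    "[2024/05/09 14:55:16] ppocr INFO: epoch_num : 500 "
    "[2024/05/09 14:55:16] ppocr INFO: log_smooth_window : 20 "
    "[2024/05/09 14:55:16] ppocr INFO: print_batch_step : 10 "
    "[2024/05/09 14:55:18] ppocr INFO: train dataloader has 3 iters "
    "[2024/05/09 14:55:18] ppocr INFO: valid dataloader has 6 iters"
)


def read_int_setting(log_text, key):
    """Return the integer logged as 'INFO: <key> : <value>', or None if absent."""
    match = re.search(rf"INFO:\s+{re.escape(key)}\s*:\s*(\d+)", log_text)
    return int(match.group(1)) if match else None


def read_dataloader_iters(log_text, split):
    """Return N from '<split> dataloader has N iters', or None if absent."""
    match = re.search(rf"{re.escape(split)} dataloader has (\d+) iters", log_text)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    for key in ("epoch_num", "log_smooth_window", "print_batch_step"):
        print(key, "=", read_int_setting(LOG_EXCERPT, key))
    for split in ("train", "valid"):
        print(split, "iters per epoch =", read_dataloader_iters(LOG_EXCERPT, split))
```

The regular expressions do not depend on line breaks, so they work whether the log is stored one entry per line or as a single run of text.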

Hello, are you using the CPU version of Paddle or the GPU version?

UserWangZz • May 09 '24 12:05

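"Hello, are you using the CPU version of Paddle or the GPU version?"

Hello, it is the CPU version.

Later I force-closed the training program, but when I looked in the output directory I found that the corresponding model files had been generated. Perhaps my dataset was so small (30 images) that training finished almost instantly, and that is why no logs were produced?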

Lijian500 • May 11 '24 08:05

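With batch size 8 and 30 images, one epoch is 4 iterations. log_smooth_window is 20, so a log line should only be printed about once every 5 epochs. That may be related to what you are seeing.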

UserWangZz • May 11 '24 09:05
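The arithmetic in the last comment can be checked with a few lines of Python. This is only a sketch under stated assumptions: the helper names are hypothetical, and whether the log cadence is governed by `log_smooth_window` (as the comment suggests) or by `print_batch_step` is not confirmed in this thread, so the interval is passed in as a parameter. Note also that the posted log reports "train dataloader has 3 iters", which differs from the 4 iterations the comment derives from 30 images.

```python
import math


def iters_per_epoch(num_images, batch_size, drop_last=False):
    """Batches per epoch; the final partial batch is kept unless drop_last is set."""
    if drop_last:
        return num_images // batch_size
    return math.ceil(num_images / batch_size)


def epochs_until_step(step, iters_each_epoch):
    """First epoch (1-based) during which the global step count reaches `step`."""
    return math.ceil(step / iters_each_epoch)


# Numbers used in the comment: 30 images, batch size 8, interval of 20 steps.
comment_iters = iters_per_epoch(30, 8)                  # 4 iterations per epoch
comment_epochs = epochs_until_step(20, comment_iters)   # 5 epochs

# Numbers taken from the posted log: 3 iterations per epoch, epoch_num 500.
logged_iters = 3
epochs_for_20_steps = epochs_until_step(20, logged_iters)  # 7 epochs
epochs_for_10_steps = epochs_until_step(10, logged_iters)  # 4 epochs (print_batch_step is 10)
total_iterations = 500 * logged_iters                      # 1500 iterations in the full run

if __name__ == "__main__":
    print("comment:", comment_iters, "iters/epoch,", comment_epochs, "epochs to step 20")
    print("log:", epochs_for_20_steps, "epochs to step 20,",
          epochs_for_10_steps, "epochs to step 10,",
          total_iterations, "iterations in total")
```

Under either interval, several full epochs have to finish before the step counter reaches it, and with `epoch_num : 500` the run is 1500 iterations long, so a 30-image dataset by itself does not imply that training completed immediately.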