
ch_PP-OCRv4_rec_distill.yml: training with SimpleDataSet repeatedly raises KeyError: 'valid_ratio' (related file: ppocr\data\simple_dataset.py)

Open rakiwine opened this issue 10 months ago • 4 comments

Please provide the following information to quickly locate the problem

System Environment:

- nvcc: NVIDIA (R) Cuda compiler driver
- Copyright (c) 2005-2022 NVIDIA Corporation
- Built on Wed_Sep_21_10:41:10_Pacific_Daylight_Time_2022
- Cuda compilation tools, release 11.8, V11.8.89
- Build cuda_11.8.r11.8/compiler.31833905_0
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 531.79                 Driver Version: 531.79       CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                      TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3070 T...  WDDM | 00000000:01:00.0  On |                  N/A |
| N/A   67C    P8               19W /  N/A|   2283MiB /  8192MiB |     12%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

Version:

Windows 11 专业版 22621.1848

Paddle:

paddlepaddle-gpu 2.6.1.post117

PaddleOCR:

release/2.7 [6ed5562](https://github.com/PaddlePaddle/PaddleOCR/commit/6ed5562cedce18f0313d3b83271986dd6c136871)

Related components:

ppocr\data\simple_dataset.py

Command:

python tools/train.py -c configs/rec/PP-OCRv4/ch_PP-OCRv4_rec_distill.yml -o Global.pretrained_model=./pretrain_models/ch_PP-OCRv4_rec_train/student

Configuration used (ch_PP-OCRv4_rec_distill.yml):

Global:
  debug: false
  use_gpu: true
  epoch_num: 200
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/rec_dkd_400w_svtr_ctc_lcnet_blank_dkd0.1/
  save_epoch_step: 40
  eval_batch_step:
  - 0
  - 2000
  cal_metric_during_train: true
  pretrained_model: null
  checkpoints:
  save_inference_dir: null
  use_visualdl: true
  infer_img: doc/imgs_words/ch/word_1.jpg
  character_dict_path: ppocr/utils/ppocr_keys_v1.txt
  max_text_length: &max_text_length 20
  infer_mode: false
  use_space_char: false
  distributed: true
  save_res_path: ./output/rec/predicts_ppocrv3.txt


Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.0005
    warmup_epoch: 5
  regularizer:
    name: L2
    factor: 3.0e-05


Architecture:
  model_type: rec
  name: DistillationModel
  algorithm: Distillation
  Models:
    Teacher:
      pretrained:
      freeze_params: true
      return_all_feats: true
      model_type: rec
      algorithm: SVTR
      Transform: null
      Backbone:
        name: SVTRNet
        img_size:
        - 48
        - 320
        out_char_num: 40
        out_channels: 192
        patch_merging: Conv
        embed_dim:
        - 64
        - 128
        - 256
        depth:
        - 3
        - 6
        - 3
        num_heads:
        - 2
        - 4
        - 8
        mixer:
        - Conv
        - Conv
        - Conv
        - Conv
        - Conv
        - Conv
        - Global
        - Global
        - Global
        - Global
        - Global
        - Global
        local_mixer:
        - - 5
          - 5
        - - 5
          - 5
        - - 5
          - 5
        last_stage: false
        prenorm: true
      Head:
        name: MultiHead
        head_list:
          - CTCHead:
              Neck:
                name: svtr
                dims: 120
                depth: 2
                hidden_dims: 120
                kernel_size: [1, 3]
                use_guide: True
              Head:
                fc_decay: 0.00001
          - NRTRHead:
              nrtr_dim: 384
              max_text_length: *max_text_length
    Student:
      pretrained:
      freeze_params: false
      return_all_feats: true
      model_type: rec
      algorithm: SVTR
      Transform: null
      Backbone:
        name: PPLCNetV3
        scale: 0.95
      Head:
        name: MultiHead
        head_list:
          - CTCHead:
              Neck:
                name: svtr
                dims: 120
                depth: 2
                hidden_dims: 120
                kernel_size: [1, 3]
                use_guide: True
              Head:
                fc_decay: 0.00001
          - NRTRHead:
              nrtr_dim: 384
              max_text_length: *max_text_length

Loss:
  name: CombinedLoss
  loss_config_list:
  - DistillationDKDLoss:
      weight: 0.1
      model_name_pairs:
      - - Student
        - Teacher
      key: head_out
      multi_head: true
      alpha: 1.0
      beta: 2.0
      dis_head: gtc
      name: dkd
  - DistillationCTCLoss:
      weight: 1.0
      model_name_list:
      - Student
      key: head_out
      multi_head: true
  - DistillationNRTRLoss:
      weight: 1.0
      smoothing: false
      model_name_list:
      - Student
      key: head_out
      multi_head: true
  - DistillCTCLogits:
      weight: 1.0
      reduction: mean
      model_name_pairs:
      - - Student
        - Teacher
      key: head_out

PostProcess:
  name: DistillationCTCLabelDecode
  model_name:
  - Student
  key: head_out
  multi_head: true

Metric:
  name: DistillationMetric
  base_metric_name: RecMetric
  main_indicator: acc
  key: Student
  ignore_space: false

Train:
  dataset:
    name: SimpleDataSet
    data_dir: ./train_data/rec/train
    label_file_list:
    - ./train_data/rec/train.txt
    ratio_list:
    - 1.0
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - RecAug:
    - MultiLabelEncode:
        gtc_encode: NRTRLabelEncode
    - KeepKeys:
        keep_keys:
        - image
        - label_ctc
        - label_gtc
        - length
        - valid_ratio
  loader:
    shuffle: true
    batch_size_per_card: 16
    drop_last: true
    num_workers: 16
    use_shared_memory: true

Eval:
  dataset:
    name: SimpleDataSet
    data_dir: ./train_data/rec/val
    label_file_list:
    - ./train_data/rec/val.txt
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - MultiLabelEncode:
        gtc_encode: NRTRLabelEncode
    - RecResizeImg:
        image_shape: [3, 48, 320]
    - KeepKeys:
        keep_keys:
        - image
        - label_ctc
        - label_gtc
        - length
        - valid_ratio
  loader:
    shuffle: false
    drop_last: false
    batch_size_per_card: 16
    num_workers: 16

profiler_options: null

Complete Error Message:

The following error is raised repeatedly:

Traceback (most recent call last):
  File "D:\Work\hionNowWork\PaddleOCR\ppocr\data\simple_dataset.py", line 159, in __getitem__
    outs = transform(data, self.ops)
  File "D:\Work\hionNowWork\PaddleOCR\ppocr\data\imaug\__init__.py", line 56, in transform
    data = op(data)
  File "D:\Work\hionNowWork\PaddleOCR\ppocr\data\imaug\operators.py", line 133, in __call__
    data_list.append(data[key])
KeyError: 'valid_ratio'

followed at the end by this fatal error:

Fatal Python error: Cannot recover from stack overflow.
Python runtime state: initialized

Current thread 0x0000f084 (most recent call first):
  File "D:\Anaconda3\envs\PaddleOCR\lib\site-packages\numpy\core\_methods.py", line 103 in _clip_dep_is_byte_swapped
  File "D:\Anaconda3\envs\PaddleOCR\lib\site-packages\numpy\core\_methods.py", line 133 in _clip
  File "D:\Anaconda3\envs\PaddleOCR\lib\site-packages\numpy\core\fromnumeric.py", line 57 in _wrapfunc
  File "D:\Anaconda3\envs\PaddleOCR\lib\site-packages\numpy\core\fromnumeric.py", line 2154 in clip
  File "<__array_function__ internals>", line 180 in clip
  File "D:\Work\hionNowWork\PaddleOCR\ppocr\data\imaug\text_image_aug\warp_mls.py", line 148 in gen_img
  File "D:\Work\hionNowWork\PaddleOCR\ppocr\data\imaug\text_image_aug\warp_mls.py", line 42 in generate
  File "D:\Work\hionNowWork\PaddleOCR\ppocr\data\imaug\text_image_aug\augment.py", line 60 in tia_distort
  File "D:\Work\hionNowWork\PaddleOCR\ppocr\data\imaug\rec_img_aug.py", line 48 in __call__
  File "D:\Work\hionNowWork\PaddleOCR\ppocr\data\imaug\__init__.py", line 56 in transform
  File "D:\Work\hionNowWork\PaddleOCR\ppocr\data\simple_dataset.py", line 159 in __getitem__
  File "D:\Work\hionNowWork\PaddleOCR\ppocr\data\simple_dataset.py", line 169 in __getitem__
  (the frame at simple_dataset.py line 169 repeats until the stack overflows)
  ...

Thread 0x000107b4 (most recent call first):
  File "D:\Anaconda3\envs\PaddleOCR\lib\threading.py", line 306 in wait
  File "D:\Anaconda3\envs\PaddleOCR\lib\queue.py", line 179 in get
  File "D:\Anaconda3\envs\PaddleOCR\lib\site-packages\visualdl\writer\record_writer.py", line 206 in run
  File "D:\Anaconda3\envs\PaddleOCR\lib\threading.py", line 932 in _bootstrap_inner
  File "D:\Anaconda3\envs\PaddleOCR\lib\threading.py", line 890 in _bootstrap

Thread 0x000058b0 (most recent call first):
  File "D:\Anaconda3\envs\PaddleOCR\lib\site-packages\paddle\io\dataloader\dataloader_iter.py", line 291 in __next__
  File "D:\Work\hionNowWork\PaddleOCR\tools\program.py", line 272 in train
  File "tools/train.py", line 200 in main
  File "tools/train.py", line 229 in <module>
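The stack overflow follows from the retry behavior implied by the repeated frame at simple_dataset.py line 169: on a failed sample, __getitem__ apparently falls back to another random index by calling itself. The sketch below is a hypothetical simplification (not the upstream code) showing why that recursion never terminates once every sample fails the same way:

```python
import random
import sys


class BrokenDataset:
    """Hypothetical simplification of the SimpleDataSet retry logic,
    inferred from the repeated frame at simple_dataset.py:169."""

    def __init__(self, num_samples):
        self.num_samples = num_samples

    def transform(self, idx):
        # Every sample is missing 'valid_ratio', so every transform fails,
        # mirroring the KeyError reported above.
        raise KeyError("valid_ratio")

    def __getitem__(self, idx):
        try:
            return self.transform(idx)
        except KeyError:
            # Retry with a random index; since *all* samples fail,
            # the recursion is unbounded and the stack eventually overflows.
            rnd_idx = random.randint(0, self.num_samples - 1)
            return self[rnd_idx]


# A low recursion limit makes the failure appear immediately; the real
# training run grinds against the interpreter's hard limit instead.
sys.setrecursionlimit(200)
try:
    BrokenDataset(100)[0]
except RecursionError:
    print("stack exhausted, as in the fatal error above")
```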

My guess

The repeated error is probably because the Train dataset (SimpleDataSet) never produces a valid_ratio entry, while keep_keys still lists valid_ratio:

Train:
  dataset:
    name: SimpleDataSet
    ...
    - KeepKeys:
        keep_keys:
        - image
        - label_ctc
        - label_gtc
        - length
        - valid_ratio
  loader:
    ...
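The mismatch can be reproduced in isolation. The sketch below uses simplified stand-ins for transform (ppocr/data/imaug/__init__.py) and KeepKeys (ppocr/data/imaug/operators.py), not the upstream classes; it shows that any sample dict lacking 'valid_ratio' raises exactly this KeyError:

```python
class KeepKeys:
    """Simplified stand-in for the KeepKeys operator in
    ppocr/data/imaug/operators.py."""

    def __init__(self, keep_keys):
        self.keep_keys = keep_keys

    def __call__(self, data):
        # Line 133 in the traceback: data_list.append(data[key])
        return [data[key] for key in self.keep_keys]


def transform(data, ops):
    """Simplified stand-in for transform() in ppocr/data/imaug/__init__.py."""
    for op in ops:
        data = op(data)
        if data is None:
            return None
    return data


# A sample as left by DecodeImage/RecAug/MultiLabelEncode: no resize op
# ran, so nothing ever set data['valid_ratio'].
sample = {"image": b"...", "label_ctc": [1, 2], "label_gtc": [3, 4], "length": 2}
ops = [KeepKeys(["image", "label_ctc", "label_gtc", "length", "valid_ratio"])]
try:
    transform(sample, ops)
except KeyError as err:
    print("missing key:", err)  # missing key: 'valid_ratio'
```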
The resize_norm_img method of the MultiScaleDataSet class in ppocr\data\simple_dataset.py does contain data['valid_ratio'] = valid_ratio:
    def resize_norm_img(self, data, imgW, imgH, padding=True):
        img = data['image']
        h = img.shape[0]
        w = img.shape[1]
        if not padding:
            resized_image = cv2.resize(
                img, (imgW, imgH), interpolation=cv2.INTER_LINEAR)
            resized_w = imgW
        else:
            ratio = w / float(h)
            if math.ceil(imgH * ratio) > imgW:
                resized_w = imgW
            else:
                resized_w = int(math.ceil(imgH * ratio))
            resized_image = cv2.resize(img, (resized_w, imgH))
        resized_image = resized_image.astype('float32')

        resized_image = resized_image.transpose((2, 0, 1)) / 255
        resized_image -= 0.5
        resized_image /= 0.5
        padding_im = np.zeros((3, imgH, imgW), dtype=np.float32)
        padding_im[:, :, :resized_w] = resized_image
        valid_ratio = min(1.0, float(resized_w / imgW))
        data['image'] = padding_im
        data['valid_ratio'] = valid_ratio
        return data
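If that analysis is right, one possible workaround (untested here; parameters assumed from the Eval pipeline in this same config) is to give the Train pipeline a resize step that populates valid_ratio, e.g. the same RecResizeImg the Eval dataset already uses:

```yaml
Train:
  dataset:
    name: SimpleDataSet
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - RecAug:
    - MultiLabelEncode:
        gtc_encode: NRTRLabelEncode
    - RecResizeImg:            # sets data['valid_ratio'], as in Eval above
        image_shape: [3, 48, 320]
    - KeepKeys:
        keep_keys:
        - image
        - label_ctc
        - label_gtc
        - length
        - valid_ratio
```

Alternatively, since the resize_norm_img shown above belongs to MultiScaleDataSet, using that dataset class for Train (as the upstream PP-OCRv4 recognition configs appear to do) would populate valid_ratio the same way.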

rakiwine avatar Apr 19 '24 03:04 rakiwine

You can inspect the data dict inside transform in ppocr/data/imaug/__init__.py to see which preprocessing step is failing. If you run into new problems, reply to me.

UserWangZz avatar Apr 19 '24 06:04 UserWangZz
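The inspection suggested above could look like the following (a debugging sketch, not the upstream code; the op loop mirrors transform() in ppocr/data/imaug/__init__.py):

```python
def transform(data, ops=None):
    """Variant of transform() from ppocr/data/imaug/__init__.py with a
    diagnostic hook: on a KeyError, report which op failed and which keys
    the sample dict actually carried at that point."""
    if ops is None:
        ops = []
    for op in ops:
        try:
            data = op(data)
        except KeyError as err:
            print(f"{type(op).__name__} raised KeyError {err}; "
                  f"keys present: {sorted(data)}")
            raise
        if data is None:
            return None
    return data
```

Running the training once with this patch prints the failing op name and the keys actually present right before the traceback, which pinpoints the step that should have produced 'valid_ratio'.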

I have a feeling it's a mismatch between the model and the yaml.

zhanghengjiayou avatar Apr 22 '24 07:04 zhanghengjiayou

Are you training a recognition model, or running a knowledge distillation task?

UserWangZz avatar Apr 23 '24 02:04 UserWangZz

How should this be fixed? I'm hitting the same problem.

zhenhuamo avatar May 22 '24 02:05 zhenhuamo