
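Can a model that has already been trained continue training after switching to a different learning_rate method? Also, could the various lr methods be documented with examples for beginners, like the one at line 39 of training.md? I don't know how to configure lr methods such as OneCycle in the yml file.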

Open · goodmight opened this issue 2 years ago · 1 comment

Please provide the following information to quickly locate the problem:

  • System Environment: Win10

  • Version: Paddle: PaddleOCR:

  • Related components:

  1. ppocr/optimizer/learning_rate.py (relates to the first question)
  2. doc/doc_ch/training.md, line 39 (relates to the second question):
Optimizer:
  ...
  lr:
    name: Piecewise
    decay_epochs : [700, 800]
    values : [0.001, 0.0001]
    warmup_epoch: 5
  • Command: python tools/train.py -c configs/det/det_r34_vd_db_artfont.yml

  • Complete error message:

Traceback (most recent call last):
  File "tools/train.py", line 202, in <module>
    main(config, device, logger, vdl_writer)
  File "tools/train.py", line 169, in main
    pre_best_model_dict = load_model(config, model, optimizer,
  File "C:\Users\admin\Downloads\PPOCR\projects\PaddleOCR\ppocr\utils\save_load.py", line 124, in load_model
    optimizer.set_state_dict(optim_dict)
  File "C:\Users\admin\anaconda3\envs\paddle_env\lib\site-packages\decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "C:\Users\admin\anaconda3\envs\paddle_env\lib\site-packages\paddle\fluid\wrapped_decorator.py", line 25, in __impl__
    return wrapped_func(*args, **kwargs)
  File "C:\Users\admin\anaconda3\envs\paddle_env\lib\site-packages\paddle\fluid\framework.py", line 434, in __impl__
    return func(*args, **kwargs)
  File "C:\Users\admin\anaconda3\envs\paddle_env\lib\site-packages\paddle\optimizer\optimizer.py", line 302, in set_state_dict
    self._learning_rate.set_dict(state_dict["LR_Scheduler"])
KeyError: 'LR_Scheduler'
  • configs/det/det_r34_vd_db_artfont.yml:
Global:
  use_gpu: true
  epoch_num: 1200
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/det_r34_vd/
  save_epoch_step: 1200
  eval_batch_step: [0,2000]
  cal_metric_during_train: False
  checkpoints: ./output/det_r34_vd/latest
  save_inference_dir:
  use_visualdl: False
  infer_img: doc/imgs_en/img_10.jpg
  save_res_path: ./output/det_db/predicts_db.txt


Architecture:
  model_type: det
  algorithm: DB
  Transform:
  Backbone:
    name: ResNet_vd
    layers: 34
  Neck:
    name: DBFPN
    out_channels: 256
  Head:
    name: DBHead
    k: 50

Loss:
  name: DBLoss
  balance_loss: true
  main_loss_type: DiceLoss
  alpha: 5
  beta: 10
  ohem_ratio: 3

Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
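   #learning_rate: 0.001 # Earlier training used a constant lr; continuing training with only the constant value changed works fine.
   name: OneCycle # Partway through training I wanted to try OneCycle, and it raised the error above. I'm not sure whether the lr method can't be changed once training has started, or whether my config for this lr method is wrong.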
   max_lr: 0.05
  regularizer:
    name: 'L2'
    factor: 0

PostProcess:
  name: DBPostProcess
  thresh: 0.3
  box_thresh: 0.7
  max_candidates: 1000
  unclip_ratio: 1.5

Metric:
  name: DetMetric
  main_indicator: hmean

Train:
  dataset:
    name: SimpleDataSet
    data_dir: ./train_data/artfont
    label_file_list:
      - ./train_data/artfont/training.txt
    ratio_list: [1.0]
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - DetLabelEncode: # Class handling label
      - IaaAugment:
          augmenter_args:
            - { 'type': Fliplr, 'args': { 'p': 0.5 } }
            - { 'type': Affine, 'args': { 'rotate': [-10, 10] } }
            - { 'type': Resize, 'args': { 'size': [0.5, 3] } }
      - EastRandomCropData:
          size: [640, 640]
          max_tries: 50
          keep_ratio: true
      - MakeBorderMap:
          shrink_ratio: 0.4
          thresh_min: 0.3
          thresh_max: 0.7
      - MakeShrinkMap:
          shrink_ratio: 0.4
          min_text_size: 8
      - NormalizeImage:
          scale: 1./255.
          mean: [0.485, 0.456, 0.406]
          std: [0.229, 0.224, 0.225]
          order: 'hwc'
      - ToCHWImage:
      - KeepKeys:
          keep_keys: ['image', 'threshold_map', 'threshold_mask', 'shrink_map', 'shrink_mask'] # the order of the dataloader list
  loader:
    shuffle: True
    drop_last: False
    batch_size_per_card: 16
    num_workers: 4

Eval:
  dataset:
    name: SimpleDataSet
    data_dir: ./train_data/artfont/
    label_file_list:
      - ./train_data/artfont/eval.txt
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - DetLabelEncode: # Class handling label
      - DetResizeForTest:
          image_shape: [736, 1280]
      - NormalizeImage:
          scale: 1./255.
          mean: [0.485, 0.456, 0.406]
          std: [0.229, 0.224, 0.225]
          order: 'hwc'
      - ToCHWImage:
      - KeepKeys:
          keep_keys: ['image', 'shape', 'polys', 'ignore_tags']
  loader:
    shuffle: False
    drop_last: False
    batch_size_per_card: 1 # must be 1
    num_workers: 8
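The second question asks what the lr keys mean. Reading the Piecewise example quoted from training.md, decay_epochs looks like the epochs at which the rate steps down, values the rate used in each stage, and warmup_epoch a linear ramp at the start. The sketch below encodes that reading in plain Python. It is not PaddleOCR code: the function name piecewise_with_warmup is hypothetical, the warmup is assumed to ramp linearly from 0 to values[0], and the decay boundaries are assumed to be counted after warmup ends (which is how we understand Paddle's LinearWarmup to step the schedule it wraps).

```python
def piecewise_with_warmup(step, step_each_epoch, decay_epochs, values, warmup_epoch=0):
    """Illustrative lr at a global `step` for a Piecewise config with optional warmup."""
    warmup_steps = warmup_epoch * step_each_epoch
    if step < warmup_steps:
        # Assumed linear warmup from 0.0 up to the first piecewise value.
        return values[0] * step / warmup_steps
    # Assumption: decay boundaries are counted from the end of warmup.
    local_step = step - warmup_steps
    boundaries = [epoch * step_each_epoch for epoch in decay_epochs]
    for boundary, value in zip(boundaries, values):
        if local_step < boundary:
            return value
    # Past every boundary: keep the last value.
    return values[-1]


# Values from the training.md example; step_each_epoch is hypothetical.
cfg = dict(decay_epochs=[700, 800], values=[0.001, 0.0001], warmup_epoch=5)
step_each_epoch = 100
for epoch in (0, 2.5, 5, 600, 710, 900):
    step = int(epoch * step_each_epoch)
    print(epoch, piecewise_with_warmup(step, step_each_epoch, **cfg))
```

With two boundaries and two values, this sketch falls through to values[-1] after epoch 800 and returns 0.0001 again, so the second boundary has no visible effect. Paddle's PiecewiseDecay documentation describes values as one element longer than boundaries, so a third stage would need a third value; check this against your Paddle version.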

goodmight · Sep 23 '22 03:09
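A likely explanation for the KeyError above, offered as an assumption rather than a confirmed diagnosis: the traceback ends in Paddle's Optimizer.set_state_dict, which reads state_dict["LR_Scheduler"] when the optimizer's learning rate is a scheduler object. If the earlier constant-lr run passed the rate to the optimizer as a plain float, the checkpoint in ./output/det_r34_vd/latest would contain no scheduler state, yet the OneCycle optimizer tries to restore one. That would also fit the later observation that training from a freshly built model works. The pure-Python mimic below reproduces the mismatch; MiniOptimizer and SchedulerLR are hypothetical stand-ins, not Paddle classes.

```python
class SchedulerLR:
    """Hypothetical stand-in for a learning-rate scheduler that has its own state."""

    def __init__(self):
        self.last_step = 0

    def state_dict(self):
        return {"last_step": self.last_step}

    def set_dict(self, state):
        self.last_step = state["last_step"]


class MiniOptimizer:
    """Hypothetical optimizer mimicking the LR_Scheduler save/restore logic."""

    def __init__(self, learning_rate):
        self._learning_rate = learning_rate  # a float or a SchedulerLR

    def state_dict(self):
        state = {"moments": {}}  # placeholder for per-parameter optimizer state
        if isinstance(self._learning_rate, SchedulerLR):
            state["LR_Scheduler"] = self._learning_rate.state_dict()
        return state

    def set_state_dict(self, state_dict):
        if isinstance(self._learning_rate, SchedulerLR):
            # Same lookup as the last frame of the traceback.
            self._learning_rate.set_dict(state_dict["LR_Scheduler"])


# Checkpoint written during the earlier constant-lr run: no "LR_Scheduler" key.
saved = MiniOptimizer(0.001).state_dict()
# Resuming with a scheduler-based lr (e.g. OneCycle) tries to read that key.
resumed = MiniOptimizer(SchedulerLR())
try:
    resumed.set_state_dict(saved)
except KeyError as err:
    print("KeyError:", err)
```

The asymmetry is the point: a float-lr optimizer never writes the key, and a scheduler-lr optimizer always expects it, so switching lr methods between save and resume breaks unless the optimizer state is not restored.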

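  1. Indeed, if I build the model from scratch, training works.
  2. It seems you can simply set the parameters of the lr classes in ppocr/optimizer/learning_rate.py directly in the yml. I just don't know what typical values, or which tuples/lists, to give them.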

goodmight · Sep 23 '22 07:09
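On what values OneCycle needs: the config in the original post sets only max_lr, and build_lr_scheduler (quoted later in this thread) injects epochs and step_each_epoch. PaddleOCR's OneCycle appears to wrap Paddle's paddle.optimizer.lr.OneCycleLR. The sketch below is an illustration of a two-phase one-cycle schedule, not PaddleOCR's implementation: the rate starts at max_lr / divide_factor, rises along a cosine to max_lr over roughly the first phase_pct of all steps, then anneals to end_lr. The defaults (divide_factor=25, end_lr=1e-4, phase_pct=0.3) are assumed from OneCycleLR and may differ in your version; step_each_epoch=50 is hypothetical.

```python
import math


def _cos_anneal(start, end, pct):
    """Cosine interpolation from start (pct=0) to end (pct=1)."""
    return end + (start - end) / 2.0 * (math.cos(math.pi * pct) + 1.0)


def one_cycle_lr(step, total_steps, max_lr, divide_factor=25.0,
                 end_lr=1e-4, phase_pct=0.3):
    """Illustrative lr at `step` of a two-phase one-cycle schedule."""
    initial_lr = max_lr / divide_factor
    peak_step = int(phase_pct * total_steps)
    if step <= peak_step:
        # Phase 1: rise from initial_lr to max_lr.
        return _cos_anneal(initial_lr, max_lr, step / peak_step)
    # Phase 2: anneal from max_lr down to end_lr over the remaining steps.
    pct = (step - peak_step) / (total_steps - 1 - peak_step)
    return _cos_anneal(max_lr, end_lr, pct)


epochs, step_each_epoch = 1200, 50  # epoch_num from the config; steps per epoch is hypothetical
total = epochs * step_each_epoch
for step in (0, total // 10, int(0.3 * total), total // 2, total - 1):
    print(step, round(one_cycle_lr(step, total, max_lr=0.05), 6))
```

Printing the curve this way makes it easy to compare with the constant 0.001 from the earlier run: with max_lr: 0.05 the rate starts at 0.002 and peaks at 50 times that constant.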

Reading ppocr/optimizer/__init__.py makes this clear:


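def build_lr_scheduler(lr_config, epochs, step_each_epoch):
    from . import learning_rate
    # epochs and step_each_epoch are injected, so the yml does not need to set them
    lr_config.update({'epochs': epochs, 'step_each_epoch': step_each_epoch})
    # 'name' selects a class in learning_rate.py; it defaults to 'Const'
    lr_name = lr_config.pop('name', 'Const')
    # every remaining key under Optimizer.lr becomes a constructor keyword argument
    lr = getattr(learning_rate, lr_name)(**lr_config)()
    return lr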

goodmight · Sep 27 '22 05:09
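The build_lr_scheduler function quoted above turns the Optimizer.lr block of the yml into a scheduler: name picks a class from learning_rate.py (Const if omitted), epochs and step_each_epoch are added automatically, and every other key is passed to that class's constructor as a keyword argument. The sketch below mirrors that dispatch with two hypothetical stand-in classes so it runs without Paddle; the real classes take more parameters and return Paddle scheduler objects rather than the plain values used here.

```python
import types


# Hypothetical stand-ins for classes in ppocr/optimizer/learning_rate.py.
class Const:
    def __init__(self, learning_rate, step_each_epoch, warmup_epoch=0, **kwargs):
        self.learning_rate = learning_rate

    def __call__(self):
        # The stand-in simply returns the constant value.
        return self.learning_rate


class OneCycle:
    def __init__(self, max_lr, epochs, step_each_epoch, **kwargs):
        self.max_lr = max_lr
        self.total_steps = epochs * step_each_epoch

    def __call__(self):
        # A real implementation would return a scheduler object; a dict keeps this runnable.
        return {"name": "OneCycle", "max_lr": self.max_lr, "total_steps": self.total_steps}


# Plays the role of "from . import learning_rate".
learning_rate = types.SimpleNamespace(Const=Const, OneCycle=OneCycle)


def build_lr_scheduler(lr_config, epochs, step_each_epoch):
    lr_config = dict(lr_config)  # copy; the quoted original mutates the caller's dict
    lr_config.update({"epochs": epochs, "step_each_epoch": step_each_epoch})
    lr_name = lr_config.pop("name", "Const")
    return getattr(learning_rate, lr_name)(**lr_config)()


# Optimizer.lr blocks as a yml loader would hand them over.
print(build_lr_scheduler({"learning_rate": 0.001}, epochs=1200, step_each_epoch=50))
print(build_lr_scheduler({"name": "OneCycle", "max_lr": 0.05}, epochs=1200, step_each_epoch=50))
try:
    build_lr_scheduler({"name": "OneCycle"}, epochs=1200, step_each_epoch=50)
except TypeError as err:
    print("missing required key:", err)
```

Under this reading, working out what to put in the yml amounts to reading the __init__ signature of the chosen class: parameters without defaults (other than epochs and step_each_epoch) must be set, and the rest are optional. In these stand-ins, unknown keys are silently absorbed by **kwargs, so a misspelled optional key would not raise an error.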