
Running `example/auto_compression/semantic_segmentation` yields NaN loss

Open · DrRyanHuang opened this issue 2 years ago · 1 comment

Hi, I ran run.py in PaddleSlim/example/auto_compression/semantic_segmentation with the following arguments:

args.config_path = "configs/pp_liteseg/pp_liteseg_mine.yaml"
args.save_dir = './save_sparse_model'

but I got:

2022-07-22 12:59:49,955-INFO: quant_aware config {'weight_quantize_type': 'channel_wise_abs_max', 'activation_quantize_type': 'moving_average_abs_max', 'weight_bits': 8, 'activation_bits': 8, 'not_quant_pattern': ['skip_quant'], 'quantize_op_types': ['conv2d', 'depthwise_conv2d', 'mul', 'matmul', 'matmul_v2'], 'dtype': 'int8', 'window_size': 10000, 'moving_rate': 0.9, 'for_tensorrt': False, 'is_full_quantize': False, 'name': 'Distillation', 'loss': 'l2', 'node': [], 'alpha': 1.0, 'teacher_model_dir': '../../../../714/save/output', 'teacher_model_filename': 'model.pdmodel', 'teacher_params_filename': 'model.pdiparams'}
2022-07-22 13:00:36,223-INFO: Total iter: 0, epoch: 0, batch: 0, loss: [0.02632807]
2022-07-22 13:00:36,593-INFO: Total iter: 1, epoch: 0, batch: 1, loss: [0.0527272]
2022-07-22 13:00:36,777-INFO: Total iter: 2, epoch: 0, batch: 2, loss: [0.1315988]
2022-07-22 13:00:37,332-INFO: Total iter: 3, epoch: 0, batch: 3, loss: [0.3486606]
2022-07-22 13:00:37,476-INFO: Total iter: 4, epoch: 0, batch: 4, loss: [9.675454]
2022-07-22 13:00:37,607-INFO: Total iter: 5, epoch: 0, batch: 5, loss: [61.659126]
2022-07-22 13:00:37,731-INFO: Total iter: 6, epoch: 0, batch: 6, loss: [4.6202196e+27]
2022-07-22 13:00:37,859-INFO: Total iter: 7, epoch: 0, batch: 7, loss: [nan]
2022-07-22 13:00:37,979-INFO: Total iter: 8, epoch: 0, batch: 8, loss: [nan]
2022-07-22 13:00:38,101-INFO: Total iter: 9, epoch: 0, batch: 9, loss: [nan]
.....
2022-07-22 13:00:42,007-INFO: Total iter: 41, epoch: 0, batch: 41, loss: [nan]
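The log above shows the classic divergence pattern: the loss grows by roughly an order of magnitude per batch before overflowing to NaN at iteration 7. A small guard like the following (my own sketch, not part of run.py; the threshold is an arbitrary choice) can catch this and abort training at the first suspicious value instead of burning through NaN iterations:

```python
import math

def loss_diverged(loss, threshold=1e6):
    """Flag a loss value that is NaN, infinite, or implausibly large."""
    return math.isnan(loss) or math.isinf(loss) or abs(loss) > threshold

# Loss values from the log above.
history = [0.026, 0.052, 0.131, 0.348, 9.675, 61.659, 4.62e27, float("nan")]
first_bad = next(i for i, v in enumerate(history) if loss_diverged(v))
print(first_bad)  # 6 -- the 4.62e27 spike already trips the threshold
```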

configs/pp_liteseg/pp_liteseg_mine.yaml:

Global:
  reader_config: ../../../../714/save/config_paddleslim.yml
  model_dir: ../../../../714/save/output
  model_filename: model.pdmodel
  params_filename: model.pdiparams

TrainConfig:
  epochs: 14
  logging_iter: 1
  eval_iter: 90
  learning_rate:
    type: PiecewiseDecay
    boundaries: [900]
    values: [0.001, 0.0005]
  optimizer_builder:
    optimizer:
      type: SGD
    weight_decay: 0.0005
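For reference, a PiecewiseDecay schedule with these settings behaves as sketched below (my own illustration of the boundaries/values semantics, not PaddleSlim code): the learning rate is `values[i]` while the global step is below `boundaries[i]`, and the last value thereafter.

```python
def piecewise_decay(step, boundaries, values):
    """Return the learning rate for a given global step.

    values must have one more entry than boundaries: values[i] applies
    while step < boundaries[i]; values[-1] applies afterwards.
    """
    for boundary, value in zip(boundaries, values):
        if step < boundary:
            return value
    return values[-1]

# With boundaries [900] and values [0.001, 0.0005], the LR is 0.001 for
# the first 900 iterations and 0.0005 from iteration 900 onward.
print(piecewise_decay(0, [900], [0.001, 0.0005]))    # 0.001
print(piecewise_decay(900, [900], [0.001, 0.0005]))  # 0.0005
```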

config_paddleslim.yml:

train_dataset:
  type: Dataset
  # dataset_root: /root/share/program/save/data/portraint
  # val_path: /root/share/program/save/data/portraint/txtfiles/test.txt
  dataset_root: ../../../..
  train_path:  ../../../../test/test.txt
  num_classes: 2
  transforms:
    - type: Resize
      target_size: [224, 224]
    - type: Normalize
  mode: train

val_dataset:
  type: Dataset
  # dataset_root: /root/share/program/save/data/portraint
  # val_path: /root/share/program/save/data/portraint/txtfiles/test.txt
  dataset_root: ../../../..
  val_path: ../../../../test/test.txt
  num_classes: 2
  transforms:
    - type: Resize
      target_size: [224, 224]
    - type: Normalize
  mode: val

export:
  transforms:
    - type: Resize
      target_size: [224, 224]
    - type: Normalize

# optimizer:
#   type: sgd
#   momentum: 0.9
#   weight_decay: 5.0e-4

# lr_scheduler:
#   type: PolynomialDecay
#   learning_rate: 0.02  #finetune 0.02
#   end_lr: 0
#   power: 0.9
#   warmup_iters: 100
#   warmup_start_lr: 1.0e-5

# loss:
#   types:
#     - type: CrossEntropyLoss
#       #min_kept: 1254400   # batch_size * 224 * 224 // 16
#     - type: CrossEntropyLoss
#       #min_kept: 1254400
#     - type: CrossEntropyLoss
#       #min_kept: 1254400
#     - type: CrossEntropyLoss
#   coef: [1, 1, 1, 1]

model:
  type: PPLiteSeg
  backbone:
    type: STDC2
  backbone_indices: [0, 1, 2, 3]
  arm_out_chs: [16, 32, 32, 64]
  seg_head_inter_chs: [8, 16, 16, 32]
  # pretrained: output/pp_liteseg_stdc2_myhumanseg_modify_1_syntheses/iter_4200/model.pdparams

Could you please tell me what might be causing this? Thanks!

DrRyanHuang avatar Jul 22 '22 05:07 DrRyanHuang

Please try setting the distillation node explicitly, or replace the distillation loss and try again. For details on how to set up the distillation node, please refer to this document: https://github.com/PaddlePaddle/PaddleSlim/blob/develop/example/auto_compression/hyperparameter_tutorial.md#1%E8%87%AA%E5%8A%A8%E8%92%B8%E9%A6%8F%E6%95%88%E6%9E%9C%E4%B8%8D%E7%90%86%E6%83%B3%E6%80%8E%E4%B9%88%E8%87%AA%E4%B8%BB%E9%80%89%E6%8B%A9%E8%92%B8%E9%A6%8F%E8%8A%82%E7%82%B9
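To illustrate the suggestion, an explicit distillation node could be added to the YAML config along these lines. Note that the tensor name below is a placeholder, not taken from this model; you need to pick an actual output tensor of your own model.pdmodel (e.g. by inspecting it with a visualizer such as Netron):

```yaml
Distillation:
  loss: l2          # or try a different distillation loss if l2 diverges
  alpha: 1.0
  # Placeholder tensor name -- replace with a real output node of your
  # own model.pdmodel.
  node:
    - conv2d_123.tmp_1
```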

zzjjay avatar Jul 26 '22 06:07 zzjjay