PaddleSlim
Auto compression (ACT) of PPYOLOE: the resulting model size and inference time are essentially unchanged
Following the official example documentation, the model was exported with:

```shell
python tools/export_model.py \
    -c configs/ppyoloe/ppyoloe_plus_crn_s_80e_coco.yml \
    -o weights=~/ss/code/PaddleYOLO/output/ppyoloe_plus_crn_s_80e_coco_shrimp/best_model.pdparams \
    trt=True exclude_nms=True
```
The compression training was launched with:

```shell
CUDA_VISIBLE_DEVICES=0,1 python -m paddle.distributed.launch --log_dir=log --gpus 0,1 run.py \
    --config_path=./configs/ppyoloe_x_qat_dis.yaml --save_dir='./output/'
```
Part of the training log:
```
2023-03-08 10:48:47,645-INFO: Total iter: 4900, epoch: 0, batch: 4900, loss: [11.598398]soft_label: [11.598398]
2023-03-08 10:48:48,808-INFO: Total iter: 4910, epoch: 0, batch: 4910, loss: [11.745678]soft_label: [11.745678]
2023-03-08 10:48:49,972-INFO: Total iter: 4920, epoch: 0, batch: 4920, loss: [11.701544]soft_label: [11.701544]
2023-03-08 10:48:51,137-INFO: Total iter: 4930, epoch: 0, batch: 4930, loss: [11.786173]soft_label: [11.786173]
2023-03-08 10:48:52,301-INFO: Total iter: 4940, epoch: 0, batch: 4940, loss: [11.767839]soft_label: [11.767839]
2023-03-08 10:48:53,466-INFO: Total iter: 4950, epoch: 0, batch: 4950, loss: [11.527636]soft_label: [11.527636]
2023-03-08 10:48:54,631-INFO: Total iter: 4960, epoch: 0, batch: 4960, loss: [11.843047]soft_label: [11.843047]
2023-03-08 10:48:55,798-INFO: Total iter: 4970, epoch: 0, batch: 4970, loss: [10.901478]soft_label: [10.901478]
2023-03-08 10:48:56,963-INFO: Total iter: 4980, epoch: 0, batch: 4980, loss: [11.668227]soft_label: [11.668227]
2023-03-08 10:48:58,124-INFO: Total iter: 4990, epoch: 0, batch: 4990, loss: [11.696041]soft_label: [11.696041]
Eval iter: 0
...
Eval iter: 2000
[03/08 10:51:32] ppdet.metrics.metrics INFO: The bbox result is saved to bbox.json.
loading annotations into memory...
Done (t=0.06s)
creating index...
index created!
[03/08 10:51:32] ppdet.metrics.coco_utils INFO: Start evaluate...
Loading and preparing results...
DONE (t=1.67s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=14.18s).
Accumulating evaluation results...
DONE (t=4.24s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.741
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.968
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.874
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.426
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.731
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.676
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.465
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.801
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.805
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.597
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.799
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.711
2023-03-08 10:51:53,520-INFO: epoch: 0 metric of compressed model is: 0.740954, best metric of compressed model is 0.740954
2023-03-08 10:51:53,590-INFO: convert config {'weight_quantize_type': 'channel_wise_abs_max', 'activation_quantize_type': 'moving_average_abs_max', 'weight_bits': 8, 'activation_bits': 8, 'not_quant_pattern': ['skip_quant'], 'quantize_op_types': ['mul', 'conv2d', 'pool2d', 'depthwise_conv2d', 'elementwise_add', 'leaky_relu'], 'dtype': 'int8', 'window_size': 10000, 'moving_rate': 0.9, 'for_tensorrt': True, 'is_full_quantize': True, 'onnx_format': False, 'quant_post_first': False, 'scale_trainable': True, 'name': 'Distillation', 'loss': 'soft_label', 'node': [], 'alpha': 1.0, 'teacher_model_dir': './shrimp_baseline_export_model/ppyoloe_plus_crn_s_80e_1024_512_coco_shrimp_whole', 'teacher_model_filename': 'model.pdmodel', 'teacher_params_filename': 'model.pdiparams'}
2023-03-08 10:51:59,765-INFO: ==> The metric of final model is 0.7410
2023-03-08 10:51:59,765-INFO: ==> The ACT compression has been completed and the final model is saved in `./auto_compression_model_res_3_8_for_trt_full_quantize/`
```
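To compare the on-disk size of the exported and compressed models, one way is to sum the file sizes under each output directory. A small sketch (the directory name below is the one from the log above; point it at your own paths):

```python
import os

def dir_size_mb(path):
    """Total size of all regular files under `path`, in MiB."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total / (1024 * 1024)

# e.g. compare before/after:
# dir_size_mb("./shrimp_baseline_export_model/...")
# dir_size_mb("./auto_compression_model_res_3_8_for_trt_full_quantize")
```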
The configuration file is as follows:
```yaml
Global:
  reader_config: configs/shrimp_reader.yml
  exclude_nms: True
  arch: PPYOLOE # When export exclude_nms=True, need set arch: PPYOLOE
  Evaluation: True
  model_dir: ./shrimp_baseline_export_model/ppyoloe_plus_crn_s_80e_1024_512_coco_shrimp_whole
  model_filename: model.pdmodel
  params_filename: model.pdiparams

Distillation:
  alpha: 1.0
  loss: soft_label

QuantAware:
  for_tensorrt: true
  is_full_quantize: true
  onnx_format: false
  use_pact: true
  activation_quantize_type: 'moving_average_abs_max'
  quantize_op_types:
  - conv2d
  - depthwise_conv2d

TrainConfig:
  train_iter: 5000
  eval_iter: 1000
  learning_rate:
    type: CosineAnnealingDecay
    learning_rate: 0.00003
    T_max: 6000
  optimizer_builder:
    optimizer:
      type: SGD
    weight_decay: 4.0e-05
```
Before auto compression (i.e., right after export) the model is 28.65 MB; after auto compression it is 28.70 MB, and the TRT_FP32 and TRT_FP16 inference times are also almost identical. What could be causing this?

Additionally, with the full-quantization setting the model size does shrink from 28.65 MB to 7.48 MB, but inference speed is still almost the same as before quantization.

In short: with auto compression the model got smaller, yet inference speed barely changed.
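For reference, the roughly 4x shrink in the full-quantization case is what int8 weight storage predicts, since FP32 uses 4 bytes per weight and INT8 uses 1. Conversely, if the model is saved with fake-quant ops while the weights stay FP32 (a common QAT save path when no int8 conversion is applied), the on-disk size barely changes, which would match the 28.65 MB -> 28.70 MB observation. A quick back-of-the-envelope check using the sizes reported above:

```python
# FP32 stores 4 bytes per weight, INT8 stores 1 byte, so converting the
# weights alone should shrink the parameter file roughly 4x.
fp32_size_mb = 28.65                 # exported (uncompressed) model size
expected_int8_mb = fp32_size_mb / 4  # predicted size after int8 conversion
print(round(expected_int8_mb, 2))    # ~7.16, close to the observed 7.48 MB
```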
Which deployment backend are you using to measure inference speed?
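This matters because an int8 model only runs faster when the backend actually executes int8 kernels (for example, Paddle Inference with the TensorRT engine set to Int8 precision); a backend that falls back to FP32 will show no speedup. Whatever backend is used, a warmed-up median timing gives a fairer comparison than a single run. A minimal, backend-agnostic harness (the `run_once` callable is a placeholder for one predictor call, not a PaddleSlim API):

```python
import statistics
import time

def benchmark_ms(run_once, warmup=10, iters=100):
    """Median wall-clock latency of run_once() in milliseconds."""
    for _ in range(warmup):          # let caches / engines warm up first
        run_once()
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - t0) * 1000.0)
    return statistics.median(samples)

# Trivial workload as a demo; in practice run_once would call
# predictor.run() on a fixed input batch.
print(benchmark_ms(lambda: sum(range(1000)), warmup=2, iters=20))
```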