PaddleSlim
Auto compression (ACT) of PPYOLOE: the resulting model size and inference time are essentially unchanged
Following the official example documentation, the model was exported with:

```shell
python tools/export_model.py \
    -c configs/ppyoloe/ppyoloe_plus_crn_s_80e_coco.yml \
    -o weights=~/ss/code/PaddleYOLO/output/ppyoloe_plus_crn_s_80e_coco_shrimp/best_model.pdparams \
    trt=True exclude_nms=True
```
The compression training was launched with:

```shell
CUDA_VISIBLE_DEVICES=0,1 python -m paddle.distributed.launch --log_dir=log --gpus 0,1 run.py \
    --config_path=./configs/ppyoloe_x_qat_dis.yaml --save_dir='./output/'
```
Part of the training log:
```
2023-03-08 10:48:47,645-INFO: Total iter: 4900, epoch: 0, batch: 4900, loss: [11.598398]soft_label: [11.598398]
2023-03-08 10:48:48,808-INFO: Total iter: 4910, epoch: 0, batch: 4910, loss: [11.745678]soft_label: [11.745678]
2023-03-08 10:48:49,972-INFO: Total iter: 4920, epoch: 0, batch: 4920, loss: [11.701544]soft_label: [11.701544]
2023-03-08 10:48:51,137-INFO: Total iter: 4930, epoch: 0, batch: 4930, loss: [11.786173]soft_label: [11.786173]
2023-03-08 10:48:52,301-INFO: Total iter: 4940, epoch: 0, batch: 4940, loss: [11.767839]soft_label: [11.767839]
2023-03-08 10:48:53,466-INFO: Total iter: 4950, epoch: 0, batch: 4950, loss: [11.527636]soft_label: [11.527636]
2023-03-08 10:48:54,631-INFO: Total iter: 4960, epoch: 0, batch: 4960, loss: [11.843047]soft_label: [11.843047]
2023-03-08 10:48:55,798-INFO: Total iter: 4970, epoch: 0, batch: 4970, loss: [10.901478]soft_label: [10.901478]
2023-03-08 10:48:56,963-INFO: Total iter: 4980, epoch: 0, batch: 4980, loss: [11.668227]soft_label: [11.668227]
2023-03-08 10:48:58,124-INFO: Total iter: 4990, epoch: 0, batch: 4990, loss: [11.696041]soft_label: [11.696041]
Eval iter: 0
...
Eval iter: 2000
[03/08 10:51:32] ppdet.metrics.metrics INFO: The bbox result is saved to bbox.json.
loading annotations into memory...
Done (t=0.06s)
creating index...
index created!
[03/08 10:51:32] ppdet.metrics.coco_utils INFO: Start evaluate...
Loading and preparing results...
DONE (t=1.67s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=14.18s).
Accumulating evaluation results...
DONE (t=4.24s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.741
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.968
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.874
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.426
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.731
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.676
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.465
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.801
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.805
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.597
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.799
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.711
2023-03-08 10:51:53,520-INFO: epoch: 0 metric of compressed model is: 0.740954, best metric of compressed model is 0.740954
2023-03-08 10:51:53,590-INFO: convert config {'weight_quantize_type': 'channel_wise_abs_max', 'activation_quantize_type': 'moving_average_abs_max', 'weight_bits': 8, 'activation_bits': 8, 'not_quant_pattern': ['skip_quant'], 'quantize_op_types': ['mul', 'conv2d', 'pool2d', 'depthwise_conv2d', 'elementwise_add', 'leaky_relu'], 'dtype': 'int8', 'window_size': 10000, 'moving_rate': 0.9, 'for_tensorrt': True, 'is_full_quantize': True, 'onnx_format': False, 'quant_post_first': False, 'scale_trainable': True, 'name': 'Distillation', 'loss': 'soft_label', 'node': [], 'alpha': 1.0, 'teacher_model_dir': './shrimp_baseline_export_model/ppyoloe_plus_crn_s_80e_1024_512_coco_shrimp_whole', 'teacher_model_filename': 'model.pdmodel', 'teacher_params_filename': 'model.pdiparams'}
2023-03-08 10:51:59,765-INFO: ==> The metric of final model is 0.7410
2023-03-08 10:51:59,765-INFO: ==> The ACT compression has been completed and the final model is saved in `./auto_compression_model_res_3_8_for_trt_full_quantize/`
```
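To compare the on-disk size of the exported and compressed models, one way is to sum the file sizes under each output directory. A small sketch (the directory name below is the one from the log above; point it at your own paths):

```python
import os

def dir_size_mb(path):
    """Total size of all regular files under `path`, in MiB."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total / (1024 * 1024)

# e.g. compare before/after:
# dir_size_mb("./shrimp_baseline_export_model/...")
# dir_size_mb("./auto_compression_model_res_3_8_for_trt_full_quantize")
```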
The configuration file is as follows:
```yaml
Global:
  reader_config: configs/shrimp_reader.yml
  exclude_nms: True
  arch: PPYOLOE # When export exclude_nms=True, need set arch: PPYOLOE
  Evaluation: True
  model_dir: ./shrimp_baseline_export_model/ppyoloe_plus_crn_s_80e_1024_512_coco_shrimp_whole
  model_filename: model.pdmodel
  params_filename: model.pdiparams

Distillation:
  alpha: 1.0
  loss: soft_label

QuantAware:
  for_tensorrt: true
  is_full_quantize: true
  onnx_format: false
  use_pact: true
  activation_quantize_type: 'moving_average_abs_max'
  quantize_op_types:
  - conv2d
  - depthwise_conv2d

TrainConfig:
  train_iter: 5000
  eval_iter: 1000
  learning_rate:
    type: CosineAnnealingDecay
    learning_rate: 0.00003
    T_max: 6000
  optimizer_builder:
    optimizer:
      type: SGD
    weight_decay: 4.0e-05
```
Before auto compression (i.e., right after export) the model is 28.65 MB; after auto compression it is 28.70 MB, and the TRT_FP32 and TRT_FP16 inference times are also almost identical. What could be causing this?

Additionally, with the full-quantization setting the model size does shrink from 28.65 MB to 7.48 MB, but inference speed is still almost the same as before quantization.

In short: with auto compression the model got smaller, yet inference speed barely changed.
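For reference, the roughly 4x shrink in the full-quantization case is what int8 weight storage predicts, since FP32 uses 4 bytes per weight and INT8 uses 1. Conversely, if the model is saved with fake-quant ops while the weights stay FP32 (a common QAT save path when no int8 conversion is applied), the on-disk size barely changes, which would match the 28.65 MB -> 28.70 MB observation. A quick back-of-the-envelope check using the sizes reported above:

```python
# FP32 stores 4 bytes per weight, INT8 stores 1 byte, so converting the
# weights alone should shrink the parameter file roughly 4x.
fp32_size_mb = 28.65                 # exported (uncompressed) model size
expected_int8_mb = fp32_size_mb / 4  # predicted size after int8 conversion
print(round(expected_int8_mb, 2))    # ~7.16, close to the observed 7.48 MB
```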
Which deployment backend are you using to measure inference speed?
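This matters because an int8 model only runs faster when the backend actually executes int8 kernels (for example, Paddle Inference with the TensorRT engine set to Int8 precision); a backend that falls back to FP32 will show no speedup. Whatever backend is used, a warmed-up median timing gives a fairer comparison than a single run. A minimal, backend-agnostic harness (the `run_once` callable is a placeholder for one predictor call, not a PaddleSlim API):

```python
import statistics
import time

def benchmark_ms(run_once, warmup=10, iters=100):
    """Median wall-clock latency of run_once() in milliseconds."""
    for _ in range(warmup):          # let caches / engines warm up first
        run_once()
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - t0) * 1000.0)
    return statistics.median(samples)

# Trivial workload as a demo; in practice run_once would call
# predictor.run() on a fixed input batch.
print(benchmark_ms(lambda: sum(range(1000)), warmup=2, iters=20))
```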