FastDeploy
[Backend] support ipu in paddle inference backend.
PR types
Backend
Describe
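Adds IPU support to the Paddle Inference backend.
Test results:
Example tests:
The tests cover every model listed in the FastDeploy README except InceptionV3, running inference on a single image for each.
- Test script: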
import os
import re
import subprocess

# Model name -> download URL of the exported inference model.
model_list = {
    "PPLCNet_x1_0": "https://bj.bcebos.com/paddlehub/fastdeploy/PPLCNet_x1_0_infer.tgz",
    "PPLCNetV2_base": "https://bj.bcebos.com/paddlehub/fastdeploy/PPLCNetV2_base_infer.tgz",
    "EfficientNetB7": "https://bj.bcebos.com/paddlehub/fastdeploy/EfficientNetB7_infer.tgz",
    "EfficientNetB0_small": "https://bj.bcebos.com/paddlehub/fastdeploy/EfficientNetB0_small_infer.tgz",
    "GhostNet_x1_3_ssld": "https://bj.bcebos.com/paddlehub/fastdeploy/GhostNet_x1_3_ssld_infer.tgz",
    "GhostNet_x0_5_ssld": "https://bj.bcebos.com/paddlehub/fastdeploy/GhostNet_x0_5_infer.tgz",
    "MobileNetV1_x0_25": "https://bj.bcebos.com/paddlehub/fastdeploy/MobileNetV1_x0_25_infer.tgz",
    "MobileNetV1_ssld": "https://bj.bcebos.com/paddlehub/fastdeploy/MobileNetV1_ssld_infer.tgz",
    "MobileNetV2_x0_25": "https://bj.bcebos.com/paddlehub/fastdeploy/MobileNetV2_x0_25_infer.tgz",
    "MobileNetV2_ssld": "https://bj.bcebos.com/paddlehub/fastdeploy/MobileNetV2_ssld_infer.tgz",
    "MobileNetV3_small_x0_35_ssld": "https://bj.bcebos.com/paddlehub/fastdeploy/MobileNetV3_small_x0_35_ssld_infer.tgz",
    "MobileNetV3_large_x1_0_ssld": "https://bj.bcebos.com/paddlehub/fastdeploy/MobileNetV3_large_x1_0_ssld_infer.tgz",
    "ShuffleNetV2_x0_25": "https://bj.bcebos.com/paddlehub/fastdeploy/ShuffleNetV2_x0_25_infer.tgz",
    "ShuffleNetV2_x2_0": "https://bj.bcebos.com/paddlehub/fastdeploy/ShuffleNetV2_x2_0_infer.tgz",
    "SqueezeNet1_1": "https://bj.bcebos.com/paddlehub/fastdeploy/SqueezeNet1_1_infer.tgz",
    "PPHGNet_tiny_ssld": "https://bj.bcebos.com/paddlehub/fastdeploy/PPHGNet_tiny_ssld_infer.tgz",
    "PPHGNet_base_ssld": "https://bj.bcebos.com/paddlehub/fastdeploy/PPHGNet_base_ssld_infer.tgz",
    "ResNet50_vd": "https://bj.bcebos.com/paddlehub/fastdeploy/ResNet50_vd_infer.tgz",
}

for k, v in model_list.items():
    print("TESTING: {}".format(k))
    # Derive the archive name (without the .tgz suffix) from the URL.
    pattern = r'.*\/([\d\w_]+)\.tgz$'
    model_file = re.match(pattern, v).group(1)
    download_cmd = f'''
    wget {v}
    tar -xvf {model_file}.tgz
    '''
    cpu_cmd = f'''
    python infer.py --model {model_file} --image ILSVRC2012_val_00000010.jpeg --device cpu --topk 1
    '''
    ipu_cmd = f'''
    python infer.py --model {model_file} --image ILSVRC2012_val_00000010.jpeg --device ipu --topk 1
    '''
    # Download and unpack the model, then run the same image on CPU and IPU.
    print(subprocess.Popen(download_cmd, shell=True, stdout=subprocess.PIPE).stdout.read())
    cpu_result = subprocess.Popen(cpu_cmd, shell=True, stdout=subprocess.PIPE).stdout.read()
    ipu_result = subprocess.Popen(ipu_cmd, shell=True, stdout=subprocess.PIPE).stdout.read()
    # Extract the top-1 label and score from each run's stdout.
    result_pattern = r'.*label_ids: (\d+).*scores: (\d*\.?\d*)'
    cpu_match = re.match(result_pattern, cpu_result.decode('utf-8').replace('\n', ''))
    ipu_match = re.match(result_pattern, ipu_result.decode('utf-8').replace('\n', ''))
    print("=============================={}==============================".format(k))
    if cpu_match and ipu_match:
        print("cpu_label: {}, cpu_score: {}".format(cpu_match.group(1), cpu_match.group(2)))
        print("ipu_label: {}, ipu_score: {}".format(ipu_match.group(1), ipu_match.group(2)))
    else:
        print("FAILED RUN")
    print("=============================={}==============================".format(k))
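To show how the script's result pattern pulls the top-1 label and score out of the inference output, here is a minimal sketch. The `sample` string is a hypothetical stand-in for what `infer.py` prints (its exact layout is an assumption, not taken from this PR), and `parse_top1` is an illustrative helper name.

```python
import re

# Same pattern as in the test script.
RESULT_PATTERN = r'.*label_ids: (\d+).*scores: (\d*\.?\d*)'


def parse_top1(output: str):
    """Return (label_id, score) parsed from stdout text, or None if absent."""
    # The script flattens the output to one line before matching.
    flat = output.replace('\n', '')
    match = re.match(RESULT_PATTERN, flat)
    # The score group can match an empty string, so guard against that too.
    if match is None or not match.group(2):
        return None
    return int(match.group(1)), float(match.group(2))


# Hypothetical stdout, assumed for illustration only.
sample = "ClassifyResult(\nlabel_ids: 153, \nscores: 0.686229, \n)\n"
print(parse_top1(sample))
print(parse_top1("some unrelated error text"))
```

Output that lacks both fields yields `None`, which corresponds to the `FAILED RUN` branch of the script.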
- Test results:
==============================PPLCNet_x1_0==============================
cpu_label: 153, cpu_score: 0.612086
ipu_label: 153, ipu_score: 0.612087
==============================PPLCNet_x1_0==============================
==============================PPLCNetV2_base==============================
cpu_label: 332, cpu_score: 0.278354
ipu_label: 332, ipu_score: 0.278357
==============================PPLCNetV2_base==============================
==============================EfficientNetB7==============================
cpu_label: 332, cpu_score: 0.564357
ipu_label: 332, ipu_score: 0.564378
==============================EfficientNetB7==============================
==============================EfficientNetB0_small==============================
cpu_label: 153, cpu_score: 0.525857
ipu_label: 153, ipu_score: 0.525857
==============================EfficientNetB0_small==============================
==============================GhostNet_x1_3_ssld==============================
cpu_label: 153, cpu_score: 0.849879
ipu_label: 153, ipu_score: 0.849879
==============================GhostNet_x1_3_ssld==============================
==============================GhostNet_x0_5_ssld==============================
cpu_label: 283, cpu_score: 0.341981
ipu_label: 283, ipu_score: 0.341981
==============================GhostNet_x0_5_ssld==============================
==============================MobileNetV1_x0_25==============================
cpu_label: 153, cpu_score: 0.221087
ipu_label: 153, ipu_score: 0.221088
==============================MobileNetV1_x0_25==============================
==============================MobileNetV1_ssld==============================
cpu_label: 332, cpu_score: 0.742867
ipu_label: 332, ipu_score: 0.742867
==============================MobileNetV1_ssld==============================
==============================MobileNetV2_x0_25==============================
cpu_label: 207, cpu_score: 0.247315
ipu_label: 207, ipu_score: 0.247313
==============================MobileNetV2_x0_25==============================
==============================MobileNetV3_small_x0_35_ssld==============================
cpu_label: 153, cpu_score: 0.494442
ipu_label: 153, ipu_score: 0.494442
==============================MobileNetV3_small_x0_35_ssld==============================
==============================MobileNetV3_large_x1_0_ssld==============================
cpu_label: 153, cpu_score: 0.521042
ipu_label: 153, ipu_score: 0.521041
==============================MobileNetV3_large_x1_0_ssld==============================
==============================ShuffleNetV2_x0_25==============================
cpu_label: 259, cpu_score: 0.240480
ipu_label: 259, ipu_score: 0.240481
==============================ShuffleNetV2_x0_25==============================
==============================ShuffleNetV2_x2_0==============================
cpu_label: 153, cpu_score: 0.842726
ipu_label: 153, ipu_score: 0.842727
==============================ShuffleNetV2_x2_0==============================
==============================SqueezeNet1_1==============================
cpu_label: 338, cpu_score: 0.189432
ipu_label: 338, ipu_score: 0.189433
==============================SqueezeNet1_1==============================
==============================PPHGNet_tiny_ssld==============================
cpu_label: 153, cpu_score: 0.536040
ipu_label: 153, ipu_score: 0.536039
==============================PPHGNet_tiny_ssld==============================
==============================PPHGNet_base_ssld==============================
cpu_label: 332, cpu_score: 0.996301
ipu_label: 332, ipu_score: 0.996301
==============================PPHGNet_base_ssld==============================
==============================ResNet50_vd==============================
cpu_label: 153, cpu_score: 0.686229
ipu_label: 153, ipu_score: 0.686230
==============================ResNet50_vd==============================
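In the results above, the CPU and IPU labels agree for every model shown, and the largest score gap is 0.000021 (EfficientNetB7). A check of this kind can be automated by parsing the printed blocks. The sketch below is illustrative only: `max_score_gap` is a hypothetical helper, and the excerpt reuses two result pairs from the log.

```python
import re

# One cpu/ipu result pair as printed by the test script.
BLOCK = re.compile(
    r"cpu_label: (\d+), cpu_score: ([\d.]+)\s+"
    r"ipu_label: (\d+), ipu_score: ([\d.]+)"
)


def max_score_gap(log: str):
    """Return (all_labels_match, largest |cpu_score - ipu_score|)."""
    labels_match = True
    worst = 0.0
    for cpu_label, cpu_score, ipu_label, ipu_score in BLOCK.findall(log):
        labels_match = labels_match and cpu_label == ipu_label
        worst = max(worst, abs(float(cpu_score) - float(ipu_score)))
    return labels_match, worst


excerpt = """
cpu_label: 332, cpu_score: 0.564357
ipu_label: 332, ipu_score: 0.564378
cpu_label: 153, cpu_score: 0.686229
ipu_label: 153, ipu_score: 0.686230
"""
ok, gap = max_score_gap(excerpt)
print(ok, f"{gap:.6f}")
```

Separator lines between blocks are skipped by the pattern, so the full log can be passed in unchanged.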
benchmark
The benchmark uses the benchmark script, with its run command changed to:
python benchmark_ppcls.py --model $model --image ILSVRC2012_val_00000010.jpeg --iter_num 2000 --backend paddle --device ipu
The benchmark covers every model listed in the README except InceptionV3. An excerpt of the output log follows:
[FastDeploy] Running PPcls benchmark...
[Benchmark-PPcls] 1/20 ppcls_model/EfficientNetB0_small_infer ...
Total iterations: 2000
Total time of runtime: 3.46793s.
Warmup iterations: 400
Total time of runtime in warmup step: 0.703937s.
Average time of runtime exclude warmup step: 1.72749ms.
[Benchmark-PPcls] 3/20 ppcls_model/EfficientNetB7_infer ...
Total iterations: 2000
Total time of runtime: 20.3836s.
Warmup iterations: 400
Total time of runtime in warmup step: 4.06914s.
Average time of runtime exclude warmup step: 10.1965ms.
[Benchmark-PPcls] 4/20 ppcls_model/GhostNet_x0_5_infer ...
Total iterations: 2000
Total time of runtime: 3.26153s.
Warmup iterations: 400
Total time of runtime in warmup step: 0.6352s.
Average time of runtime exclude warmup step: 1.64145ms.
[Benchmark-PPcls] 5/20 ppcls_model/GhostNet_x1_3_ssld_infer ...
Total iterations: 2000
Total time of runtime: 3.57343s.
Warmup iterations: 400
Total time of runtime in warmup step: 0.692799s.
Average time of runtime exclude warmup step: 1.8004ms.
[Benchmark-PPcls] 7/20 ppcls_model/MobileNetV1_ssld_infer ...
Total iterations: 2000
Total time of runtime: 2.8455s.
Warmup iterations: 400
Total time of runtime in warmup step: 0.574721s.
Average time of runtime exclude warmup step: 1.41924ms.
[Benchmark-PPcls] 8/20 ppcls_model/MobileNetV1_x0_25_infer ...
Total iterations: 2000
Total time of runtime: 2.63379s.
Warmup iterations: 400
Total time of runtime in warmup step: 0.518629s.
Average time of runtime exclude warmup step: 1.32198ms.
[Benchmark-PPcls] 9/20 ppcls_model/MobileNetV2_ssld_infer ...
Total iterations: 2000
Total time of runtime: 3.20334s.
Warmup iterations: 400
Total time of runtime in warmup step: 0.61259s.
Average time of runtime exclude warmup step: 1.61922ms.
[Benchmark-PPcls] 10/20 ppcls_model/MobileNetV2_x0_25_infer ...
Total iterations: 2000
Total time of runtime: 2.93448s.
Warmup iterations: 400
Total time of runtime in warmup step: 0.561751s.
Average time of runtime exclude warmup step: 1.48296ms.
[Benchmark-PPcls] 11/20 ppcls_model/MobileNetV3_large_x1_0_ssld_infer ...
Total iterations: 2000
Total time of runtime: 3.09113s.
Warmup iterations: 400
Total time of runtime in warmup step: 0.614774s.
Average time of runtime exclude warmup step: 1.54772ms.
[Benchmark-PPcls] 12/20 ppcls_model/MobileNetV3_small_x0_35_ssld_infer ...
Total iterations: 2000
Total time of runtime: 2.87719s.
Warmup iterations: 400
Total time of runtime in warmup step: 0.543467s.
Average time of runtime exclude warmup step: 1.45858ms.
[Benchmark-PPcls] 13/20 ppcls_model/PPHGNet_base_ssld_infer ...
Total iterations: 2000
Total time of runtime: 6.51754s.
Warmup iterations: 400
Total time of runtime in warmup step: 1.30042s.
Average time of runtime exclude warmup step: 3.26069ms.
[Benchmark-PPcls] 14/20 ppcls_model/PPHGNet_tiny_ssld_infer ...
Total iterations: 2000
Total time of runtime: 3.71101s.
Warmup iterations: 400
Total time of runtime in warmup step: 0.698029s.
Average time of runtime exclude warmup step: 1.88311ms.
[Benchmark-PPcls] 15/20 ppcls_model/PPLCNetV2_base_infer ...
Total iterations: 2000
Total time of runtime: 2.87388s.
Warmup iterations: 400
Total time of runtime in warmup step: 0.572371s.
Average time of runtime exclude warmup step: 1.43844ms.
[Benchmark-PPcls] 16/20 ppcls_model/PPLCNet_x1_0_infer ...
Total iterations: 2000
Total time of runtime: 2.88727s.
Warmup iterations: 400
Total time of runtime in warmup step: 0.569004s.
Average time of runtime exclude warmup step: 1.44892ms.
[Benchmark-PPcls] 17/20 ppcls_model/ResNet50_vd_infer ...
Total iterations: 2000
Total time of runtime: 3.86693s.
Warmup iterations: 400
Total time of runtime in warmup step: 0.749314s.
Average time of runtime exclude warmup step: 1.94851ms.
[Benchmark-PPcls] 18/20 ppcls_model/ShuffleNetV2_x0_25_infer ...
Total iterations: 2000
Total time of runtime: 2.76203s.
Warmup iterations: 400
Total time of runtime in warmup step: 0.577006s.
Average time of runtime exclude warmup step: 1.36564ms.
[Benchmark-PPcls] 19/20 ppcls_model/ShuffleNetV2_x2_0_infer ...
Total iterations: 2000
Total time of runtime: 3.16924s.
Warmup iterations: 400
Total time of runtime in warmup step: 0.640512s.
Average time of runtime exclude warmup step: 1.58046ms.
[Benchmark-PPcls] 20/20 ppcls_model/SqueezeNet1_1_infer ...
Total iterations: 2000
Total time of runtime: 2.50874s.
Warmup iterations: 400
Total time of runtime in warmup step: 0.495713s.
Average time of runtime exclude warmup step: 1.25814ms.
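Note: these benchmark results are for PR testing only. Because of later hardware changes, the current numbers are not meaningful as a performance reference.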
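The reported averages in the log are consistent with (total time − warmup time) / (total iterations − warmup iterations). That formula is inferred from the numbers, not confirmed against the benchmark script's source. A minimal sketch that recomputes it for one log entry, with `recompute_average_ms` as a hypothetical helper name:

```python
import re

# One benchmark entry, in the layout shown in the log excerpt.
ENTRY = re.compile(
    r"Total iterations: (\d+)\s+"
    r"Total time of runtime: ([\d.]+)s\.\s+"
    r"Warmup iterations: (\d+)\s+"
    r"Total time of runtime in warmup step: ([\d.]+)s\.\s+"
    r"Average time of runtime exclude warmup step: ([\d.]+)ms\."
)


def recompute_average_ms(entry: str):
    """Return (recomputed_ms, reported_ms) for one benchmark log entry."""
    m = ENTRY.search(entry)
    if m is None:
        raise ValueError("unrecognized benchmark entry")
    iters, total_s, warmup_iters, warmup_s, reported_ms = m.groups()
    # Assumed formula: average over the iterations that follow warmup.
    timed_iters = int(iters) - int(warmup_iters)
    recomputed_ms = (float(total_s) - float(warmup_s)) / timed_iters * 1000.0
    return recomputed_ms, float(reported_ms)


# EfficientNetB0_small entry copied from the log.
entry = """Total iterations: 2000
Total time of runtime: 3.46793s.
Warmup iterations: 400
Total time of runtime in warmup step: 0.703937s.
Average time of runtime exclude warmup step: 1.72749ms.
"""
recomputed, reported = recompute_average_ms(entry)
print(f"recomputed {recomputed:.5f} ms, reported {reported} ms")
```

For this entry the two values differ only in the last printed digit, which is what rounding of the logged times would produce.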
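@leiqing1 Could you please help review the documentation changes?
Something went wrong while resolving the conflicts with the develop branch; I have amended the commit (git amend) to push the correctly resolved changes.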