FastDeploy [Backend] support ipu in paddle inference backend.

Open czr-gc opened this issue 1 year ago • 3 comments

PR types

Backend

Describe

Add IPU support to the Paddle Inference backend.

Oct 26 '22 06:10 czr-gc

Test results:

Example test:

The test covers every model listed in the FastDeploy README except InceptionV3, running inference on a single image.

Test script:

import os
import re
import subprocess

model_list = {
"PPLCNet_x1_0":"https://bj.bcebos.com/paddlehub/fastdeploy/PPLCNet_x1_0_infer.tgz",
"PPLCNetV2_base":"https://bj.bcebos.com/paddlehub/fastdeploy/PPLCNetV2_base_infer.tgz",
"EfficientNetB7":"https://bj.bcebos.com/paddlehub/fastdeploy/EfficientNetB7_infer.tgz",
"EfficientNetB0_small":"https://bj.bcebos.com/paddlehub/fastdeploy/EfficientNetB0_small_infer.tgz",
"GhostNet_x1_3_ssld":"https://bj.bcebos.com/paddlehub/fastdeploy/GhostNet_x1_3_ssld_infer.tgz",
"GhostNet_x0_5_ssld":"https://bj.bcebos.com/paddlehub/fastdeploy/GhostNet_x0_5_infer.tgz",
"MobileNetV1_x0_25":"https://bj.bcebos.com/paddlehub/fastdeploy/MobileNetV1_x0_25_infer.tgz",
"MobileNetV1_ssld":"https://bj.bcebos.com/paddlehub/fastdeploy/MobileNetV1_ssld_infer.tgz",
"MobileNetV2_x0_25":"https://bj.bcebos.com/paddlehub/fastdeploy/MobileNetV2_x0_25_infer.tgz",
"MobileNetV2_ssld":"https://bj.bcebos.com/paddlehub/fastdeploy/MobileNetV2_ssld_infer.tgz",
"MobileNetV3_small_x0_35_ssld":"https://bj.bcebos.com/paddlehub/fastdeploy/MobileNetV3_small_x0_35_ssld_infer.tgz",
"MobileNetV3_large_x1_0_ssld":"https://bj.bcebos.com/paddlehub/fastdeploy/MobileNetV3_large_x1_0_ssld_infer.tgz",
"ShuffleNetV2_x0_25":"https://bj.bcebos.com/paddlehub/fastdeploy/ShuffleNetV2_x0_25_infer.tgz",
"ShuffleNetV2_x2_0":"https://bj.bcebos.com/paddlehub/fastdeploy/ShuffleNetV2_x2_0_infer.tgz",
"SqueezeNet1_1":"https://bj.bcebos.com/paddlehub/fastdeploy/SqueezeNet1_1_infer.tgz",
"PPHGNet_tiny_ssld":"https://bj.bcebos.com/paddlehub/fastdeploy/PPHGNet_tiny_ssld_infer.tgz",
"PPHGNet_base_ssld": "https://bj.bcebos.com/paddlehub/fastdeploy/PPHGNet_base_ssld_infer.tgz",
"ResNet50_vd": "https://bj.bcebos.com/paddlehub/fastdeploy/ResNet50_vd_infer.tgz",
}

for k, v in model_list.items():
    print("TESTING: {}".format(k))
    # Derive the archive's base name (e.g. "PPLCNet_x1_0_infer") from the URL.
    pattern = r'.*/(\w+)\.tgz$'
    model_file = re.match(pattern, v).group(1)
    # Download and unpack the model archive.
    download_cmd = f'''
    wget {v}
    tar -xvf {model_file}.tgz
    '''
    # Run the same image through the CPU and IPU devices.
    cpu_cmd = f'''
    python infer.py --model {model_file} --image ILSVRC2012_val_00000010.jpeg --device cpu --topk 1
    '''
    ipu_cmd = f'''
    python infer.py --model {model_file} --image ILSVRC2012_val_00000010.jpeg --device ipu --topk 1
    '''
    print(subprocess.run(download_cmd, shell=True, stdout=subprocess.PIPE).stdout)
    cpu_result = subprocess.run(cpu_cmd, shell=True, stdout=subprocess.PIPE).stdout
    ipu_result = subprocess.run(ipu_cmd, shell=True, stdout=subprocess.PIPE).stdout
    # Extract the top-1 label and score; newlines are removed so '.' can span the whole output.
    result_pattern = r'.*label_ids: (\d+).*scores: (\d*\.?\d*)'
    cpu_match = re.match(result_pattern, cpu_result.decode('utf-8').replace('\n', ''))
    ipu_match = re.match(result_pattern, ipu_result.decode('utf-8').replace('\n', ''))

    print("=============================={}==============================".format(k))
    if cpu_match and ipu_match:
        print("cpu_label: {}, cpu_score: {}".format(cpu_match.group(1), cpu_match.group(2)))
        print("ipu_label: {}, ipu_score: {}".format(ipu_match.group(1), ipu_match.group(2)))
    else:
        print("FAILED RUN")
    print("=============================={}==============================".format(k))

Test results:

==============================PPLCNet_x1_0==============================
cpu_label: 153, cpu_score: 0.612086
ipu_label: 153, ipu_score: 0.612087
==============================PPLCNet_x1_0==============================

==============================PPLCNetV2_base==============================
cpu_label: 332, cpu_score: 0.278354
ipu_label: 332, ipu_score: 0.278357
==============================PPLCNetV2_base==============================

==============================EfficientNetB7==============================
cpu_label: 332, cpu_score: 0.564357
ipu_label: 332, ipu_score: 0.564378
==============================EfficientNetB7==============================

==============================EfficientNetB0_small==============================
cpu_label: 153, cpu_score: 0.525857
ipu_label: 153, ipu_score: 0.525857
==============================EfficientNetB0_small==============================

==============================GhostNet_x1_3_ssld==============================
cpu_label: 153, cpu_score: 0.849879
ipu_label: 153, ipu_score: 0.849879
==============================GhostNet_x1_3_ssld==============================

==============================GhostNet_x0_5_ssld==============================
cpu_label: 283, cpu_score: 0.341981
ipu_label: 283, ipu_score: 0.341981
==============================GhostNet_x0_5_ssld==============================

==============================MobileNetV1_x0_25==============================
cpu_label: 153, cpu_score: 0.221087
ipu_label: 153, ipu_score: 0.221088
==============================MobileNetV1_x0_25==============================

==============================MobileNetV1_ssld==============================
cpu_label: 332, cpu_score: 0.742867
ipu_label: 332, ipu_score: 0.742867
==============================MobileNetV1_ssld==============================

==============================MobileNetV2_x0_25==============================
cpu_label: 207, cpu_score: 0.247315
ipu_label: 207, ipu_score: 0.247313
==============================MobileNetV2_x0_25==============================

==============================MobileNetV3_small_x0_35_ssld==============================
cpu_label: 153, cpu_score: 0.494442
ipu_label: 153, ipu_score: 0.494442
==============================MobileNetV3_small_x0_35_ssld==============================

==============================MobileNetV3_large_x1_0_ssld==============================
cpu_label: 153, cpu_score: 0.521042
ipu_label: 153, ipu_score: 0.521041
==============================MobileNetV3_large_x1_0_ssld==============================

==============================ShuffleNetV2_x0_25==============================
cpu_label: 259, cpu_score: 0.240480
ipu_label: 259, ipu_score: 0.240481
==============================ShuffleNetV2_x0_25==============================

==============================ShuffleNetV2_x2_0==============================
cpu_label: 153, cpu_score: 0.842726
ipu_label: 153, ipu_score: 0.842727
==============================ShuffleNetV2_x2_0==============================

==============================SqueezeNet1_1==============================
cpu_label: 338, cpu_score: 0.189432
ipu_label: 338, ipu_score: 0.189433
==============================SqueezeNet1_1==============================

==============================PPHGNet_tiny_ssld==============================
cpu_label: 153, cpu_score: 0.536040
ipu_label: 153, ipu_score: 0.536039
==============================PPHGNet_tiny_ssld==============================

==============================PPHGNet_base_ssld==============================
cpu_label: 332, cpu_score: 0.996301
ipu_label: 332, ipu_score: 0.996301
==============================PPHGNet_base_ssld==============================

==============================ResNet50_vd==============================
cpu_label: 153, cpu_score: 0.686229
ipu_label: 153, ipu_score: 0.686230
==============================ResNet50_vd==============================
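In the results above, the CPU and IPU top-1 labels agree for every model, and the largest score gap is about 2.1e-5 (EfficientNetB7). A minimal sketch of how that comparison could be automated follows. The helper `compare_results` and the tolerance `1e-4` are assumptions made for illustration and are not part of the PR; the input format is the report printed by the test script.

```python
import re

# Hypothetical helper, not part of the PR: matches one cpu/ipu pair
# in the report format printed by the test script.
PAIR = re.compile(
    r"cpu_label: (\d+), cpu_score: ([\d.]+)\s+"
    r"ipu_label: (\d+), ipu_score: ([\d.]+)"
)

def compare_results(report, tol=1e-4):
    """Return (cpu_label, ipu_label, abs_score_diff, ok) for each pair found.

    `tol` is an assumed tolerance chosen for this sketch.
    """
    rows = []
    for cpu_label, cpu_score, ipu_label, ipu_score in PAIR.findall(report):
        diff = abs(float(cpu_score) - float(ipu_score))
        ok = cpu_label == ipu_label and diff <= tol
        rows.append((int(cpu_label), int(ipu_label), diff, ok))
    return rows

# One block copied from the results above.
sample = """
==============================EfficientNetB7==============================
cpu_label: 332, cpu_score: 0.564357
ipu_label: 332, ipu_score: 0.564378
==============================EfficientNetB7==============================
"""

for row in compare_results(sample):
    print(row)
```

A pair is flagged as not ok when the labels differ or the scores diverge by more than the chosen tolerance, which would make a regression on the IPU path easier to spot than reading the report by eye.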

benchmark

The test uses the benchmark script, with its run command changed to:

python benchmark_ppcls.py --model $model --image ILSVRC2012_val_00000010.jpeg --iter_num 2000 --backend paddle --device ipu

The test covers every model listed in the README except InceptionV3. An excerpt of the output log follows:

[FastDeploy]    Running PPcls benchmark...
[Benchmark-PPcls] 1/20 ppcls_model/EfficientNetB0_small_infer ...
Total iterations: 2000
Total time of runtime: 3.46793s.
Warmup iterations: 400
Total time of runtime in warmup step: 0.703937s.
Average time of runtime exclude warmup step: 1.72749ms.

[Benchmark-PPcls] 3/20 ppcls_model/EfficientNetB7_infer ...
Total iterations: 2000
Total time of runtime: 20.3836s.
Warmup iterations: 400
Total time of runtime in warmup step: 4.06914s.
Average time of runtime exclude warmup step: 10.1965ms.

[Benchmark-PPcls] 4/20 ppcls_model/GhostNet_x0_5_infer ...
Total iterations: 2000
Total time of runtime: 3.26153s.
Warmup iterations: 400
Total time of runtime in warmup step: 0.6352s.
Average time of runtime exclude warmup step: 1.64145ms.

[Benchmark-PPcls] 5/20 ppcls_model/GhostNet_x1_3_ssld_infer ...
Total iterations: 2000
Total time of runtime: 3.57343s.
Warmup iterations: 400
Total time of runtime in warmup step: 0.692799s.
Average time of runtime exclude warmup step: 1.8004ms.

[Benchmark-PPcls] 7/20 ppcls_model/MobileNetV1_ssld_infer ...
Total iterations: 2000
Total time of runtime: 2.8455s.
Warmup iterations: 400
Total time of runtime in warmup step: 0.574721s.
Average time of runtime exclude warmup step: 1.41924ms.

[Benchmark-PPcls] 8/20 ppcls_model/MobileNetV1_x0_25_infer ...
Total iterations: 2000
Total time of runtime: 2.63379s.
Warmup iterations: 400
Total time of runtime in warmup step: 0.518629s.
Average time of runtime exclude warmup step: 1.32198ms.

[Benchmark-PPcls] 9/20 ppcls_model/MobileNetV2_ssld_infer ...
Total iterations: 2000
Total time of runtime: 3.20334s.
Warmup iterations: 400
Total time of runtime in warmup step: 0.61259s.
Average time of runtime exclude warmup step: 1.61922ms.

[Benchmark-PPcls] 10/20 ppcls_model/MobileNetV2_x0_25_infer ...
Total iterations: 2000
Total time of runtime: 2.93448s.
Warmup iterations: 400
Total time of runtime in warmup step: 0.561751s.
Average time of runtime exclude warmup step: 1.48296ms.

[Benchmark-PPcls] 11/20 ppcls_model/MobileNetV3_large_x1_0_ssld_infer ...
Total iterations: 2000
Total time of runtime: 3.09113s.
Warmup iterations: 400
Total time of runtime in warmup step: 0.614774s.
Average time of runtime exclude warmup step: 1.54772ms.

[Benchmark-PPcls] 12/20 ppcls_model/MobileNetV3_small_x0_35_ssld_infer ...
Total iterations: 2000
Total time of runtime: 2.87719s.
Warmup iterations: 400
Total time of runtime in warmup step: 0.543467s.
Average time of runtime exclude warmup step: 1.45858ms.

[Benchmark-PPcls] 13/20 ppcls_model/PPHGNet_base_ssld_infer ...
Total iterations: 2000
Total time of runtime: 6.51754s.
Warmup iterations: 400
Total time of runtime in warmup step: 1.30042s.
Average time of runtime exclude warmup step: 3.26069ms.

[Benchmark-PPcls] 14/20 ppcls_model/PPHGNet_tiny_ssld_infer ...
Total iterations: 2000
Total time of runtime: 3.71101s.
Warmup iterations: 400
Total time of runtime in warmup step: 0.698029s.
Average time of runtime exclude warmup step: 1.88311ms.

[Benchmark-PPcls] 15/20 ppcls_model/PPLCNetV2_base_infer ...
Total iterations: 2000
Total time of runtime: 2.87388s.
Warmup iterations: 400
Total time of runtime in warmup step: 0.572371s.
Average time of runtime exclude warmup step: 1.43844ms.

[Benchmark-PPcls] 16/20 ppcls_model/PPLCNet_x1_0_infer ...
Total iterations: 2000
Total time of runtime: 2.88727s.
Warmup iterations: 400
Total time of runtime in warmup step: 0.569004s.
Average time of runtime exclude warmup step: 1.44892ms.

[Benchmark-PPcls] 17/20 ppcls_model/ResNet50_vd_infer ...
Total iterations: 2000
Total time of runtime: 3.86693s.
Warmup iterations: 400
Total time of runtime in warmup step: 0.749314s.
Average time of runtime exclude warmup step: 1.94851ms.

[Benchmark-PPcls] 18/20 ppcls_model/ShuffleNetV2_x0_25_infer ...
Total iterations: 2000
Total time of runtime: 2.76203s.
Warmup iterations: 400
Total time of runtime in warmup step: 0.577006s.
Average time of runtime exclude warmup step: 1.36564ms.

[Benchmark-PPcls] 19/20 ppcls_model/ShuffleNetV2_x2_0_infer ...
Total iterations: 2000
Total time of runtime: 3.16924s.
Warmup iterations: 400
Total time of runtime in warmup step: 0.640512s.
Average time of runtime exclude warmup step: 1.58046ms.

[Benchmark-PPcls] 20/20 ppcls_model/SqueezeNet1_1_infer ...
Total iterations: 2000
Total time of runtime: 2.50874s.
Warmup iterations: 400
Total time of runtime in warmup step: 0.495713s.
Average time of runtime exclude warmup step: 1.25814ms.

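Note: these benchmark results are for PR testing only. Because of subsequent hardware changes, the current figures are not meaningful as a performance reference.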
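The logged averages appear consistent with the formula (total time - warmup time) / (total iterations - warmup iterations), assuming the total runtime includes the warmup step. The sketch below reproduces two of the logged values under that assumption; the function name `avg_ms_excluding_warmup` is hypothetical and is not taken from the benchmark script.

```python
def avg_ms_excluding_warmup(total_s, warmup_s, total_iters=2000, warmup_iters=400):
    """Mean per-iteration latency in milliseconds over the non-warmup iterations.

    Assumes `total_s` includes the warmup step, which matches the logged
    numbers to within rounding.
    """
    return (total_s - warmup_s) / (total_iters - warmup_iters) * 1000.0

# EfficientNetB7: the log reports 10.1965ms.
print(round(avg_ms_excluding_warmup(20.3836, 4.06914), 4))
# SqueezeNet1_1: the log reports 1.25814ms.
print(round(avg_ms_excluding_warmup(2.50874, 0.495713), 4))
```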

Oct 27 '22 01:10 czr-gc

@leiqing1 Could you please help review the documentation changes?

Oct 28 '22 03:10 jiangjiajun

I made a mistake while resolving conflicts with the develop branch. I have amended the commit and pushed the correct changes.

Oct 28 '22 07:10 czr-gc
