Abnormal MNN inference times
Two observations: 1. fp32 and fp16 inference times are identical. 2. int8 inference time is longer than fp16/fp32.
Model | fp32 / ms | fp16 / ms | int8 / ms
---|---|---|---
Larger model | 313 | 312 | 339
Smaller model | 41 | 40 | 47
Platform: ARMv8, Linux aarch64.
In the inference code, the only difference is that fp32 uses Precision_High while fp16 and int8 use Precision_Low.
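Roughly, that means a session configuration like the following (a minimal sketch assuming MNN's C++ Interpreter/Session API; the model path is a placeholder and input/output handling is omitted):

```cpp
#include <MNN/Interpreter.hpp>

int main() {
    // Load the compiled .mnn model (path is a placeholder).
    auto net = MNN::Interpreter::createFromFile("model.mnn");

    MNN::ScheduleConfig config;
    config.type      = MNN_FORWARD_CPU;  // CPU backend, single thread, as in the benchmark runs below
    config.numThread = 1;

    MNN::BackendConfig backendConfig;
    // The only difference between the runs being compared:
    backendConfig.precision = MNN::BackendConfig::Precision_High;  // fp32 run
    // backendConfig.precision = MNN::BackendConfig::Precision_Low;  // fp16 / int8 runs
    config.backendConfig = &backendConfig;

    auto session = net->createSession(config);
    net->runSession(session);  // input feeding, output reading, and cleanup omitted
    return 0;
}
```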
What could be the cause? Thanks.
Which model are you testing with? Is there a reference open-source model?
- ARMv8 does not support fp16 computation; try bf16 instead: enable MNN_SUPPORT_BF16 and set precision to low_bf16 (a sketch follows this list).
- As for int8 being slow: which MNN version are you on? Update to the latest, re-quantize, and try again.
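Concretely, the bf16 suggestion amounts to something like the sketch below (Precision_Low_BF16 is the enum value I take "low_bf16" to mean; the CMake line assumes a standard out-of-source build):

```cpp
// Rebuild MNN with the bf16 kernels enabled, e.g.:
//   cmake .. -DMNN_SUPPORT_BF16=ON && make -j
#include <MNN/Interpreter.hpp>

void configureBf16(MNN::ScheduleConfig& config, MNN::BackendConfig& backendConfig) {
    config.type = MNN_FORWARD_CPU;
    // low_bf16 corresponds to precision=3 in benchmark.out.
    backendConfig.precision = MNN::BackendConfig::Precision_Low_BF16;
    config.backendConfig = &backendConfig;
}
```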
> Which model are you testing with? Is there a reference open-source model?
These are results from running the benchmark tool:
## fp32 inference
./benchmark.out models/ 50 0 0 1 1 0 1 0
MNN benchmark
Forward type: CPU thread=1 precision=1 sparsity=0 sparseBlockOC=1 testQuantizedModel=0
--------> Benchmarking... loop = 50, warmup = 0
[-INFO-]: precision!=2, use fp32 inference.
The device support i8sdot:0, support fp16:0, support i8mm: 0
[ - ] squeezenetv1.1.mnn max = 60.000 ms min = 24.000 ms avg = 38.203 ms
[ - ] mobilenet-v1-1.0.mnn max = 112.000 ms min = 43.000 ms avg = 69.666 ms
[ - ] SqueezeNetV1.0.mnn max = 111.000 ms min = 44.000 ms avg = 70.897 ms
[ - ] resnet-v2-50.mnn max = 375.593 ms min = 250.000 ms avg = 315.703 ms
[ - ] inception-v3.mnn max = 544.302 ms min = 411.683 ms avg = 481.361 ms
[ - ] nasnet.mnn max = 141.000 ms min = 67.000 ms avg = 97.284 ms
[ - ] MobileNetV2_224.mnn max = 65.000 ms min = 26.994 ms avg = 41.303 ms
## fp16 inference
./benchmark.out models/ 50 0 0 1 2 0 1 0
MNN benchmark
Forward type: CPU thread=1 precision=2 sparsity=0 sparseBlockOC=1 testQuantizedModel=0
--------> Benchmarking... loop = 50, warmup = 0
[-INFO-]: precision=2, use fp16 inference if your device supports and open MNN_ARM82=ON.
The device support i8sdot:0, support fp16:0, support i8mm: 0
[ - ] squeezenetv1.1.mnn max = 59.000 ms min = 22.000 ms avg = 37.575 ms
[ - ] mobilenet-v1-1.0.mnn max = 98.000 ms min = 44.000 ms avg = 68.641 ms
[ - ] SqueezeNetV1.0.mnn max = 114.000 ms min = 47.000 ms avg = 71.006 ms
[ - ] resnet-v2-50.mnn max = 397.000 ms min = 234.340 ms avg = 303.782 ms
[ - ] inception-v3.mnn max = 533.531 ms min = 433.063 ms avg = 482.815 ms
[ - ] nasnet.mnn max = 134.000 ms min = 81.866 ms avg = 102.169 ms
[ - ] MobileNetV2_224.mnn max = 63.000 ms min = 27.000 ms avg = 39.890 ms
> - ARMv8 does not support fp16 computation; try bf16 instead: enable MNN_SUPPORT_BF16 and set precision to low_bf16.
> - As for int8 being slow: which MNN version are you on? Update to the latest, re-quantize, and try again.
1. OK, I'll try building and running it that way. 2. I'm currently on 2.8.2, which should be fairly recent. Below are the int8 benchmark results.
## int8 inference
./benchmark.out models/ 50 0 0 1 2 0 1 1
MNN benchmark
Forward type: CPU thread=1 precision=2 sparsity=0 sparseBlockOC=1 testQuantizedModel=1
--------> Benchmarking... loop = 50, warmup = 0
[-INFO-]: precision=2, use fp16 inference if your device supports and open MNN_ARM82=ON.
[-INFO-]: Auto set sparsity=0 when test quantized model in benchmark...
Auto set sparsity=0 when test quantized model in benchmark...
The device support i8sdot:0, support fp16:0, support i8mm: 0
[ - ] squeezenetv1.1.mnn max = 68.000 ms min = 25.000 ms avg = 40.229 ms
[ - ] quant-squeezenetv1.1.mnn max = 64.000 ms min = 34.000 ms avg = 48.659 ms
[ - ] mobilenet-v1-1.0.mnn max = 108.000 ms min = 39.999 ms avg = 69.633 ms
[ - ] quant-mobilenet-v1-1.0.mnn max = 93.000 ms min = 47.994 ms avg = 63.897 ms
[ - ] SqueezeNetV1.0.mnn max = 104.000 ms min = 53.000 ms avg = 72.103 ms
[ - ] quant-SqueezeNetV1.0.mnn max = 123.000 ms min = 78.000 ms avg = 97.089 ms
[ - ] resnet-v2-50.mnn max = 370.000 ms min = 208.000 ms avg = 307.182 ms
[ - ] quant-resnet-v2-50.mnn max = 403.509 ms min = 305.362 ms avg = 354.695 ms
[ - ] inception-v3.mnn max = 579.816 ms min = 427.000 ms avg = 480.296 ms
[ - ] quant-inception-v3.mnn max = 0.000 ms min = 0.000 ms avg = 0.000 ms
[ - ] nasnet.mnn max = 132.000 ms min = 0.000 ms avg = 94.448 ms
[ - ] quant-nasnet.mnn max = 0.000 ms min = 0.000 ms avg = 0.000 ms
[ - ] MobileNetV2_224.mnn max = 68.000 ms min = -0.000 ms avg = 41.312 ms
[ - ] quant-MobileNetV2_224.mnn max = 75.000 ms min = 34.000 ms avg = 44.936 ms
> - ARMv8 does not support fp16 computation; try bf16 instead: enable MNN_SUPPORT_BF16 and set precision to low_bf16.
> - As for int8 being slow: which MNN version are you on? Update to the latest, re-quantize, and try again.
1. On point 1: after enabling bf16 and working through some build issues, testing shows that over 50 runs the inference time is 46 ms with low_bf16 (precision=3) versus 41 ms with low (precision=2). The model is identical and the inference code differs only in the precision setting.
The bf16 slowdown is not a one-off measurement error: I have run loop=50 several times, and bf16 inference is consistently slower.
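For reference, the comparison described here is essentially the following kind of loop (a minimal sketch assuming MNN's Interpreter/Session API and the PrecisionMode enum; the model path is a placeholder):

```cpp
#include <MNN/Interpreter.hpp>
#include <chrono>
#include <cstdio>

// Average runSession time over `loops` iterations for a given precision mode.
static float benchOnce(const char* path, MNN::BackendConfig::PrecisionMode precision, int loops = 50) {
    auto net = MNN::Interpreter::createFromFile(path);
    MNN::ScheduleConfig config;
    config.type      = MNN_FORWARD_CPU;
    config.numThread = 1;
    MNN::BackendConfig backendConfig;
    backendConfig.precision = precision;
    config.backendConfig = &backendConfig;
    auto session = net->createSession(config);

    auto begin = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < loops; ++i) {
        net->runSession(session);
    }
    auto end = std::chrono::high_resolution_clock::now();
    return std::chrono::duration<float, std::milli>(end - begin).count() / loops;
}

int main() {
    // 41 ms vs 46 ms in the measurements above (model path is a placeholder).
    printf("low      : %.1f ms\n", benchOnce("model.mnn", MNN::BackendConfig::Precision_Low));
    printf("low_bf16 : %.1f ms\n", benchOnce("model.mnn", MNN::BackendConfig::Precision_Low_BF16));
    return 0;
}
```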
Hi @jamesdod, did you find any solution for fp16 being slower compared to fp32?
I am also running a UNet-based architecture with MNN, and the fp16 model's latency is almost the same as the fp32 model's (both using precision mode 2) with the GPU_OPENCL delegate, even though the fp16 model is half the size of the fp32 model.
If I run the same model with TensorFlow's TFLite on GPU in fp16 mode, the latency is almost half of MNN's.
Device: Qualcomm SM8550 GPU. MNN version: 2.8.3.
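For context, "precision mode 2" corresponds to the low-precision setting; on the OpenCL path that is roughly the configuration below (a sketch assuming MNN's public ScheduleConfig/BackendConfig; tuning and memory options omitted):

```cpp
#include <MNN/Interpreter.hpp>

void configureOpenCL(MNN::ScheduleConfig& config, MNN::BackendConfig& backendConfig) {
    config.type = MNN_FORWARD_OPENCL;                              // GPU (OpenCL) backend
    backendConfig.precision = MNN::BackendConfig::Precision_Low;   // "precision mode 2"
    config.backendConfig = &backendConfig;
}
```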
Marking as stale. No activity in 60 days.