
Abnormal MNN inference times

Open jamesdod opened this issue 9 months ago • 6 comments

Two observations: 1. fp32 and fp16 inference times are the same. 2. int8 inference time is longer than fp16/fp32.

| Model         | fp32 (ms) | fp16 (ms) | int8 (ms) |
|---------------|-----------|-----------|-----------|
| Larger model  | 313       | 312       | 339       |
| Smaller model | 41        | 40        | 47        |

ARMv8, Linux aarch64

In the inference code, the only difference is that fp32 uses Precision_High, while fp16 and int8 use Precision_Low.
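
(For context, a minimal sketch of how that precision choice is typically made when creating an MNN CPU session; the model path and surrounding scaffolding are placeholders, not the actual code from this issue.)

```cpp
// Minimal sketch: selecting the precision mode for an MNN CPU session.
// "model.mnn" is a placeholder path.
#include <MNN/Interpreter.hpp>
#include <memory>

int main() {
    std::shared_ptr<MNN::Interpreter> net(
        MNN::Interpreter::createFromFile("model.mnn"));

    MNN::ScheduleConfig config;
    config.type      = MNN_FORWARD_CPU;
    config.numThread = 1;

    MNN::BackendConfig backendConfig;
    // Precision_High for the fp32 runs above; Precision_Low for the fp16/int8 runs.
    backendConfig.precision = MNN::BackendConfig::Precision_Low;
    config.backendConfig    = &backendConfig;

    auto session = net->createSession(config);
    net->runSession(session);
    return 0;
}
```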

What could be the cause? Thanks.

jamesdod avatar Apr 28 '24 06:04 jamesdod

Which model are you testing with? Is there an open-source model we can use as a reference?

zhenjing avatar Apr 29 '24 01:04 zhenjing

  1. ARMv8 does not support fp16 computation. Try bf16 instead: enable MNN_SUPPORT_BF16 and set precision to low_bf16 (see the sketch after this list).
  2. Regarding int8 being slower: which MNN version are you using? Update to the latest, re-quantize, and try again.
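
(A possible way to apply suggestion 1, sketched below: build MNN with `-DMNN_SUPPORT_BF16=ON`, then request the bf16 path at session creation. The enum name `Precision_Low_BF16` is an assumption based on the "low_bf16" / precision=3 wording later in this thread.)

```cpp
// Sketch only: requires MNN built with -DMNN_SUPPORT_BF16=ON.
// Precision_Low_BF16 is assumed to correspond to the "low_bf16" (precision=3) mode.
#include <MNN/Interpreter.hpp>

MNN::Session* createBf16Session(MNN::Interpreter* net) {
    MNN::ScheduleConfig config;
    config.type      = MNN_FORWARD_CPU;
    config.numThread = 1;

    MNN::BackendConfig backendConfig;
    backendConfig.precision = MNN::BackendConfig::Precision_Low_BF16;
    config.backendConfig    = &backendConfig;

    return net->createSession(config);
}
```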

jxt1234 avatar Apr 29 '24 02:04 jxt1234

Which model are you testing with? Is there an open-source model we can use as a reference?

These are the results from running the benchmark.

## fp32 inference
./benchmark.out models/ 50 0 0 1 1 0 1 0
MNN benchmark
Forward type: CPU thread=1 precision=1 sparsity=0 sparseBlockOC=1 testQuantizedModel=0
--------> Benchmarking... loop = 50, warmup = 0
[-INFO-]: precision!=2, use fp32 inference.
The device support i8sdot:0, support fp16:0, support i8mm: 0
[ - ] squeezenetv1.1.mnn          max =   60.000 ms  min =   24.000 ms  avg =   38.203 ms
[ - ] mobilenet-v1-1.0.mnn        max =  112.000 ms  min =   43.000 ms  avg =   69.666 ms
[ - ] SqueezeNetV1.0.mnn          max =  111.000 ms  min =   44.000 ms  avg =   70.897 ms
[ - ] resnet-v2-50.mnn            max =  375.593 ms  min =  250.000 ms  avg =  315.703 ms
[ - ] inception-v3.mnn            max =  544.302 ms  min =  411.683 ms  avg =  481.361 ms
[ - ] nasnet.mnn                  max =  141.000 ms  min =   67.000 ms  avg =   97.284 ms
[ - ] MobileNetV2_224.mnn         max =   65.000 ms  min =   26.994 ms  avg =   41.303 ms


## fp16 inference
./benchmark.out models/ 50 0 0 1 2 0 1 0
MNN benchmark
Forward type: CPU thread=1 precision=2 sparsity=0 sparseBlockOC=1 testQuantizedModel=0
--------> Benchmarking... loop = 50, warmup = 0
[-INFO-]: precision=2, use fp16 inference if your device supports and open MNN_ARM82=ON.
The device support i8sdot:0, support fp16:0, support i8mm: 0
[ - ] squeezenetv1.1.mnn          max =   59.000 ms  min =   22.000 ms  avg =   37.575 ms
[ - ] mobilenet-v1-1.0.mnn        max =   98.000 ms  min =   44.000 ms  avg =   68.641 ms
[ - ] SqueezeNetV1.0.mnn          max =  114.000 ms  min =   47.000 ms  avg =   71.006 ms
[ - ] resnet-v2-50.mnn            max =  397.000 ms  min =  234.340 ms  avg =  303.782 ms
[ - ] inception-v3.mnn            max =  533.531 ms  min =  433.063 ms  avg =  482.815 ms
[ - ] nasnet.mnn                  max =  134.000 ms  min =   81.866 ms  avg =  102.169 ms
[ - ] MobileNetV2_224.mnn         max =   63.000 ms  min =   27.000 ms  avg =   39.890 ms

jamesdod avatar Apr 29 '24 02:04 jamesdod

  1. ARMv8 does not support fp16 computation. Try bf16 instead: enable MNN_SUPPORT_BF16 and set precision to low_bf16.
  2. Regarding int8 being slower: which MNN version are you using? Update to the latest, re-quantize, and try again.

1. OK, I'll try building and running inference that way. 2. I'm currently on 2.8.2, which should be fairly recent. Below are the int8 benchmark results.

## int8 inference
./benchmark.out models/ 50 0 0 1 2 0 1 1
MNN benchmark
Forward type: CPU thread=1 precision=2 sparsity=0 sparseBlockOC=1 testQuantizedModel=1
--------> Benchmarking... loop = 50, warmup = 0
[-INFO-]: precision=2, use fp16 inference if your device supports and open MNN_ARM82=ON.
[-INFO-]: Auto set sparsity=0 when test quantized model in benchmark...
Auto set sparsity=0 when test quantized model in benchmark...
The device support i8sdot:0, support fp16:0, support i8mm: 0
[ - ] squeezenetv1.1.mnn          max =   68.000 ms  min =   25.000 ms  avg =   40.229 ms
[ - ] quant-squeezenetv1.1.mnn    max =   64.000 ms  min =   34.000 ms  avg =   48.659 ms
[ - ] mobilenet-v1-1.0.mnn        max =  108.000 ms  min =   39.999 ms  avg =   69.633 ms
[ - ] quant-mobilenet-v1-1.0.mnn    max =   93.000 ms  min =   47.994 ms  avg =   63.897 ms
[ - ] SqueezeNetV1.0.mnn          max =  104.000 ms  min =   53.000 ms  avg =   72.103 ms
[ - ] quant-SqueezeNetV1.0.mnn    max =  123.000 ms  min =   78.000 ms  avg =   97.089 ms
[ - ] resnet-v2-50.mnn            max =  370.000 ms  min =  208.000 ms  avg =  307.182 ms
[ - ] quant-resnet-v2-50.mnn      max =  403.509 ms  min =  305.362 ms  avg =  354.695 ms
[ - ] inception-v3.mnn            max =  579.816 ms  min =  427.000 ms  avg =  480.296 ms
[ - ] quant-inception-v3.mnn      max =    0.000 ms  min =    0.000 ms  avg =    0.000 ms
[ - ] nasnet.mnn                  max =  132.000 ms  min =    0.000 ms  avg =   94.448 ms
[ - ] quant-nasnet.mnn            max =    0.000 ms  min =    0.000 ms  avg =    0.000 ms
[ - ] MobileNetV2_224.mnn         max =   68.000 ms  min =   -0.000 ms  avg =   41.312 ms
[ - ] quant-MobileNetV2_224.mnn    max =   75.000 ms  min =   34.000 ms  avg =   44.936 ms


jamesdod avatar Apr 29 '24 02:04 jamesdod

  1. ARMv8 does not support fp16 computation. Try bf16 instead: enable MNN_SUPPORT_BF16 and set precision to low_bf16.
  2. Regarding int8 being slower: which MNN version are you using? Update to the latest, re-quantize, and try again.

1. Regarding point 1: after enabling bf16 (and fixing a few compilation issues), testing shows that over 50 runs, low_bf16 (precision=3) takes 46 ms while low (precision=2) takes 41 ms. The model is identical and the inference code differs only in the precision setting.

The slowdown with bf16 is consistent, not a one-off measurement error; I have run loop=50 several times and the bf16 inference time is always longer.

jamesdod avatar Apr 29 '24 08:04 jamesdod

Hi @jamesdod, were you able to find a solution for fp16 being slower compared to fp32?

I am also running a UNet-based architecture with MNN, and the fp16 model latency is almost the same as the fp32 model latency (both using precision mode 2) with the GPU_OPENCL delegate, even though the fp16 model is half the size of the fp32 model.

And if I run the same model with TensorFlow's TFLite on the GPU in fp16 mode, it gives almost half of MNN's latency.

Device: Qualcomm SM8550 GPU
MNN version: 2.8.3
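
(For reference, the GPU_OPENCL setup described above corresponds roughly to the sketch below; the model path is a placeholder, and mapping "precision mode 2" to `MNN::BackendConfig::Precision_Low` is an assumption.)

```cpp
// Sketch of an OpenCL-backed MNN session with low precision (fp16 on GPU).
// "unet_fp16.mnn" is a placeholder path.
#include <MNN/Interpreter.hpp>
#include <memory>

int main() {
    std::shared_ptr<MNN::Interpreter> net(
        MNN::Interpreter::createFromFile("unet_fp16.mnn"));

    MNN::ScheduleConfig config;
    config.type       = MNN_FORWARD_OPENCL;  // GPU_OPENCL delegate
    config.backupType = MNN_FORWARD_CPU;     // fall back to CPU if OpenCL is unavailable

    MNN::BackendConfig backendConfig;
    backendConfig.precision = MNN::BackendConfig::Precision_Low;  // "precision mode 2"
    config.backendConfig    = &backendConfig;

    auto session = net->createSession(config);
    net->runSession(session);
    return 0;
}
```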

rpratesh avatar Jun 25 '24 13:06 rpratesh

Marking as stale. No activity in 60 days.

github-actions[bot] avatar Aug 25 '24 09:08 github-actions[bot]