
Performance degradation from `2.8.1` to `2.9.2`

Open · caiotoledo-lunasystems opened this issue 1 year ago

Platform (include target platform as well if cross-compiling):

Qualcomm QCM2290 SoC.

GitHub version:

Version tags:

  • Version 2.8.1: https://github.com/alibaba/MNN/tree/d284430f92557aa8b4cc435752b1dff3309f2e38
  • Version 2.9.2: https://github.com/alibaba/MNN/tree/e1011161ed0382e1a33a65bfdde8bee931dbcfaf

Compiling method:

cmake \
  -DCMAKE_INSTALL_PREFIX=${SCRIPTPATH}/.install_android/ \
  -DCMAKE_TOOLCHAIN_FILE=~/Android/Sdk/ndk/25.2.9519653/build/cmake/android.toolchain.cmake \
  -DCMAKE_BUILD_TYPE=Release \
  -DANDROID_ABI="arm64-v8a" \
  -DANDROID_STL=c++_shared \
  -DMNN_USE_LOGCAT=ON \
  -DMNN_ARM82=ON \
  -DMNN_SUPPORT_BF16=ON \
  -DMNN_OPENCL=ON \
  -DMNN_VULKAN=ON \
  -DMNN_BUILD_OPENCV=ON \
  -DMNN_IMGCODECS=ON \
  -DMNN_JNI=ON \
  -DANDROID_NATIVE_API_LEVEL=android-21 \
  -DMNN_BUILD_FOR_ANDROID_COMMAND=true \
  -DNATIVE_LIBRARY_OUTPUT=. -DNATIVE_INCLUDE_OUTPUT=. \
  -DMNN_BUILD_TEST=ON \
  -DMNN_BUILD_CONVERTER=ON \
  -DMNN_BUILD_BENCHMARK=ON \
  ../
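
For reference, the build and install steps assumed to follow this configure (assuming the default Makefile generator on a Linux host, run from the same build directory):

make -j$(nproc)
# Only needed to populate the CMAKE_INSTALL_PREFIX layout set above:
make install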


The performance of version 2.9.2 is worse than that of 2.8.1 when using the benchmark tool (the arguments passed to benchmark.out are annotated after the results below):

  • Version 2.9.2:
bengal_2w:/data/local/tmp/mnn-2.9.2-lib-arm64 # LD_LIBRARY_PATH=./:../cpp_shared/arm64-v8a/ ./benchmark.out ../ai-models/ 10 3 3
MNN benchmark
Forward type: OpenCL thread=4 precision=2 sparsity=0 sparseBlockOC=1 testQuantizedModel=0
--------> Benchmarking... loop = 10, warmup = 3
[ - ] yolov8n_160.mnn      max =   87.178 ms  min =   84.706 ms  avg =   85.698 ms
  • Version 2.8.1:
bengal_2w:/data/local/tmp/MNN # LD_LIBRARY_PATH=./lib/ bin/benchmark.out models/ 10 3 3
MNN benchmark
Forward type: OpenCL thread=4 precision=2 sparsity=0 sparseBlockOC=1 testQuantizedModel=0
--------> Benchmarking... loop = 10, warmup = 3
[ - ] yolov8n_160.mnn         max =   65.977 ms  min =   62.635 ms  avg =   63.617 ms

The model used for this test is the YoloV8 Nano from Ultralytics, using an image input size of 160x160.
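
For clarity, a hedged reading of the benchmark.out arguments used in both runs above (positional order inferred from the printed "loop", "warmup" and "Forward type" lines; worth confirming against the tool's source in the checkout being tested):

# benchmark.out <models_dir> <loop_count> <warmup_count> <forward_type>
# forward_type 3 appears to select MNN_FORWARD_OPENCL, matching the
# "Forward type: OpenCL" line printed by both versions.
LD_LIBRARY_PATH=./lib/ bin/benchmark.out models/ 10 3 3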

caiotoledo-lunasystems avatar Jul 08 '24 09:07 caiotoledo-lunasystems

Received, we will look into it.

jxt1234 avatar Jul 22 '24 12:07 jxt1234

The fix has been made in our internal code and will be synced soon.

jxt1234 avatar Sep 06 '24 09:09 jxt1234

This has been fixed in 2.9.5. Please update and test; our internal verification shows it is faster than 2.8.1.

jxt1234 avatar Sep 12 '24 08:09 jxt1234

@jxt1234,

I tested the same model again on the QCM2290 board, but I am still getting worse numbers than with version 2.8.1:

bengal_2w:/data/local/tmp/mnn-2.9.5 # LD_LIBRARY_PATH=./lib/ ./bin/benchmark.out ../mnn-models/ 10 3 3
MNN benchmark                                                                                  
Forward type: OpenCL thread=4 precision=2 sparsity=0 sparseBlockOC=1 testQuantizedModel=0
--------> Benchmarking... loop = 10, warmup = 3
[ - ] yolov8n_160.mnn         max =   83.413 ms  min =   81.695 ms  avg =   82.418 ms

However, the performance is indeed better than 2.9.2.

Using our internal applications, I see roughly the same performance numbers as with your benchmark binary.

caiotoledo-lunasystems avatar Sep 12 '24 16:09 caiotoledo-lunasystems