Performance degradation from `2.8.1` to `2.9.2`
Platform (include target platform as well if cross-compiling):
Qualcomm QCM2290 SoC (Android, arm64-v8a).
GitHub version:
Version tags:
- Version 2.8.1: https://github.com/alibaba/MNN/tree/d284430f92557aa8b4cc435752b1dff3309f2e38
- Version 2.9.2: https://github.com/alibaba/MNN/tree/e1011161ed0382e1a33a65bfdde8bee931dbcfaf
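For an A/B comparison, each build can be checked out directly from the commit hashes in the links above:
git clone https://github.com/alibaba/MNN.git && cd MNN
git checkout d284430f92557aa8b4cc435752b1dff3309f2e38  # 2.8.1
# build and benchmark, then repeat with:
git checkout e1011161ed0382e1a33a65bfdde8bee931dbcfaf  # 2.9.2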
Compiling method:
cmake \
-DCMAKE_INSTALL_PREFIX=${SCRIPTPATH}/.install_android/ \
-DCMAKE_TOOLCHAIN_FILE=~/Android/Sdk/ndk/25.2.9519653/build/cmake/android.toolchain.cmake \
-DCMAKE_BUILD_TYPE=Release \
-DANDROID_ABI="arm64-v8a" \
-DANDROID_STL=c++_shared \
-DMNN_USE_LOGCAT=ON \
-DMNN_ARM82=ON \
-DMNN_SUPPORT_BF16=ON \
-DMNN_OPENCL=ON \
-DMNN_VULKAN=ON \
-DMNN_BUILD_OPENCV=ON \
-DMNN_IMGCODECS=ON \
-DMNN_JNI=ON \
-DANDROID_NATIVE_API_LEVEL=android-21 \
-DMNN_BUILD_FOR_ANDROID_COMMAND=true \
-DNATIVE_LIBRARY_OUTPUT=. -DNATIVE_INCLUDE_OUTPUT=. \
-DMNN_BUILD_TEST=ON \
-DMNN_BUILD_CONVERTER=ON \
-DMNN_BUILD_BENCHMARK=ON \
../
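A rough sketch of the steps that follow the configure command above; the device paths are illustrative, not taken from this report:
make -j$(nproc) && make install
# Push the shared libraries and the benchmark binary to the device:
adb push ${SCRIPTPATH}/.install_android/lib/. /data/local/tmp/mnn/lib
adb push benchmark.out /data/local/tmp/mnn/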
The performance of version 2.9.2 is worse than that of 2.8.1 when measured with the bundled benchmark tool (see the note on the benchmark arguments after the two runs below):
- Version 2.9.2:
bengal_2w:/data/local/tmp/mnn-2.9.2-lib-arm64 # LD_LIBRARY_PATH=./:../cpp_shared/arm64-v8a/ ./benchmark.out ../ai-models/ 10 3 3
MNN benchmark
Forward type: OpenCL thread=4 precision=2 sparsity=0 sparseBlockOC=1 testQuantizedModel=0
--------> Benchmarking... loop = 10, warmup = 3
[ - ] yolov8n_160.mnn max = 87.178 ms min = 84.706 ms avg = 85.698 ms
- Version 2.8.1:
bengal_2w:/data/local/tmp/MNN # LD_LIBRARY_PATH=./lib/ bin/benchmark.out models/ 10 3 3
MNN benchmark
Forward type: OpenCL thread=4 precision=2 sparsity=0 sparseBlockOC=1 testQuantizedModel=0
--------> Benchmarking... loop = 10, warmup = 3
[ - ] yolov8n_160.mnn max = 65.977 ms min = 62.635 ms avg = 63.617 ms
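For context, my reading of the positional arguments (based on MNN's benchmark tool; worth double-checking against the source):
# benchmark.out <models dir> <loop count> <warm-up count> <forward type>
# Forward type 3 selects the OpenCL backend (MNN_FORWARD_OPENCL in MNNForwardType.h),
# which matches the "Forward type: OpenCL" line printed in both runs above.
LD_LIBRARY_PATH=./lib/ bin/benchmark.out models/ 10 3 3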
The model used for this test is YOLOv8 Nano from Ultralytics, with an input image size of 160x160.
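For anyone reproducing this, a plausible way to produce yolov8n_160.mnn (a sketch assuming an export via ONNX; file names are illustrative, and MNNConvert comes from the -DMNN_BUILD_CONVERTER=ON build above):
# Export YOLOv8 Nano at 160x160 from Ultralytics, then convert to MNN:
yolo export model=yolov8n.pt format=onnx imgsz=160
./MNNConvert -f ONNX --modelFile yolov8n.onnx --MNNModel yolov8n_160.mnn --bizCode MNN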
Received, we will look into it.
Fixed in our internal code; it will be synced to GitHub soon.
This is fixed in 2.9.5. Please update and test; our internal verification shows it is faster than 2.8.1.
@jxt1234,
I tested the same model again on the QCM2290 board, but I am still getting worse numbers than with version 2.8.1:
bengal_2w:/data/local/tmp/mnn-2.9.5 # LD_LIBRARY_PATH=./lib/ ./bin/benchmark.out ../mnn-models/ 10 3 3
MNN benchmark
Forward type: OpenCL thread=4 precision=2 sparsity=0 sparseBlockOC=1 testQuantizedModel=0
--------> Benchmarking... loop = 10, warmup = 3
[ - ] yolov8n_160.mnn max = 83.413 ms min = 81.695 ms avg = 82.418 ms
However, the performance is indeed better than 2.9.2.
With our internal applications I see roughly the same performance numbers as with your benchmark binary.