dlib-for-android
Performance optimisation
Do you support the NEON instruction set optimization discussed in this thread: https://github.com/davisking/dlib/issues/276 ?
Hi @lemberh, unfortunately I support only the basic architectures.
Did you try to use those optimizations? I have changed the cmake command to:
${AndroidCmake} -DBUILD_SHARED_LIBS=1 \
-DANDROID_NDK=${NDK} \
-DCMAKE_SYSTEM_NAME=Android \
-DCMAKE_TOOLCHAIN_FILE=${TOOLCHAIN} \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_CXX_FLAGS="-std=c++11 -frtti -fexceptions -march=armv7-a -mfpu=neon" \
-DANDROID_ARM_NEON=TRUE \
-DCMAKE_C_FLAGS=-O3 \
-DANDROID_ABI="${abi}" \
-DANDROID_PLATFORM=${MIN_SDK} \
-DANDROID_TOOLCHAIN=clang \
-DANDROID_STL=c++_shared \
-DANDROID_CPP_FEATURES="rtti exceptions" \
-DCMAKE_PREFIX_PATH=../../ \
../../
But it doesn't seem to have any impact on performance.
Have you tried to add the ABI "armeabi-v7a with NEON" in the script?
You should edit line 17 of setup.sh (I guess you're using Linux) so that it looks something like this:
ABI=('armeabi-v7a' 'armeabi-v7a with NEON' 'arm64-v8a' 'x86' 'x86_64')
Let me know if it works now.
For more information you can read this.
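One thing to watch for with this entry: the name contains spaces, so any loop over the array has to quote the expansion, otherwise the entry is split into three separate words. A minimal sketch, assuming setup.sh walks the ABI array with a for loop (the loop and the two counter variables are illustrative and not taken from the script):

```bash
#!/usr/bin/env bash
# ABI list as suggested above; the second entry contains spaces.
ABI=('armeabi-v7a' 'armeabi-v7a with NEON' 'arm64-v8a' 'x86' 'x86_64')

# Quoted expansion keeps each array element intact.
quoted_count=0
for abi in "${ABI[@]}"; do
    quoted_count=$((quoted_count + 1))
    echo "building for: ${abi}"
done

# Unquoted expansion is word-split on spaces (shown only for contrast).
unquoted_count=0
for abi in ${ABI[@]}; do
    unquoted_count=$((unquoted_count + 1))
done

echo "quoted: ${quoted_count}, unquoted: ${unquoted_count}"
```

With the default IFS the quoted loop sees 5 entries and the unquoted one sees 7. The same quoting matters wherever the value is passed on to cmake as -DANDROID_ABI.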
I have tried this. Unfortunately, I don't see any difference. The Android CMake documentation says:
armeabi-v7a with NEON | Same as -DANDROID_ABI=armeabi-v7a -DANDROID_ARM_NEON=ON.
https://developer.android.com/ndk/guides/cmake#android_abi
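Going by that documentation row, the combined name can be expanded into the two explicit flags before they reach cmake. A small sketch of that mapping; the helper name abi_to_cmake_flags is hypothetical and is not part of setup.sh or the NDK:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the cmake flags for a given ABI name.
abi_to_cmake_flags() {
    case "$1" in
        'armeabi-v7a with NEON')
            # Per the documentation row quoted above.
            echo "-DANDROID_ABI=armeabi-v7a -DANDROID_ARM_NEON=ON"
            ;;
        *)
            # Every other ABI name is passed through unchanged.
            echo "-DANDROID_ABI=$1"
            ;;
    esac
}

for abi in 'armeabi-v7a' 'armeabi-v7a with NEON' 'arm64-v8a'; do
    echo "${abi} -> $(abi_to_cmake_flags "${abi}")"
done
```

Since the cmake command shown earlier already passes -DANDROID_ARM_NEON=TRUE for armeabi-v7a, switching to the combined name would not be expected to change the build, which is consistent with seeing no difference.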
I'm using face recognition, based on this example: http://dlib.net/dnn_face_recognition_ex.cpp.html
It is slightly modified to skip face detection.
On arm64-v8a devices it takes around 700 ms to calculate a face vector.
On armeabi-v7a devices it takes from 2 to 5 seconds to calculate a face vector.
I'm wondering whether that can be improved with NEON instructions.
To improve performance, you can try to:
- Process grayscale images instead of RGB.
- Downscale the input images, as well as the network input.
- Reduce the neural network size, i.e. fewer layers and fewer filters.
Thanks for suggestions, will try them!
Building dlib linked against OpenBLAS improves performance greatly. Instructions can be found here: https://github.com/davisking/dlib/issues/1238#issuecomment-382712052
For example, on a Redmi 7A, face descriptor calculation took ~3.5 s for me with the default prebuilt dlib .so files; after rebuilding dlib with OpenBLAS it takes ~350 ms, about 10 times faster.