dlib-for-android

Performance optimisation

Open • lemberh opened this issue 4 years ago • 7 comments

Do you support the NEON instruction-set optimizations discussed in this thread? https://github.com/davisking/dlib/issues/276

lemberh, Nov 28 '19

Hi @lemberh, unfortunately I support only the basic architectures.

Luca96, Nov 29 '19

Have you tried those optimizations? I changed the CMake invocation to:

  # -march=armv7-a and -mfpu=neon only apply to armeabi-v7a builds.
  # Values containing spaces are quoted so each -D option stays a single argument.
  ${AndroidCmake} -DBUILD_SHARED_LIBS=1 \
	  -DANDROID_NDK=${NDK} \
	  -DCMAKE_SYSTEM_NAME=Android \
	  -DCMAKE_TOOLCHAIN_FILE=${TOOLCHAIN} \
	  -DCMAKE_BUILD_TYPE=Release \
	  -DCMAKE_CXX_FLAGS="-std=c++11 -frtti -fexceptions -march=armv7-a -mfpu=neon" \
	  -DANDROID_ARM_NEON=TRUE \
	  -DCMAKE_C_FLAGS=-O3 \
	  -DANDROID_ABI="${abi}" \
	  -DANDROID_PLATFORM=${MIN_SDK} \
	  -DANDROID_TOOLCHAIN=clang \
	  -DANDROID_STL=c++_shared \
	  -DANDROID_CPP_FEATURES="rtti exceptions" \
	  -DCMAKE_PREFIX_PATH=../../ \
	  ../../

But it doesn't seem to have any impact on performance.

lemberh, Nov 30 '19

Have you tried adding the ABI "armeabi-v7a with NEON" to the script?

Edit line 17 of setup.sh (I assume you're using Linux) so that it reads something like this:

ABI=('armeabi-v7a' 'armeabi-v7a with NEON' 'arm64-v8a' 'x86' 'x86_64')
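Because 'armeabi-v7a with NEON' contains spaces, the array has to be expanded with quotes wherever the script loops over it; unquoted, bash splits that entry into the three words "armeabi-v7a", "with" and "NEON". A minimal sketch, assuming setup.sh iterates over `ABI` with a `for` loop and passes each entry on as `${abi}` (the loop and the `count` variable are illustrative, not taken from the script):

```bash
#!/usr/bin/env bash
# ABI list as suggested for line 17 of setup.sh
ABI=('armeabi-v7a' 'armeabi-v7a with NEON' 'arm64-v8a' 'x86' 'x86_64')

count=0
# Quoting "${ABI[@]}" keeps 'armeabi-v7a with NEON' as one element.
for abi in "${ABI[@]}"; do
  count=$((count + 1))
  # Quote "${abi}" again when handing it to CMake, e.g. "-DANDROID_ABI=${abi}".
  printf 'ABI %d: %s\n' "$count" "$abi"
done
```

With the quotes the loop runs five times; with `for abi in ${ABI[@]}` it would run seven times, and CMake would never see the NEON variant as a single value.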

Let me know if it works now.

For more information you can read this.

Luca96, Nov 30 '19

I have tried this, but unfortunately I don't see any difference. The Android CMake documentation says:

armeabi-v7a with NEON | Same as -DANDROID_ABI=armeabi-v7a -DANDROID_ARM_NEON=ON.

https://developer.android.com/ndk/guides/cmake#android_abi
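Following the documented equivalence, a build script can translate the NEON entry into the two plain CMake options itself. A minimal sketch; the helper `abi_to_cmake_args` and the array `CMAKE_ABI_ARGS` are hypothetical names, not part of setup.sh:

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn one entry of the ABI array into CMake arguments.
# Per the Android docs, 'armeabi-v7a with NEON' is the same as
# -DANDROID_ABI=armeabi-v7a -DANDROID_ARM_NEON=ON.
abi_to_cmake_args() {
  local abi="$1"
  CMAKE_ABI_ARGS=()
  case "$abi" in
    'armeabi-v7a with NEON')
      CMAKE_ABI_ARGS=(-DANDROID_ABI=armeabi-v7a -DANDROID_ARM_NEON=ON)
      ;;
    *)
      CMAKE_ABI_ARGS=("-DANDROID_ABI=${abi}")
      ;;
  esac
}

abi_to_cmake_args 'armeabi-v7a with NEON'
printf '%s\n' "${CMAKE_ABI_ARGS[@]}"
```

The array can then be spliced into the CMake call as `"${CMAKE_ABI_ARGS[@]}"` in place of the single `-DANDROID_ABI=${abi}` option, which also shows that the NEON variant differs from plain armeabi-v7a only by `ANDROID_ARM_NEON`.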

I'm doing face recognition based on this example, http://dlib.net/dnn_face_recognition_ex.cpp.html, slightly modified to skip face detection. On arm64-v8a devices it takes around 700 ms to compute a face vector, but on armeabi-v7a devices it takes 2 to 5 seconds. I'm wondering whether NEON instructions could improve that.

lemberh, Nov 30 '19

To improve performance, you can try to:

  • Process grayscale images instead of RGB.
  • Downscale the input images, as well as the network input.
  • Reduce the neural network size, i.e. fewer layers and fewer filters.

Luca96, Dec 05 '19

Thanks for the suggestions, I will try them!

lemberh, Dec 06 '19

Building dlib linked against OpenBLAS improves performance greatly. Instructions can be found here: https://github.com/davisking/dlib/issues/1238#issuecomment-382712052

For example, on a Redmi 7A, face descriptor calculation took ~3.5 s for me with the default prebuilt dlib .so files; after rebuilding dlib with OpenBLAS it takes ~350 ms, about 10 times faster.

opiumfive, Sep 18 '20