armv7-functions
Implementation of various math, image processing, and other functions for ARMv7 and NEON
ARMv7 Functions
This is a collection of various functions optimized for ARMv7 and NEON.
The five holy laws
- Never return floating-point values by value. This would work fine if -mfloat-abi=hard were supported everywhere, but sadly it's not. With the more common -mfloat-abi=softfp, every time you do a return my_float_value, the compiler emits either an fmrs or a vstr, followed by a load operation to read the result back! Instead, pass a non-const reference as the first parameter. This lets intermediate results inline smoothly, without unnecessary loads and stores, just as they would if hard floats were available (this works for vector types too)!
- Try to minimize loads and stores. GCC doesn't make full use of vldmia/vstmia and generates poor code for operations on float32x4x4_t, so hand-coding those operations makes sense.
- Use vector types wherever it makes sense. Functions prefixed with vec3_ and vec4_ work directly on float32x4_t; those prefixed with mat44_ work directly on float32x4x4_t. Parameters are passed as references, so the compiler doesn't perform unnecessary transfers between NEON and ARM core registers.
- Don't hard-code registers. Instead of clobbering specific registers, declare dummy output variables for scratch values and let the compiler allocate registers as needed.
- A good clobber list is an empty clobber list. If you let the compiler handle loads for you, "memory" shouldn't even show up in your clobber list. The only item that might is "cc".
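The laws above can be sketched in a few lines of C++. This is a minimal illustration, not code from this repository: vec4_t, vec4_dot, vec4_lerp and SKETCH_USE_NEON are hypothetical names. The NEON path assumes 32-bit ARM GCC with -mfpu=neon (detected through GCC's __ARM_NEON__ macro) and uses GCC's "w" constraint with the %q modifier to name quad registers; a scalar fallback keeps the sketch compilable on other hosts.

```cpp
#include <cstddef>

#if defined(__arm__) && defined(__ARM_NEON__)
#include <arm_neon.h>
typedef float32x4_t vec4_t;   // one NEON quad (q) register
#define SKETCH_USE_NEON 1
#else
// Portable stand-in (assumption) so the sketch also builds off ARMv7.
struct vec4_t { float v[4]; };
#define SKETCH_USE_NEON 0
#endif

// Hypothetical helper: dot product of two 4-lane vectors.
// Law 1: the float result goes out through a non-const reference
// (first parameter) instead of being returned by value.
// Law 3: inputs are vector types passed by reference.
static inline void vec4_dot(float& result, const vec4_t& a, const vec4_t& b)
{
#if SKETCH_USE_NEON
    float32x4_t p = vmulq_f32(a, b);                             // lane-wise products
    float32x2_t s = vadd_f32(vget_low_f32(p), vget_high_f32(p)); // (p0+p2, p1+p3)
    s = vpadd_f32(s, s);                                         // full sum in both lanes
    result = vget_lane_f32(s, 0);
#else
    result = 0.0f;
    for (std::size_t i = 0; i < 4; ++i) result += a.v[i] * b.v[i];
#endif
}

// Hypothetical helper: per-lane linear interpolation a + t * (b - a).
// Law 4: the temporary is a dummy output operand, so the compiler picks
// its register instead of us hard-coding and clobbering, say, q8.
// Law 5: all values flow through operands, so the clobber list is empty.
static inline void vec4_lerp(vec4_t& result, const vec4_t& a,
                             const vec4_t& b, const vec4_t& t)
{
#if SKETCH_USE_NEON
    float32x4_t tmp;  // scratch only; its final value is discarded
    asm("vsub.f32 %q[tmp], %q[b], %q[res]\n\t"  // tmp = b - a (res starts as a)
        "vmla.f32 %q[res], %q[tmp], %q[t]"       // res += tmp * t
        : [res] "=w" (result), [tmp] "=&w" (tmp) // & : tmp written before t is read
        : [b] "w" (b), [t] "w" (t), "0" (a));    // "0": a arrives in res's register
#else
    for (std::size_t i = 0; i < 4; ++i)
        result.v[i] = a.v[i] + t.v[i] * (b.v[i] - a.v[i]);
#endif
}
```

The tied "0" (a) input means result starts out holding a, so the asm never reads result from memory, and because every input is loaded before the asm writes its outputs, a caller may pass the same vector as both result and one of the inputs.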
Compilation flags
For best performance I usually use the following CFLAGS: -mthumb -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp -mvectorize-with-neon-quad -O3 -ffast-math -fomit-frame-pointer -fstrict-aliasing -fgcse-las -funsafe-loop-optimizations -fsee -ftree-vectorize, adding -arch armv7 when using GCC for iOS, or -march=armv7-a when using arm-none-eabi-gcc.
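To check which of these flags actually reached the compiler, a translation unit can inspect GCC's predefined target macros. The sketch below is a hypothetical helper, not part of this repository; the softfp detection assumes a GCC EABI (AAPCS) target such as arm-none-eabi, where -mfloat-abi=hard defines __ARM_PCS_VFP, soft and softfp define __ARM_PCS, and only soft also defines __SOFTFP__. Other ABIs may define these macros differently.

```cpp
// Hypothetical helper: records which of the build flags took effect,
// based on GCC's predefined macros for ARM targets.
struct BuildConfig {
    bool arm32;           // __arm__: 32-bit ARM target
    bool thumb2;          // __thumb2__: -mthumb on an ARMv7 core
    bool armv7a;          // __ARM_ARCH_7A__: -march=armv7-a or -mcpu=cortex-a8
    bool neon;            // __ARM_NEON__: -mfpu=neon
    bool hard_float_abi;  // __ARM_PCS_VFP: -mfloat-abi=hard
    bool softfp_abi;      // __ARM_PCS without __SOFTFP__: -mfloat-abi=softfp
    bool fast_math;       // __FAST_MATH__: -ffast-math
};

inline BuildConfig detect_build_config()
{
    BuildConfig c = {};   // every field starts out false
#if defined(__arm__)
    c.arm32 = true;
#endif
#if defined(__thumb2__)
    c.thumb2 = true;
#endif
#if defined(__ARM_ARCH_7A__)
    c.armv7a = true;
#endif
#if defined(__ARM_NEON__)
    c.neon = true;
#endif
#if defined(__ARM_PCS_VFP)
    c.hard_float_abi = true;
#endif
#if defined(__ARM_PCS) && !defined(__SOFTFP__)
    // Base calling convention with FP hardware enabled: float results
    // cross function boundaries in core registers, the case the first
    // law warns about.
    c.softfp_abi = true;
#endif
#if defined(__FAST_MATH__)
    c.fast_math = true;
#endif
    return c;
}
```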
Preprocessor macros
Several preprocessor macros, when defined, change the behaviour of the code. See config.h and config-defaults.h for details…
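One common way to organise such a pair of headers is for the user-editable file to define overrides and for the defaults file to fill in anything left undefined. This is an assumption about the general pattern, not a description of the actual config.h and config-defaults.h; the macro ARMV7F_USE_INLINE_ASM and the function armv7f_backend_name are hypothetical names chosen for illustration.

```cpp
// ---- stands in for config.h: user overrides (hypothetical macro) ----
// #define ARMV7F_USE_INLINE_ASM 0

// ---- stands in for config-defaults.h: supply a default only if unset ----
#ifndef ARMV7F_USE_INLINE_ASM
#define ARMV7F_USE_INLINE_ASM 1
#endif

// Code then selects an implementation at compile time from the macro's value.
inline const char* armv7f_backend_name()
{
#if ARMV7F_USE_INLINE_ASM
    return "inline asm";
#else
    return "plain C";
#endif
}
```

Because the defaults header only defines what is still missing, uncommenting the override (or passing -DARMV7F_USE_INLINE_ASM=0 on the command line) takes precedence without editing the defaults.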