Xnnpack still builds with `+dotprod` and `+fp16` with `-DXNNPACK_ENABLE_ARM_DOTPROD=OFF -DXNNPACK_ENABLE_ARM_FP16_SCALAR=OFF -DXNNPACK_ENABLE_ARM_FP16_VECTOR=OFF`
I'm building for an Arm64 target with a fairly old toolchain (gcc 7.5, binutils 2.29.1) in order to support old Linux platforms.
I use:
-DXNNPACK_ENABLE_ARM_BF16=OFF -DXNNPACK_ENABLE_ARM_I8MM=OFF -DXNNPACK_ENABLE_ARM_DOTPROD=OFF -DXNNPACK_ENABLE_ARM_FP16_SCALAR=OFF -DXNNPACK_ENABLE_ARM_FP16_VECTOR=OFF
Yet Xnnpack still seems to build with +dotprod and +fp16:
In file included from /home/personau/LinuxToolchainsTest/tflite_aarch64_release/xnnpack/src/f16-dwconv2d-chw/gen/5x5s2p2-minmax-neonfp16arith-1x4.c:12:0:
/home/personau/x-tools/aarch64-unknown-linux-gnu-glibc2.25-gcc7.5/lib/gcc/aarch64-unknown-linux-gnu/7.5.0/include/arm_neon.h:17259:1: note: expected 'const float16_t * {aka const __fp16 *}' but argument is of type 'const uint16_t * {aka const short unsigned int *'
vld1_dup_f16 (const float16_t* __a)
^~~~~~~~~~~~
cc1: error: invalid feature modifier in '-march=armv8.2-a+fp16+dotprod'
gmake[2]: *** [_deps/xnnpack-build/CMakeFiles/XNNPACK.dir/build.make:4093: _deps/xnnpack-build/CMakeFiles/XNNPACK.dir/src/f16-gemm/gen-inc/1x8inc-minmax-aarch64-neonfp16arith-ld64.S.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:6137: _deps/xnnpack-build/CMakeFiles/XNNPACK.dir/all] Error 2
gmake[1]: *** Waiting for unfinished jobs....
gmake: *** [Makefile:136: all] Error 2
The build system determines which kernels to build. The macros reflect what was enabled, so the disabled kernels won't be tested or used. With Bazel there are flags to control each instruction set:
--define=xnn_enable_arm_fp16_vector=false
--define=xnn_enable_arm_dotprod=false
CMake has options, but I'm not familiar with the usage:
XNNPACK_ENABLE_ARM_FP16_VECTOR
XNNPACK_ENABLE_ARM_DOTPROD
On Intel I added some gcc version checking to force the flags off, and the same could be done for Arm gcc with a change to CMakeLists.txt. It would be something like:
IF(CMAKE_C_COMPILER_ID STREQUAL "GNU")
  IF(CMAKE_C_COMPILER_VERSION VERSION_LESS "11")
    SET(XNNPACK_ENABLE_ARM_FP16_VECTOR OFF)
    SET(XNNPACK_ENABLE_ARM_DOTPROD OFF)
  ENDIF()
ENDIF()
> cmake has options, but I'm not familiar with the usage
> XNNPACK_ENABLE_ARM_FP16_VECTOR XNNPACK_ENABLE_ARM_DOTPROD
Yes, I already turned these off, see my opening post. The problem is that, even though I set these CMake options to OFF, Xnnpack still builds with +dotprod and +fp16.
What version of XNNPack are you building? The failing file was removed on Sep 26, 2022.
The version that is part of TfLite 2.10. (Can I check the specific Xnnpack version in the TfLite source code?) TfLite 2.10.1 was released on Nov 16, 2022. Perhaps that TfLite still includes the failing file.
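On the question of checking the pinned Xnnpack version, here is a minimal sketch. It assumes that the TfLite CMake build pins Xnnpack through a `GIT_TAG` entry in `tensorflow/lite/tools/cmake/modules/xnnpack.cmake`; that path is an assumption about the TfLite tree, and the helper name `pinned_git_tag` is hypothetical.

```shell
#!/bin/sh
# Print the value(s) that follow the GIT_TAG keyword in a CMake module file.
# Assumption: the Xnnpack pin lives in
#   tensorflow/lite/tools/cmake/modules/xnnpack.cmake
# inside the TfLite checkout. pinned_git_tag is a hypothetical helper name.
pinned_git_tag() {
  module_file="$1"
  # Keep only the first whitespace-delimited token after GIT_TAG.
  sed -n 's/^[[:space:]]*GIT_TAG[[:space:]]\{1,\}\([^[:space:]]*\).*/\1/p' "$module_file"
}

# Hypothetical usage against the checkout named in this thread:
# pinned_git_tag tensorflow_src/tensorflow/lite/tools/cmake/modules/xnnpack.cmake
```

If the file exists in the checkout, the printed commit hash can be looked up in the Xnnpack repository to see whether it predates the removal of the failing file.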
Can you update to the latest release? We can't fix old releases.
Still getting the errors with the latest TfLite release (2.16):
cc1: error: invalid feature modifier in '-march=armv8.2-a+fp16+dotprod'
gmake[2]: *** [_deps/xnnpack-build/CMakeFiles/microkernels-prod.dir/build.make:173: _deps/xnnpack-build/CMakeFiles/microkernels-prod.dir/src/f16-gemm/gen/f16-gemm-1x8-minmax-asm-aarch64-neonfp16arith-ld64.S.o] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:6832: _deps/xnnpack-build/CMakeFiles/microkernels-prod.dir/all] Error 2
gmake[1]: *** Waiting for unfinished jobs....
cc1: error: invalid feature modifier in '-march=armv8.2-a+fp16+dotprod'
gmake[2]: *** [_deps/xnnpack-build/CMakeFiles/microkernels-all.dir/build.make:40157: _deps/xnnpack-build/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemm-1x8-minmax-asm-aarch64-neonfp16arith-ld64.S.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:6806: _deps/xnnpack-build/CMakeFiles/microkernels-all.dir/all] Error 2
gmake: *** [Makefile:136: all] Error 2
Steps I execute:
git clone --single-branch --branch r2.16 https://github.com/tensorflow/tensorflow tensorflow_src
cmake -DCMAKE_TOOLCHAIN_FILE=../toolchain_aarch64.cmake -DCMAKE_BUILD_TYPE=release -DXNNPACK_ENABLE_ARM_BF16=OFF -DXNNPACK_ENABLE_ARM_I8MM=OFF -DXNNPACK_ENABLE_ARM_DOTPROD=OFF -DXNNPACK_ENABLE_ARM_FP16_SCALAR=OFF -DXNNPACK_ENABLE_ARM_FP16_VECTOR=OFF ../tensorflow_src/tensorflow/lite
cmake --build . -j 8 --config release
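To narrow down which feature modifier the old compiler rejects, a small probe can help. This is a sketch, not part of the Xnnpack build: `march_supported` is a hypothetical helper, and the compiler name in the usage comment is an assumption based on the toolchain prefix visible in the error log.

```shell
#!/bin/sh
# Return 0 if compiler "$1" accepts -march="$2", non-zero otherwise.
# march_supported is a hypothetical helper, not something Xnnpack provides.
march_supported() {
  cc="$1"
  march="$2"
  # Compile a trivial translation unit from stdin and discard the object.
  printf 'int main(void) { return 0; }\n' |
    "$cc" -march="$march" -x c -c -o /dev/null - >/dev/null 2>&1
}

# Hypothetical usage; the compiler name is assumed from the log paths.
# for m in armv8.2-a armv8.2-a+fp16 armv8.2-a+fp16+dotprod; do
#   if march_supported aarch64-unknown-linux-gnu-gcc "$m"; then
#     echo "accepted: $m"
#   else
#     echo "rejected: $m"
#   fi
# done
```

Running the loop with the real cross compiler would show whether `+fp16`, `+dotprod`, or both trigger the "invalid feature modifier" error.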
Can you try adding -DXNNPACK_ENABLE_ASSEMBLY=OFF?
After adding that option, TfLite 2.16 builds without errors, and I can run a test program on an Arm64 board using TfLite 2.16. But before I cheer too early: the test program runs slower now, which naturally comes from disabling the assembly code. -DXNNPACK_ENABLE_ASSEMBLY=OFF is too broad. The Arm64 board does not support float16 etc., but I would still like to use the other assembly micro-kernels in Xnnpack.
Ok, we know what the problem is now. The solution is to get the update-microkernels script to split the assembly files into ones with and without arm V8 and to create new targets with the appropriate compilation options. Would you like to send a PR to do this?
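As a rough illustration of the split described above, the assembly files could be grouped by the ISA token in their file names. This is only a sketch and not how the update-microkernels script works: `asm_isa_group` is a hypothetical helper, and the assumption is that names containing `neonfp16arith` need `+fp16` and names containing `neondot` need `+dotprod`.

```shell
#!/bin/sh
# Classify an assembly microkernel by the ISA token in its file name.
# Assumption: "neonfp16arith" implies +fp16, "neondot" implies +dotprod,
# and anything else builds with the baseline Arm64 flags.
# asm_isa_group is a hypothetical helper name.
asm_isa_group() {
  case "$1" in
    *neonfp16arith*) echo fp16 ;;
    *neondot*)       echo dotprod ;;
    *)               echo baseline ;;
  esac
}

# Hypothetical usage: list every .S file under a source directory by group.
# find src -name '*.S' | while read -r f; do
#   printf '%s\t%s\n' "$(asm_isa_group "$f")" "$f"
# done | sort
```

Each group would then become its own build target with matching compile options, so that turning off the fp16 and dotprod options drops only those files.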
A PR suggests I know what to fix in the codebase, which I don't.