Xnnpack still builds with `+dotprod` and `+fp16` with `-DXNNPACK_ENABLE_ARM_DOTPROD=OFF -DXNNPACK_ENABLE_ARM_FP16_SCALAR=OFF -DXNNPACK_ENABLE_ARM_FP16_VECTOR=OFF`

I'm building for an Arm64 target with a fairly old toolchain (gcc 7.5, binutils 2.29.1) in order to support old Linux platforms. I use: `-DXNNPACK_ENABLE_ARM_BF16=OFF -DXNNPACK_ENABLE_ARM_I8MM=OFF -DXNNPACK_ENABLE_ARM_DOTPROD=OFF -DXNNPACK_ENABLE_ARM_FP16_SCALAR=OFF -DXNNPACK_ENABLE_ARM_FP16_VECTOR=OFF`. Yet Xnnpack still seems to build with `+dotprod` and `+fp16`:

In file included from /home/personau/LinuxToolchainsTest/tflite_aarch64_release/xnnpack/src/f16-dwconv2d-chw/gen/5x5s2p2-minmax-neonfp16arith-1x4.c:12:0:
/home/personau/x-tools/aarch64-unknown-linux-gnu-glibc2.25-gcc7.5/lib/gcc/aarch64-unknown-linux-gnu/7.5.0/include/arm_neon.h:17259:1: note: expected 'const float16_t * {aka const __fp16 *}' but argument is of type 'const uint16_t * {aka const short unsigned int *}'
 vld1_dup_f16 (const float16_t* __a)
 ^~~~~~~~~~~~
cc1: error: invalid feature modifier in '-march=armv8.2-a+fp16+dotprod'
gmake[2]: *** [_deps/xnnpack-build/CMakeFiles/XNNPACK.dir/build.make:4093: _deps/xnnpack-build/CMakeFiles/XNNPACK.dir/src/f16-gemm/gen-inc/1x8inc-minmax-aarch64-neonfp16arith-ld64.S.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:6137: _deps/xnnpack-build/CMakeFiles/XNNPACK.dir/all] Error 2
gmake[1]: *** Waiting for unfinished jobs....
gmake: *** [Makefile:136: all] Error 2
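The `invalid feature modifier` message is printed by `cc1` itself, so one way to narrow it down is to ask the cross compiler directly whether it accepts a given `-march` string. The helper below is a minimal sketch, not part of Xnnpack: the function name `accepts_march` is hypothetical, and the compiler name in the usage comment is an assumption inferred from the toolchain path in the log.

```shell
# accepts_march COMPILER MARCH
# Returns 0 if COMPILER compiles a trivial C file with -march=MARCH,
# non-zero otherwise. Compiler output is discarded.
accepts_march() {
  "$1" -march="$2" -x c -c -o /dev/null - >/dev/null 2>&1 <<'EOF'
int main(void) { return 0; }
EOF
}

# Usage (compiler name is an assumption, adjust to your toolchain):
#   CC=aarch64-unknown-linux-gnu-gcc
#   for m in armv8.2-a armv8.2-a+fp16 armv8.2-a+dotprod; do
#     if accepts_march "$CC" "$m"; then echo "ok       $m"; else echo "rejected $m"; fi
#   done
```

Trying each modifier on its own shows which of `+fp16` and `+dotprod` the toolchain actually rejects.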

Mar 12 '24 12:03 misterBart

The build system determines which kernels to build. The macros reflect what was enabled, and the disabled kernels won't be tested or used. With Bazel there are flags to control each instruction set:

--define=xnn_enable_arm_fp16_vector=false
--define=xnn_enable_arm_dotprod=false

cmake has options, but I'm not familiar with the usage

XNNPACK_ENABLE_ARM_FP16_VECTOR
XNNPACK_ENABLE_ARM_DOTPROD

On Intel I added some gcc version checking to force the flags off, and that could be done for Arm gcc with a change to CMakeLists.txt. It would be something like:

```
IF(CMAKE_C_COMPILER_ID STREQUAL "GNU")
  IF(CMAKE_C_COMPILER_VERSION VERSION_LESS "11")
    SET(XNNPACK_ENABLE_ARM_FP16_VECTOR OFF)
    SET(XNNPACK_ENABLE_ARM_DOTPROD OFF)
  ENDIF()
ENDIF()
```

Mar 12 '24 22:03 fbarchard

> cmake has options, but I'm not familiar with the usage
> XNNPACK_ENABLE_ARM_FP16_VECTOR
> XNNPACK_ENABLE_ARM_DOTPROD

Yes, I already turned these off, see my opening post. The problem is that, even though I set these CMake options to OFF, Xnnpack still builds with +dotprod and +fp16.

Mar 13 '24 09:03 misterBart

What version of XNNPack are you building? The failing file was removed on Sep 26, 2022

Mar 27 '24 07:03 alankelly

The version that is part of TfLite 2.10. (Can I check the specific Xnnpack version in the TfLite source code?) TfLite 2.10.1 was released Nov 16, 2022. Perhaps that TfLite release still includes the failing file.
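One possible way to answer the version question, assuming the TfLite CMake build pins Xnnpack through a `GIT_TAG` entry in a fetch module: the path in the usage comment is an assumption about the TensorFlow source layout and may differ between releases, and `pinned_git_tag` is a hypothetical helper name.

```shell
# pinned_git_tag FILE
# Prints the value of the first "GIT_TAG <value>" entry found in FILE.
pinned_git_tag() {
  awk '$1 == "GIT_TAG" { print $2; exit }' "$1"
}

# Usage (path is an assumption about the TfLite source tree):
#   pinned_git_tag tensorflow_src/tensorflow/lite/tools/cmake/modules/xnnpack.cmake
```

The printed commit hash can then be looked up in the Xnnpack repository to see whether a given file still existed at that revision.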

Mar 27 '24 10:03 misterBart

Can you update to the latest release? We can't fix old releases.

Mar 27 '24 10:03 alankelly

Still getting the errors with the latest TfLite release (2.16):

cc1: error: invalid feature modifier in '-march=armv8.2-a+fp16+dotprod'
gmake[2]: *** [_deps/xnnpack-build/CMakeFiles/microkernels-prod.dir/build.make:173: _deps/xnnpack-build/CMakeFiles/microkernels-prod.dir/src/f16-gemm/gen/f16-gemm-1x8-minmax-asm-aarch64-neonfp16arith-ld64.S.o] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:6832: _deps/xnnpack-build/CMakeFiles/microkernels-prod.dir/all] Error 2
gmake[1]: *** Waiting for unfinished jobs....
cc1: error: invalid feature modifier in '-march=armv8.2-a+fp16+dotprod'
gmake[2]: *** [_deps/xnnpack-build/CMakeFiles/microkernels-all.dir/build.make:40157: _deps/xnnpack-build/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemm-1x8-minmax-asm-aarch64-neonfp16arith-ld64.S.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:6806: _deps/xnnpack-build/CMakeFiles/microkernels-all.dir/all] Error 2
gmake: *** [Makefile:136: all] Error 2

Steps I execute:

git clone --single-branch --branch r2.16 https://github.com/tensorflow/tensorflow tensorflow_src
cmake -DCMAKE_TOOLCHAIN_FILE=../toolchain_aarch64.cmake -DCMAKE_BUILD_TYPE=release -DXNNPACK_ENABLE_ARM_BF16=OFF -DXNNPACK_ENABLE_ARM_I8MM=OFF -DXNNPACK_ENABLE_ARM_DOTPROD=OFF -DXNNPACK_ENABLE_ARM_FP16_SCALAR=OFF -DXNNPACK_ENABLE_ARM_FP16_VECTOR=OFF ../tensorflow_src/tensorflow/lite
cmake --build . -j 8 --config release
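To separate "the options were not picked up" from "the options were picked up but some targets ignore them", it can help to inspect the configured build directory. The two helpers below are a sketch with hypothetical names: the first lists the `XNNPACK_ENABLE_*` entries recorded in `CMakeCache.txt`, the second lists generated build files that still mention a given flag string.

```shell
# xnnpack_cache_options CACHE_FILE
# Prints NAME=VALUE for every XNNPACK_ENABLE_* entry in a CMakeCache.txt.
xnnpack_cache_options() {
  sed -n 's/^\(XNNPACK_ENABLE_[A-Z0-9_]*\):[A-Z]*=\(.*\)$/\1=\2/p' "$1"
}

# files_mentioning DIR STRING
# Prints the files under DIR that contain the literal STRING.
files_mentioning() {
  grep -rlF -e "$2" "$1"
}

# Usage, run from the build directory:
#   xnnpack_cache_options CMakeCache.txt
#   files_mentioning _deps/xnnpack-build '-march=armv8.2-a+fp16+dotprod'
```

If the cache shows the options as `OFF` while build files still carry the `-march` string, the flag is being attached to sources that the options do not filter out.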

Mar 28 '24 08:03 misterBart

Can you try adding -DXNNPACK_ENABLE_ASSEMBLY=OFF?

Mar 28 '24 12:03 alankelly

After adding that option, TfLite 2.16 builds without errors, and I can run a test program on an Arm64 board using TfLite 2.16. But I shouldn't cheer too early: the test program runs slower now, which naturally comes from disabling the use of assembly code. -DXNNPACK_ENABLE_ASSEMBLY=OFF is too drastic. The Arm64 board does not support float16, etc., but I would still like to use the other assembly micro-kernels in Xnnpack.

Mar 28 '24 16:03 misterBart

Ok, we know what the problem is now. The solution is to get the update-microkernels script to split the assembly files into ones with and without arm V8 and to create new targets with the appropriate compilation options. Would you like to send a PR to do this?
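As a rough illustration of the split described above (not the update-microkernels script itself), assembly kernels could be grouped by the ISA tag in their file names so that each group can get its own compile options. The `neonfp16arith` tag is taken from the failing file in the log; the `neondot` tag and the function name `asm_feature_group` are assumptions made for this sketch.

```shell
# asm_feature_group FILE
# Prints the ISA group suggested by the kernel's file name:
# "fp16", "dotprod", or "base" for everything else.
asm_feature_group() {
  case "$(basename "$1")" in
    *neonfp16arith*) echo fp16 ;;
    *neondot*)       echo dotprod ;;
    *)               echo base ;;
  esac
}

# Usage, run from the Xnnpack source root:
#   find src -name '*aarch64*.S' | while read -r f; do
#     printf '%s\t%s\n' "$(asm_feature_group "$f")" "$f"
#   done | sort
```

Files in the `base` group would go into a target built without the extra `-march` modifiers, while the `fp16` and `dotprod` groups would only be added when the corresponding `XNNPACK_ENABLE_ARM_*` option is on.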

Mar 28 '24 16:03 alankelly

A PR suggests I know what to fix in the codebase, which I don't.

Mar 29 '24 08:03 misterBart
