
Xnnpack still builds with `+dotprod` and `+fp16` with `-DXNNPACK_ENABLE_ARM_DOTPROD=OFF -DXNNPACK_ENABLE_ARM_FP16_SCALAR=OFF -DXNNPACK_ENABLE_ARM_FP16_VECTOR=OFF`

Open misterBart opened this issue 1 year ago • 10 comments

I'm building an Arm64 target with a fairly old toolchain (gcc 7.5, binutils 2.29.1) in order to support old Linux platforms. I use: `-DXNNPACK_ENABLE_ARM_BF16=OFF -DXNNPACK_ENABLE_ARM_I8MM=OFF -DXNNPACK_ENABLE_ARM_DOTPROD=OFF -DXNNPACK_ENABLE_ARM_FP16_SCALAR=OFF -DXNNPACK_ENABLE_ARM_FP16_VECTOR=OFF`. Yet Xnnpack still seems to build with `+dotprod` and `+fp16`:

In file included from /home/personau/LinuxToolchainsTest/tflite_aarch64_release/xnnpack/src/f16-dwconv2d-chw/gen/5x5s2p2-minmax-neonfp16arith-1x4.c:12:0:
/home/personau/x-tools/aarch64-unknown-linux-gnu-glibc2.25-gcc7.5/lib/gcc/aarch64-unknown-linux-gnu/7.5.0/include/arm_neon.h:17259:1: note: expected 'const float16_t * {aka const __fp16 *}' but argument is of type 'const uint16_t * {aka const short unsigned int *}'
 vld1_dup_f16 (const float16_t* __a)
 ^~~~~~~~~~~~
cc1: error: invalid feature modifier in '-march=armv8.2-a+fp16+dotprod'
gmake[2]: *** [_deps/xnnpack-build/CMakeFiles/XNNPACK.dir/build.make:4093: _deps/xnnpack-build/CMakeFiles/XNNPACK.dir/src/f16-gemm/gen-inc/1x8inc-minmax-aarch64-neonfp16arith-ld64.S.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:6137: _deps/xnnpack-build/CMakeFiles/XNNPACK.dir/all] Error 2
gmake[1]: *** Waiting for unfinished jobs....
gmake: *** [Makefile:136: all] Error 2

misterBart avatar Mar 12 '24 12:03 misterBart

The build system determines which kernels to build. The macros reflect what was enabled, so the disabled kernels won't be tested or used. With Bazel there are flags to control each instruction set:

--define=xnn_enable_arm_fp16_vector=false
--define=xnn_enable_arm_dotprod=false

cmake has options, but I'm not familiar with the usage

XNNPACK_ENABLE_ARM_FP16_VECTOR
XNNPACK_ENABLE_ARM_DOTPROD

On Intel I added some gcc version checking to force the flags off, and that could be done for Arm gcc with a change to CMakeLists.txt. It would be something like:

IF(CMAKE_C_COMPILER_ID STREQUAL "GNU")
  IF(CMAKE_C_COMPILER_VERSION VERSION_LESS "11")
    SET(XNNPACK_ENABLE_ARM_FP16_VECTOR OFF)
    SET(XNNPACK_ENABLE_ARM_DOTPROD OFF)
  ENDIF()
ENDIF()
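As an alternative to comparing version numbers, the compiler itself can be asked whether it accepts the flags. The following is only a sketch: `HAS_ARM_FP16` and `HAS_ARM_DOTPROD` are made-up result variables, and it assumes the block is placed after the `XNNPACK_ENABLE_*` options have been declared. It only changes what those options control, so any source file that receives `-march=armv8.2-a+fp16+dotprod` regardless of the options would still fail.

```cmake
# Sketch: probe the compiler instead of checking its version.
# HAS_ARM_FP16 / HAS_ARM_DOTPROD are hypothetical names, not XNNPACK variables.
INCLUDE(CheckCCompilerFlag)

# Each probe compiles a tiny test program with the given flag.
CHECK_C_COMPILER_FLAG("-march=armv8.2-a+fp16" HAS_ARM_FP16)
CHECK_C_COMPILER_FLAG("-march=armv8.2-a+dotprod" HAS_ARM_DOTPROD)

IF(NOT HAS_ARM_FP16)
  MESSAGE(STATUS "Compiler rejects +fp16; turning the FP16 options off")
  SET(XNNPACK_ENABLE_ARM_FP16_SCALAR OFF)
  SET(XNNPACK_ENABLE_ARM_FP16_VECTOR OFF)
ENDIF()

IF(NOT HAS_ARM_DOTPROD)
  MESSAGE(STATUS "Compiler rejects +dotprod; turning the dot-product option off")
  SET(XNNPACK_ENABLE_ARM_DOTPROD OFF)
ENDIF()
```

Probing the two feature modifiers separately shows which one the toolchain actually rejects, rather than disabling both on a version guess.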

fbarchard avatar Mar 12 '24 22:03 fbarchard

> cmake has options, but I'm not familiar with the usage
>
> XNNPACK_ENABLE_ARM_FP16_VECTOR
> XNNPACK_ENABLE_ARM_DOTPROD

Yes, I already turned these off, see my opening post. The problem is that, even though I set these CMake options to OFF, Xnnpack still builds with +dotprod and +fp16.

misterBart avatar Mar 13 '24 09:03 misterBart

What version of XNNPACK are you building? The failing file was removed on Sep 26, 2022.

alankelly avatar Mar 27 '24 07:03 alankelly

The version bundled with TfLite 2.10. (Can I check the specific Xnnpack version in the TfLite source code?) TfLite 2.10.1 was released Nov 16, 2022. Perhaps that TfLite release still includes the failing file.
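One possible way to look up the pinned Xnnpack revision, as a sketch only: it assumes the TfLite CMake build declares the Xnnpack dependency with a `GIT_TAG` line in `tensorflow/lite/tools/cmake/modules/xnnpack.cmake`, which may not hold for every release. The script name and the `tensorflow_src` checkout directory are placeholders.

```cmake
# print_xnnpack_pin.cmake (hypothetical script name).
# Run from the directory that contains the TensorFlow checkout:
#   cmake -P print_xnnpack_pin.cmake
# Assumption: the pin lives in this file; adjust the path if it does not.
SET(PIN_FILE "tensorflow_src/tensorflow/lite/tools/cmake/modules/xnnpack.cmake")

IF(NOT EXISTS "${PIN_FILE}")
  MESSAGE(FATAL_ERROR "Not found: ${PIN_FILE}")
ENDIF()

# Collect every line mentioning GIT_TAG and print it.
FILE(STRINGS "${PIN_FILE}" PIN_LINES REGEX "GIT_TAG")
MESSAGE(STATUS "Xnnpack pin: ${PIN_LINES}")
```

The printed commit hash can then be compared against the Xnnpack history to see whether the failing file is still part of that revision.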

misterBart avatar Mar 27 '24 10:03 misterBart

Can you update to the latest release? We can't fix old releases.

alankelly avatar Mar 27 '24 10:03 alankelly

Still getting the errors with the latest TfLite release (2.16):

cc1: error: invalid feature modifier in '-march=armv8.2-a+fp16+dotprod'
gmake[2]: *** [_deps/xnnpack-build/CMakeFiles/microkernels-prod.dir/build.make:173: _deps/xnnpack-build/CMakeFiles/microkernels-prod.dir/src/f16-gemm/gen/f16-gemm-1x8-minmax-asm-aarch64-neonfp16arith-ld64.S.o] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:6832: _deps/xnnpack-build/CMakeFiles/microkernels-prod.dir/all] Error 2
gmake[1]: *** Waiting for unfinished jobs....
cc1: error: invalid feature modifier in '-march=armv8.2-a+fp16+dotprod'
gmake[2]: *** [_deps/xnnpack-build/CMakeFiles/microkernels-all.dir/build.make:40157: _deps/xnnpack-build/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemm-1x8-minmax-asm-aarch64-neonfp16arith-ld64.S.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:6806: _deps/xnnpack-build/CMakeFiles/microkernels-all.dir/all] Error 2
gmake: *** [Makefile:136: all] Error 2

Steps I execute:

git clone --single-branch --branch r2.16 https://github.com/tensorflow/tensorflow tensorflow_src
cmake -DCMAKE_TOOLCHAIN_FILE=../toolchain_aarch64.cmake -DCMAKE_BUILD_TYPE=release -DXNNPACK_ENABLE_ARM_BF16=OFF -DXNNPACK_ENABLE_ARM_I8MM=OFF -DXNNPACK_ENABLE_ARM_DOTPROD=OFF -DXNNPACK_ENABLE_ARM_FP16_SCALAR=OFF -DXNNPACK_ENABLE_ARM_FP16_VECTOR=OFF ../tensorflow_src/tensorflow/lite
cmake --build . -j 8 --config release
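To rule out that the options were silently dropped on the way to the Xnnpack sub-build, the configured values can be read back from the cache. This is a minimal sketch, assuming it is run from the build directory after the configure step; the script name is made up.

```cmake
# check_xnnpack_options.cmake (hypothetical script name).
# Run from the build directory:
#   cmake -P check_xnnpack_options.cmake
IF(NOT EXISTS "CMakeCache.txt")
  MESSAGE(FATAL_ERROR "No CMakeCache.txt here; run this from the build directory")
ENDIF()

# Cache entries look like NAME:TYPE=VALUE, one per line.
FILE(STRINGS "CMakeCache.txt" XNNPACK_CACHE_LINES REGEX "^XNNPACK_ENABLE_")

FOREACH(LINE IN LISTS XNNPACK_CACHE_LINES)
  MESSAGE(STATUS "${LINE}")
ENDFOREACH()
```

If the five options all show `OFF` here and the build still passes `-march=armv8.2-a+fp16+dotprod`, the flag is being added independently of the options.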

misterBart avatar Mar 28 '24 08:03 misterBart

Can you try adding -DXNNPACK_ENABLE_ASSEMBLY=OFF?

alankelly avatar Mar 28 '24 12:03 alankelly

After adding that option, TfLite 2.16 builds without errors, and I can run a test program on an Arm64 board using TfLite 2.16. But before I cheer too early: the test program now runs slower, which naturally comes from disabling the assembly code. `-DXNNPACK_ENABLE_ASSEMBLY=OFF` is too drastic. The Arm64 board does not support float16, etc., but I would still like to use the other assembly micro-kernels in Xnnpack.

misterBart avatar Mar 28 '24 16:03 misterBart

Ok, we know what the problem is now. The solution is to get the update-microkernels script to split the assembly files into ones with and without arm V8 and to create new targets with the appropriate compilation options. Would you like to send a PR to do this?
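Very roughly, the CMake side of such a split might look like the sketch below. The list names, target names, and the baseline file path are all invented for illustration; the real lists would be generated by the update-microkernels script, and only the f16-gemm path is taken from the error log in this thread.

```cmake
# Sketch only: group assembly sources by the ISA extension they need.
# Assumes the project has the ASM language enabled.
SET(HYPOTHETICAL_ASM_BASE_SRCS
  src/example/placeholder-baseline-kernel.S)  # placeholder, not a real file
SET(HYPOTHETICAL_ASM_FP16_SRCS
  src/f16-gemm/gen/f16-gemm-1x8-minmax-asm-aarch64-neonfp16arith-ld64.S)

IF(XNNPACK_ENABLE_ASSEMBLY)
  # Baseline assembly needs no extra -march flag.
  ADD_LIBRARY(hypothetical-asm-base OBJECT ${HYPOTHETICAL_ASM_BASE_SRCS})

  # FP16 assembly gets its own target, built only when FP16 is enabled,
  # and carries only the feature modifier it actually requires.
  IF(XNNPACK_ENABLE_ARM_FP16_VECTOR)
    ADD_LIBRARY(hypothetical-asm-fp16 OBJECT ${HYPOTHETICAL_ASM_FP16_SRCS})
    TARGET_COMPILE_OPTIONS(hypothetical-asm-fp16 PRIVATE "-march=armv8.2-a+fp16")
  ENDIF()
ENDIF()
```

With a layout like this, an old toolchain would never see `+fp16` or `+dotprod` unless the matching option is on, while the remaining assembly kernels stay available.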

alankelly avatar Mar 28 '24 16:03 alankelly

A PR suggests I know what to fix in the codebase, which I don't.

misterBart avatar Mar 29 '24 08:03 misterBart