XNNPACK icon indicating copy to clipboard operation
XNNPACK copied to clipboard

Build fails with strict-aliasing violations

Open eli-schwartz opened this issue 4 months ago • 7 comments

I tried to build with the following *FLAGS to optimize the build: -flto=4 -Werror=odr -Werror=lto-type-mismatch -Werror=strict-aliasing

The -Werror=* flags are important to detect cases where the compiler can try to optimize based on assuming Undefined Behavior (UB) cannot happen, and miscompile code that has UB in it. strict-aliasing issues are always bad but LTO can make them even worse.

I got this error:

[322/356] /usr/bin/ccache /usr/bin/x86_64-pc-linux-gnu-gcc -DXNN_ENABLE_ARM_BF16=0 -DXNN_ENABLE_ARM_DOTPROD=0 -DXNN_ENABLE_ARM_FP16_SCALAR=0 -DXNN_ENABLE_ARM_FP16_VECTOR=0 -DXNN_ENABLE_ARM_I8MM=0 -DXNN_ENABLE_ARM_SME2=0 -DXNN_ENABLE_ARM_SME=0 -DXNN_ENABLE_ASSEMBLY=1 -DXNN_ENABLE_AVX256SKX=1 -DXNN_ENABLE_AVX256VNNI=1 -DXNN_ENABLE_AVX256VNNIGFNI=1 -DXNN_ENABLE_AVX512AMX=1 -DXNN_ENABLE_AVX512F=1 -DXNN_ENABLE_AVX512FP16=1 -DXNN_ENABLE_AVX512SKX=1 -DXNN_ENABLE_AVX512VBMI=1 -DXNN_ENABLE_AVX512VNNI=1 -DXNN_ENABLE_AVX512VNNIGFNI=1 -DXNN_ENABLE_AVXVNNI=1 -DXNN_ENABLE_AVXVNNIINT8=0 -DXNN_ENABLE_CPUINFO=1 -DXNN_ENABLE_DWCONV_MULTIPASS=0 -DXNN_ENABLE_GEMM_M_SPECIALIZATION=1 -DXNN_ENABLE_HVX=1 -DXNN_ENABLE_KLEIDIAI=0 -DXNN_ENABLE_MEMOPT=1 -DXNN_ENABLE_RISCV_VECTOR=1 -DXNN_ENABLE_SPARSE=1 -DXNN_ENABLE_VSX=1 -I/var/tmp/portage/sci-ml/XNNPACK-2024.12.03/work/XNNPACK-51a0103656eff6fc9bfd39a4597923c4b542c883/include -I/var/tmp/portage/sci-ml/XNNPACK-2024.12.03/work/XNNPACK-51a0103656eff6fc9bfd39a4597923c4b542c883/src -I/include  -pipe -ggdb -march=native -fstack-protector-all -O2 -fdiagnostics-color=always -frecord-gcc-switches -Werror=odr -Werror=lto-type-mismatch -Werror=strict-aliasing -Wformat -Werror=format-security -std=c99 -fPIC -Wno-psabi -O2  -mf16c -mfma -mavx512f -mavx512cd -mavx512bw -mavx512dq -mavx512vl -mavx512vnni -mgfni -MD -MT CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-14x8c8-minmax-avx256vnnigfni.c.o -MF CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-14x8c8-minmax-avx256vnnigfni.c.o.d -o CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-14x8c8-minmax-avx256vnnigfni.c.o -c /var/tmp/portage/sci-ml/XNNPACK-2024.12.03/work/XNNPACK-51a0103656eff6fc9bfd39a4597923c4b542c883/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-14x8c8-minmax-avx256vnnigfni.c
FAILED: CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-14x8c8-minmax-avx256vnnigfni.c.o 
/usr/bin/ccache /usr/bin/x86_64-pc-linux-gnu-gcc -DXNN_ENABLE_ARM_BF16=0 -DXNN_ENABLE_ARM_DOTPROD=0 -DXNN_ENABLE_ARM_FP16_SCALAR=0 -DXNN_ENABLE_ARM_FP16_VECTOR=0 -DXNN_ENABLE_ARM_I8MM=0 -DXNN_ENABLE_ARM_SME2=0 -DXNN_ENABLE_ARM_SME=0 -DXNN_ENABLE_ASSEMBLY=1 -DXNN_ENABLE_AVX256SKX=1 -DXNN_ENABLE_AVX256VNNI=1 -DXNN_ENABLE_AVX256VNNIGFNI=1 -DXNN_ENABLE_AVX512AMX=1 -DXNN_ENABLE_AVX512F=1 -DXNN_ENABLE_AVX512FP16=1 -DXNN_ENABLE_AVX512SKX=1 -DXNN_ENABLE_AVX512VBMI=1 -DXNN_ENABLE_AVX512VNNI=1 -DXNN_ENABLE_AVX512VNNIGFNI=1 -DXNN_ENABLE_AVXVNNI=1 -DXNN_ENABLE_AVXVNNIINT8=0 -DXNN_ENABLE_CPUINFO=1 -DXNN_ENABLE_DWCONV_MULTIPASS=0 -DXNN_ENABLE_GEMM_M_SPECIALIZATION=1 -DXNN_ENABLE_HVX=1 -DXNN_ENABLE_KLEIDIAI=0 -DXNN_ENABLE_MEMOPT=1 -DXNN_ENABLE_RISCV_VECTOR=1 -DXNN_ENABLE_SPARSE=1 -DXNN_ENABLE_VSX=1 -I/var/tmp/portage/sci-ml/XNNPACK-2024.12.03/work/XNNPACK-51a0103656eff6fc9bfd39a4597923c4b542c883/include -I/var/tmp/portage/sci-ml/XNNPACK-2024.12.03/work/XNNPACK-51a0103656eff6fc9bfd39a4597923c4b542c883/src -I/include  -pipe -ggdb -march=native -fstack-protector-all -O2 -fdiagnostics-color=always -frecord-gcc-switches -Werror=odr -Werror=lto-type-mismatch -Werror=strict-aliasing -Wformat -Werror=format-security -std=c99 -fPIC -Wno-psabi -O2  -mf16c -mfma -mavx512f -mavx512cd -mavx512bw -mavx512dq -mavx512vl -mavx512vnni -mgfni -MD -MT CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-14x8c8-minmax-avx256vnnigfni.c.o -MF CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-14x8c8-minmax-avx256vnnigfni.c.o.d -o CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-14x8c8-minmax-avx256vnnigfni.c.o -c /var/tmp/portage/sci-ml/XNNPACK-2024.12.03/work/XNNPACK-51a0103656eff6fc9bfd39a4597923c4b542c883/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-14x8c8-minmax-avx256vnnigfni.c
/var/tmp/portage/sci-ml/XNNPACK-2024.12.03/work/XNNPACK-51a0103656eff6fc9bfd39a4597923c4b542c883/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-14x8c8-minmax-avx256vnnigfni.c: In function ‘xnn_qd8_f16_qc4w_gemm_minmax_ukernel_14x8c8__avx256vnnigfni’:
/var/tmp/portage/sci-ml/XNNPACK-2024.12.03/work/XNNPACK-51a0103656eff6fc9bfd39a4597923c4b542c883/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-14x8c8-minmax-avx256vnnigfni.c:138:62: error: dereferencing type-punned pointer will break strict-aliasing rules [-Werror=strict-aliasing]
  138 |   const __m256 voutput_min = _mm256_cvtph_ps(_mm_set1_epi16(*(const uint16_t*) &params->scalar.min));
      |                                                              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/var/tmp/portage/sci-ml/XNNPACK-2024.12.03/work/XNNPACK-51a0103656eff6fc9bfd39a4597923c4b542c883/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-14x8c8-minmax-avx256vnnigfni.c:139:62: error: dereferencing type-punned pointer will break strict-aliasing rules [-Werror=strict-aliasing]
  139 |   const __m256 voutput_max = _mm256_cvtph_ps(_mm_set1_epi16(*(const uint16_t*) &params->scalar.max));
      |                                                              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: some warnings being treated as errors
ninja: build stopped: cannot make progress due to previous errors.
 * ERROR: sci-ml/XNNPACK-2024.12.03::gentoo failed (compile phase):
 *   ninja -v -k0 failed

... repeated for many files.

Originally reported upstream: https://bugs.gentoo.org/953467 Exhaustive build log: build.log

eli-schwartz avatar Aug 19 '25 05:08 eli-schwartz

Thanks for the report. Is it possible to make a godbolt.org reproducible?

Any suggestion on a fix? The params->scalar.min is fp16 but in avx2 we want to use vpbroadcastw from the memory pointer to broadcast 16 bits to the vector. _mm_set1_ph is avx512, not avx2 and not portable to Visual C.

fbarchard avatar Oct 01 '25 01:10 fbarchard

I've got a reproducible, but its a bit confusing

git clone https://github.com/google/XNNPACK.git
cd XNNPACK
bazel build -c opt bench:all

has many strict-aliasing errors... another others bazel build bench:all (-c fastbuild) builds without warnings. But the benchmarks are slow

bazel build --copt=-O2 bench:all has some build errors, but only a few? eg

bazel build -c fastbuild --copt=-O2 bench:all

In file included from src/f32-rminmax/gen/f32-rmax-scalar-u3-acc3.c:18:
./src/xnnpack/simd/f32-scalar.h: In function 'xnn_cmpeq_f32':
./src/xnnpack/simd/f32-scalar.h:26:31: error: dereferencing type-punned pointer will break strict-aliasing rules [-Werror=strict-aliasing]
   26 |   const xnn_simd_f32_t var = *(const float *)&_##var##_int_value;
      |                               ^~~~~~~~~~~~~~~~~
./src/xnnpack/simd/f32-scalar.h:162:3: note: in expansion of macro 'XNN_SIMD_CONST_F32_FROM_INT32'
  162 |   XNN_SIMD_CONST_F32_FROM_INT32(ones, INT32_C(0xFFFFFFFF));
      |   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
./src/xnnpack/simd/f32-scalar.h: In function 'xnn_cmpneq_f32':
./src/xnnpack/simd/f32-scalar.h:26:31: error: dereferencing type-punned pointer will break strict-aliasing rules [-Werror=strict-aliasing]
   26 |   const xnn_simd_f32_t var = *(const float *)&_##var##_int_value;
      |                               ^~~~~~~~~~~~~~~~~
./src/xnnpack/simd/f32-scalar.h:168:3: note: in expansion of macro 'XNN_SIMD_CONST_F32_FROM_INT32'
  168 |   XNN_SIMD_CONST_F32_FROM_INT32(ones, INT32_C(0xFFFFFFFF));
      |   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors

I've submitted a few PR to reduce the quantity of errors in common code like this

fbarchard avatar Nov 12 '25 00:11 fbarchard

Thanks for looking into this.

I'm not familiar with bazel, but generally speaking the use of gcc -O2 will cause the compiler to execute the codegen phases that look for strict aliasing issues, and without that, you likely will not see any warnings/errors at all.

If "fastbuild" means build faster by not trying to perform optimizations then that explains why it shows no errors (and also why the benchmarks were slow, of course).

(Note: the GCC sub-flag here is -fstrict-aliasing, enabled by default at -O2 or -O3 and also by -Os.)

eli-schwartz avatar Nov 12 '25 01:11 eli-schwartz

Please don't disable the warning. :(

It is Undefined Behavior -- the compiler can and will generate invalid machine code when the optimizer assumes it is spec-compliant and tries to generate more efficient code which relies on the aliasing rules being followed.

If the issue can't be fixed properly then a suitable bandaid is to build without optimizations. It is most likely that the aliasing issue won't cause any bad codegen without those optimizations in play. So for example you can force -O1 instead of -O2.

eli-schwartz avatar Nov 17 '25 21:11 eli-schwartz

Agreed that it would be better to fix the warning, but

  1. I cant reproduce the issue with clang - it doesnt have the same warning
  2. on linux x86 there is only 1 form of warning - xnn_float16 in params 20 microkernels do this

At startup the params are set once. In the microkernel a pointer to params is passed with min/max and the value is set with vpbroadcastw

      const __m256 vmin = _mm256_cvtph_ps(_mm_set1_epi16(*(const uint16_t*) &params->scalar.min));
      const __m256 vmax = _mm256_cvtph_ps(_mm_set1_epi16(*(const uint16_t*) &params->scalar.max));

https://www.felixcloutier.com/x86/vpbroadcast

The field is '_Float16' when the compiler supports it. _mm_set1_ph() would be the logical intrinsic, but its a 'sequence' of instructions according to intel intrinsics and not supported on older compilers and Visual C.

I tried referring to the struct.value field which is used when the compiler does not support _Float16 but I tested with clang and it failed

Since the microkernel only reads the field, its relatively safe the aliasing wont cause an optimization issue. The issue exists for all cpus, but x86 only has a small amount of fp16 (20 files), so I tried to address that.

fbarchard avatar Nov 17 '25 22:11 fbarchard

I cant reproduce the issue with clang - it doesnt have the same warning

clang has weak diagnostic capabilities. In this case, -Wstrict-aliasing is "supported" by clang as a no-op for compatibility only. Using GCC should reproduce the issue, I used GCC 14.3.0 in my testing.

eli-schwartz avatar Nov 17 '25 22:11 eli-schwartz

At the moment the strict aliasing is disabled, and for bazel there are other build errors that prevent the build, so this is lower priority until the build works. But thats just bazel... other build systems, such as ninja, will run into the same issue, so a better fix would be good.

3 ideas

  1. this must be an issue with void* too, for things like malloc, unless void* is an exception. cast to void*, then to uint*?
  2. for simd sometimes the type can be used as is with a different intrinsic, and then cast the entire vector to the final type
  3. in this case, (fp16) there is a 'ph' intrinsic that takes _Float16, exactly as desired. But the type, _Float16, varies by compiler, and the 'intrinisic' is considered an 'instruction sequence', and not implemented at all on some compilers. xnnpack does have a way to deal with broken/missing intrinsics - polyfill.h. Typically this checks for gcc and provides inline assembly. For this one the visual c is problematic, since it is missing the intrinsic, does not have Float16 until C23 and does not have inline. But it could be implemented with the current aliasing method.

gemini suggests a messy solution I'm reluctant to implement. create a union, assign as one type, read back with the other. takes several lines to declare and cast. If this is the solution, option (3) could be the mechanism for delivering it.

fbarchard avatar Nov 21 '25 21:11 fbarchard