Build fails with strict-aliasing violations
I tried to build with the following *FLAGS to optimize the build: -flto=4 -Werror=odr -Werror=lto-type-mismatch -Werror=strict-aliasing
The -Werror=* flags are important to detect cases where the compiler can try to optimize based on assuming Undefined Behavior (UB) cannot happen, and miscompile code that has UB in it. strict-aliasing issues are always bad but LTO can make them even worse.
I got this error:
[322/356] /usr/bin/ccache /usr/bin/x86_64-pc-linux-gnu-gcc -DXNN_ENABLE_ARM_BF16=0 -DXNN_ENABLE_ARM_DOTPROD=0 -DXNN_ENABLE_ARM_FP16_SCALAR=0 -DXNN_ENABLE_ARM_FP16_VECTOR=0 -DXNN_ENABLE_ARM_I8MM=0 -DXNN_ENABLE_ARM_SME2=0 -DXNN_ENABLE_ARM_SME=0 -DXNN_ENABLE_ASSEMBLY=1 -DXNN_ENABLE_AVX256SKX=1 -DXNN_ENABLE_AVX256VNNI=1 -DXNN_ENABLE_AVX256VNNIGFNI=1 -DXNN_ENABLE_AVX512AMX=1 -DXNN_ENABLE_AVX512F=1 -DXNN_ENABLE_AVX512FP16=1 -DXNN_ENABLE_AVX512SKX=1 -DXNN_ENABLE_AVX512VBMI=1 -DXNN_ENABLE_AVX512VNNI=1 -DXNN_ENABLE_AVX512VNNIGFNI=1 -DXNN_ENABLE_AVXVNNI=1 -DXNN_ENABLE_AVXVNNIINT8=0 -DXNN_ENABLE_CPUINFO=1 -DXNN_ENABLE_DWCONV_MULTIPASS=0 -DXNN_ENABLE_GEMM_M_SPECIALIZATION=1 -DXNN_ENABLE_HVX=1 -DXNN_ENABLE_KLEIDIAI=0 -DXNN_ENABLE_MEMOPT=1 -DXNN_ENABLE_RISCV_VECTOR=1 -DXNN_ENABLE_SPARSE=1 -DXNN_ENABLE_VSX=1 -I/var/tmp/portage/sci-ml/XNNPACK-2024.12.03/work/XNNPACK-51a0103656eff6fc9bfd39a4597923c4b542c883/include -I/var/tmp/portage/sci-ml/XNNPACK-2024.12.03/work/XNNPACK-51a0103656eff6fc9bfd39a4597923c4b542c883/src -I/include -pipe -ggdb -march=native -fstack-protector-all -O2 -fdiagnostics-color=always -frecord-gcc-switches -Werror=odr -Werror=lto-type-mismatch -Werror=strict-aliasing -Wformat -Werror=format-security -std=c99 -fPIC -Wno-psabi -O2 -mf16c -mfma -mavx512f -mavx512cd -mavx512bw -mavx512dq -mavx512vl -mavx512vnni -mgfni -MD -MT CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-14x8c8-minmax-avx256vnnigfni.c.o -MF CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-14x8c8-minmax-avx256vnnigfni.c.o.d -o CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-14x8c8-minmax-avx256vnnigfni.c.o -c /var/tmp/portage/sci-ml/XNNPACK-2024.12.03/work/XNNPACK-51a0103656eff6fc9bfd39a4597923c4b542c883/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-14x8c8-minmax-avx256vnnigfni.c
FAILED: CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-14x8c8-minmax-avx256vnnigfni.c.o
/usr/bin/ccache /usr/bin/x86_64-pc-linux-gnu-gcc -DXNN_ENABLE_ARM_BF16=0 -DXNN_ENABLE_ARM_DOTPROD=0 -DXNN_ENABLE_ARM_FP16_SCALAR=0 -DXNN_ENABLE_ARM_FP16_VECTOR=0 -DXNN_ENABLE_ARM_I8MM=0 -DXNN_ENABLE_ARM_SME2=0 -DXNN_ENABLE_ARM_SME=0 -DXNN_ENABLE_ASSEMBLY=1 -DXNN_ENABLE_AVX256SKX=1 -DXNN_ENABLE_AVX256VNNI=1 -DXNN_ENABLE_AVX256VNNIGFNI=1 -DXNN_ENABLE_AVX512AMX=1 -DXNN_ENABLE_AVX512F=1 -DXNN_ENABLE_AVX512FP16=1 -DXNN_ENABLE_AVX512SKX=1 -DXNN_ENABLE_AVX512VBMI=1 -DXNN_ENABLE_AVX512VNNI=1 -DXNN_ENABLE_AVX512VNNIGFNI=1 -DXNN_ENABLE_AVXVNNI=1 -DXNN_ENABLE_AVXVNNIINT8=0 -DXNN_ENABLE_CPUINFO=1 -DXNN_ENABLE_DWCONV_MULTIPASS=0 -DXNN_ENABLE_GEMM_M_SPECIALIZATION=1 -DXNN_ENABLE_HVX=1 -DXNN_ENABLE_KLEIDIAI=0 -DXNN_ENABLE_MEMOPT=1 -DXNN_ENABLE_RISCV_VECTOR=1 -DXNN_ENABLE_SPARSE=1 -DXNN_ENABLE_VSX=1 -I/var/tmp/portage/sci-ml/XNNPACK-2024.12.03/work/XNNPACK-51a0103656eff6fc9bfd39a4597923c4b542c883/include -I/var/tmp/portage/sci-ml/XNNPACK-2024.12.03/work/XNNPACK-51a0103656eff6fc9bfd39a4597923c4b542c883/src -I/include -pipe -ggdb -march=native -fstack-protector-all -O2 -fdiagnostics-color=always -frecord-gcc-switches -Werror=odr -Werror=lto-type-mismatch -Werror=strict-aliasing -Wformat -Werror=format-security -std=c99 -fPIC -Wno-psabi -O2 -mf16c -mfma -mavx512f -mavx512cd -mavx512bw -mavx512dq -mavx512vl -mavx512vnni -mgfni -MD -MT CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-14x8c8-minmax-avx256vnnigfni.c.o -MF CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-14x8c8-minmax-avx256vnnigfni.c.o.d -o CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-14x8c8-minmax-avx256vnnigfni.c.o -c /var/tmp/portage/sci-ml/XNNPACK-2024.12.03/work/XNNPACK-51a0103656eff6fc9bfd39a4597923c4b542c883/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-14x8c8-minmax-avx256vnnigfni.c
/var/tmp/portage/sci-ml/XNNPACK-2024.12.03/work/XNNPACK-51a0103656eff6fc9bfd39a4597923c4b542c883/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-14x8c8-minmax-avx256vnnigfni.c: In function ‘xnn_qd8_f16_qc4w_gemm_minmax_ukernel_14x8c8__avx256vnnigfni’:
/var/tmp/portage/sci-ml/XNNPACK-2024.12.03/work/XNNPACK-51a0103656eff6fc9bfd39a4597923c4b542c883/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-14x8c8-minmax-avx256vnnigfni.c:138:62: error: dereferencing type-punned pointer will break strict-aliasing rules [-Werror=strict-aliasing]
138 | const __m256 voutput_min = _mm256_cvtph_ps(_mm_set1_epi16(*(const uint16_t*) ¶ms->scalar.min));
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/var/tmp/portage/sci-ml/XNNPACK-2024.12.03/work/XNNPACK-51a0103656eff6fc9bfd39a4597923c4b542c883/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-14x8c8-minmax-avx256vnnigfni.c:139:62: error: dereferencing type-punned pointer will break strict-aliasing rules [-Werror=strict-aliasing]
139 | const __m256 voutput_max = _mm256_cvtph_ps(_mm_set1_epi16(*(const uint16_t*) ¶ms->scalar.max));
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: some warnings being treated as errors
ninja: build stopped: cannot make progress due to previous errors.
* ERROR: sci-ml/XNNPACK-2024.12.03::gentoo failed (compile phase):
* ninja -v -k0 failed
... repeated for many files.
Originally reported upstream: https://bugs.gentoo.org/953467 Exhaustive build log: build.log
Thanks for the report. Is it possible to make a godbolt.org reproducible?
Any suggestion on a fix? The params->scalar.min is fp16 but in avx2 we want to use vpbroadcastw from the memory pointer to broadcast 16 bits to the vector. _mm_set1_ph is avx512, not avx2 and not portable to Visual C.
I've got a reproducible, but its a bit confusing
git clone https://github.com/google/XNNPACK.git
cd XNNPACK
bazel build -c opt bench:all
has many strict-aliasing errors... another others
bazel build bench:all
(-c fastbuild) builds without warnings. But the benchmarks are slow
bazel build --copt=-O2 bench:all
has some build errors, but only a few?
eg
bazel build -c fastbuild --copt=-O2 bench:all
In file included from src/f32-rminmax/gen/f32-rmax-scalar-u3-acc3.c:18:
./src/xnnpack/simd/f32-scalar.h: In function 'xnn_cmpeq_f32':
./src/xnnpack/simd/f32-scalar.h:26:31: error: dereferencing type-punned pointer will break strict-aliasing rules [-Werror=strict-aliasing]
26 | const xnn_simd_f32_t var = *(const float *)&_##var##_int_value;
| ^~~~~~~~~~~~~~~~~
./src/xnnpack/simd/f32-scalar.h:162:3: note: in expansion of macro 'XNN_SIMD_CONST_F32_FROM_INT32'
162 | XNN_SIMD_CONST_F32_FROM_INT32(ones, INT32_C(0xFFFFFFFF));
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
./src/xnnpack/simd/f32-scalar.h: In function 'xnn_cmpneq_f32':
./src/xnnpack/simd/f32-scalar.h:26:31: error: dereferencing type-punned pointer will break strict-aliasing rules [-Werror=strict-aliasing]
26 | const xnn_simd_f32_t var = *(const float *)&_##var##_int_value;
| ^~~~~~~~~~~~~~~~~
./src/xnnpack/simd/f32-scalar.h:168:3: note: in expansion of macro 'XNN_SIMD_CONST_F32_FROM_INT32'
168 | XNN_SIMD_CONST_F32_FROM_INT32(ones, INT32_C(0xFFFFFFFF));
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
I've submitted a few PR to reduce the quantity of errors in common code like this
Thanks for looking into this.
I'm not familiar with bazel, but generally speaking the use of gcc -O2 will cause the compiler to execute the codegen phases that look for strict aliasing issues, and without that, you likely will not see any warnings/errors at all.
If "fastbuild" means build faster by not trying to perform optimizations then that explains why it shows no errors (and also why the benchmarks were slow, of course).
(Note: the GCC sub-flag here is -fstrict-aliasing, enabled by default at -O2 or -O3 and also by -Os.)
Please don't disable the warning. :(
It is Undefined Behavior -- the compiler can and will generate invalid machine code when the optimizer assumes it is spec-compliant and tries to generate more efficient code which relies on the aliasing rules being followed.
If the issue can't be fixed properly then a suitable bandaid is to build without optimizations. It is most likely that the aliasing issue won't cause any bad codegen without those optimizations in play. So for example you can force -O1 instead of -O2.
Agreed that it would be better to fix the warning, but
- I cant reproduce the issue with clang - it doesnt have the same warning
- on linux x86 there is only 1 form of warning - xnn_float16 in params 20 microkernels do this
At startup the params are set once. In the microkernel a pointer to params is passed with min/max and the value is set with vpbroadcastw
const __m256 vmin = _mm256_cvtph_ps(_mm_set1_epi16(*(const uint16_t*) ¶ms->scalar.min));
const __m256 vmax = _mm256_cvtph_ps(_mm_set1_epi16(*(const uint16_t*) ¶ms->scalar.max));
https://www.felixcloutier.com/x86/vpbroadcast
The field is '_Float16' when the compiler supports it. _mm_set1_ph() would be the logical intrinsic, but its a 'sequence' of instructions according to intel intrinsics and not supported on older compilers and Visual C.
I tried referring to the struct.value field which is used when the compiler does not support _Float16 but I tested with clang and it failed
Since the microkernel only reads the field, its relatively safe the aliasing wont cause an optimization issue. The issue exists for all cpus, but x86 only has a small amount of fp16 (20 files), so I tried to address that.
I cant reproduce the issue with clang - it doesnt have the same warning
clang has weak diagnostic capabilities. In this case, -Wstrict-aliasing is "supported" by clang as a no-op for compatibility only. Using GCC should reproduce the issue, I used GCC 14.3.0 in my testing.
At the moment the strict aliasing is disabled, and for bazel there are other build errors that prevent the build, so this is lower priority until the build works. But thats just bazel... other build systems, such as ninja, will run into the same issue, so a better fix would be good.
3 ideas
- this must be an issue with void* too, for things like malloc, unless void* is an exception. cast to void*, then to uint*?
- for simd sometimes the type can be used as is with a different intrinsic, and then cast the entire vector to the final type
- in this case, (fp16) there is a 'ph' intrinsic that takes _Float16, exactly as desired. But the type, _Float16, varies by compiler, and the 'intrinisic' is considered an 'instruction sequence', and not implemented at all on some compilers. xnnpack does have a way to deal with broken/missing intrinsics - polyfill.h. Typically this checks for gcc and provides inline assembly. For this one the visual c is problematic, since it is missing the intrinsic, does not have Float16 until C23 and does not have inline. But it could be implemented with the current aliasing method.
gemini suggests a messy solution I'm reluctant to implement. create a union, assign as one type, read back with the other. takes several lines to declare and cast. If this is the solution, option (3) could be the mechanism for delivering it.