OpenBLAS
Remove unconditional `NO_AVX512=1` for flang builds
I want to figure out what is causing the errors with flang when using the default build options on a platform that supports AVX512. This problem was already observed in #4016 (CC @mmuetzel), leading to workarounds like the following: https://github.com/OpenMathLib/OpenBLAS/blob/e1eef56e0510fecd5a05df9a8fddaf63a3d91ef0/.github/workflows/dynamic_arch.yml#L177-L179
However, these problems still occur in conda-forge with the in-progress flang 19, almost 3 releases later (cf. #4768); more precisely, the errors are:
The following tests FAILED:
5 - sblas3 (Failed)
8 - dblas3 (Failed)
11 - cblas3 (Failed)
15 - zblas3 (Failed)
which matches what happened in #4016. There are some more detailed failure logs in that PR that I haven't yet tried to reproduce.
Before raising an upstream bug report, I would first like to properly understand what's happening in OpenBLAS itself, because so far I haven't been able to establish a link between `NO_AVX512` and any Fortran code.
Running on Azure Pipelines, we regularly get skylakex agents, which support some AVX512 instructions and thus run into the above failures (there are still some non-AVX512 agents around; when I caught one, the tests passed). As hoped, adding `NO_AVX512=1` does in fact cause the tests to pass, with the following difference in configuration:
Running getarch
GETARCH results:
-CORE=SKYLAKEX
-LIBCORE=skylakex
+CORE=HASWELL
+LIBCORE=haswell
NUM_CORES=2
HAVE_MMX=1
HAVE_SSE=1
HAVE_SSE2=1
HAVE_SSE3=1
HAVE_SSSE3=1
HAVE_SSE4_1=1
HAVE_SSE4_2=1
HAVE_AVX=1
HAVE_AVX2=1
-HAVE_AVX512VL=1
HAVE_FMA3=1
MAKEFLAGS += -j 2
The macro `HAVE_AVX512VL` rarely appears outside of the config setup; AFAICT the only real usage is:
https://github.com/OpenMathLib/OpenBLAS/blob/e1eef56e0510fecd5a05df9a8fddaf63a3d91ef0/kernel/simd/intrin.h#L59-L61
What I don't understand is how `intrin_avx512.h` influences any Fortran code.
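
Since the outcome depends on which kind of agent a job lands on, it can help to log the CPU's AVX512 features at the start of a run. The following is only a rough sketch, assuming a Linux agent that exposes `/proc/cpuinfo`; the helper name `avx512_flags` is made up for illustration and is not part of OpenBLAS or its CI.

```python
from pathlib import Path


def avx512_flags(cpuinfo_text):
    """Return the sorted avx512* feature flags found in /proc/cpuinfo-style text.

    Hypothetical helper for illustration; assumes the Linux "flags : ..." layout.
    """
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only the "flags" lines list CPU features; one such line exists per core.
        if sep and key.strip() == "flags":
            flags.update(f for f in value.split() if f.startswith("avx512"))
    return sorted(flags)


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")  # Linux-only assumption
    if path.exists():
        found = avx512_flags(path.read_text())
        print("AVX512 flags:", " ".join(found) if found else "none")
    else:
        print("no /proc/cpuinfo on this platform")
```

Printing this next to the getarch results would make it easier to correlate a failing run with an agent that has `avx512vl`, as opposed to one where getarch already falls back to `HASWELL`.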