OpenBLAS

Test failure with Skylake CPU, potentially related to avx512 instructions

Open quantumsteve opened this issue 3 years ago • 12 comments

I have a test failure (dblas3) on the current develop branch. Here is the test output, DBLAT3.SUMM.

I built OpenBLAS with cmake, setting CMAKE_BUILD_TYPE=Release, and ran ctest. The machine has Intel Xeon Silver 4116 CPUs and is running Ubuntu 18.04 with GCC 7.5.0.

This test passes if I set NO_AVX512=1.

quantumsteve avatar Dec 22 '21 19:12 quantumsteve

Please share the OpenBLAS version and the last number from the CPU trade name (it is all in /proc/cpuinfo). It is also worth trying a git checkout (the "Download ZIP" button at the top right of the GitHub project page).

brada4 avatar Dec 22 '21 23:12 brada4

I checked out the develop branch SHA = 253670383f15cd4d25c6be6b210dfa48a6dcc883

Here's the info from /proc/cpuinfo for processor 0 (48 total):

processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 85
model name	: Intel(R) Xeon(R) Silver 4116 CPU @ 2.10GHz
stepping	: 4
microcode	: 0x2006b06
cpu MHz		: 2100.000
cache size	: 16896 KB
physical id	: 0
siblings	: 24
core id		: 0
cpu cores	: 12
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 22
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts pku ospke md_clear flush_l1d
bugs		: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs taa itlb_multihit
bogomips	: 4200.00
clflush size	: 64
cache_alignment	: 64
address sizes	: 46 bits physical, 48 bits virtual
power management:

quantumsteve avatar Dec 23 '21 03:12 quantumsteve
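
For readers who want to check the same thing on their own machine, the AVX-512 entries can be pulled out of the long flags line with a few lines of Python. This is only a sketch: the helper name `avx512_flags` is made up for illustration, and it assumes the usual Linux `key : value` layout shown in the output above.

```python
def avx512_flags(cpuinfo_text):
    """Return the sorted AVX-512 feature flags from the first 'flags' line found."""
    for line in cpuinfo_text.splitlines():
        # Each /proc/cpuinfo line is expected to look like "key<tabs>: value".
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return sorted(f for f in value.split() if f.startswith("avx512"))
    # No flags line at all (for example on a non-x86 or non-Linux system).
    return []
```

Passing it the contents of /proc/cpuinfo, for example `avx512_flags(open("/proc/cpuinfo").read())`, should list avx512f, avx512dq, avx512cd, avx512bw and avx512vl for the Xeon Silver 4116 output quoted above.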

What about building with GNU make, to see whether this error still occurs? (cd to the OpenBLAS directory and type "make" directly; the BLAS tests run automatically after the build.)

wjc404 avatar Dec 23 '21 03:12 wjc404

There is something strange - I did not see failures in my local builds and the latest CI jobs do not show a problem, but looking back I see occasional test failures (in the Azure job that uses Intel SDE) after commits that did not touch any x86_64 code at all. Could be there is something using uninitialized memory that just happens to be zeroed most of the time :(

martin-frbg avatar Dec 23 '21 05:12 martin-frbg

btw that CI job uses plain make for the build (and gcc 7.5 on ubuntu 18)

martin-frbg avatar Dec 23 '21 06:12 martin-frbg

The dblas3 test failure is reproducible on my i7-9800X when building with cmake using gcc-7 and gfortran-7 (Ubuntu 16.04). I will look into it in depth this weekend.

wjc404 avatar Dec 23 '21 15:12 wjc404

@wjc404 the first thing to check should be the AVX512 xsave block early in the boot log.

brada4 avatar Dec 23 '21 16:12 brada4

[Image: openblas-cmake-dgemm-dtrmm-include]

It seems that something is wrong with the cmake variables. Possibly the cmake variable DYNAMIC_ARCH is defined (T or F) before kernel/CMakeLists.txt is parsed, but the C macro DYNAMIC_ARCH is not defined when param.h is read.

wjc404 avatar Dec 25 '21 02:12 wjc404

Update: this change solves the problem (all tests passed in ctest).

[Image: fix_cmake]

wjc404 avatar Dec 25 '21 03:12 wjc404

Thanks. That would certainly explain why my recent workaround - to fall back to the earlier 4x8 kernel in DYNAMIC_ARCH builds - did not work with CMAKE. Unfortunately it still seems likely that there is something amiss in the 16x2 kernel. (Unless @quantumsteve built with -DDYNAMIC_ARCH as well)

martin-frbg avatar Dec 25 '21 12:12 martin-frbg

@martin-frbg I did not build with -DDYNAMIC_ARCH.

quantumsteve avatar Dec 25 '21 19:12 quantumsteve

Playing with SDE, it seems it could be dgemm_tcopy_16_skylakex.c that is problematic, not the 16x2 DGEMM/DTRMM kernel itself. Unfortunately I am currently away from my SkylakeX and it appears to have lost its wireless connection to the office router. If you are willing to experiment, you can try changing the DGEMMITCOPY line in kernel/x86_64/KERNEL.SKYLAKEX to point to "../generic/gemm_tcopy_16.c" rather than "dgemm_tcopy_16_skylakex.c" and then do a complete rebuild.

martin-frbg avatar Dec 25 '21 21:12 martin-frbg
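
The experiment suggested above is a one-line edit and can be scripted if it has to be repeated across checkouts. The following is a rough sketch, not part of the OpenBLAS tooling: the function name `use_generic_tcopy` is hypothetical, and it assumes the kernel file uses plain `NAME = value` assignments, one per line.

```python
import re

GENERIC_TCOPY = "../generic/gemm_tcopy_16.c"  # replacement path quoted in the comment above


def use_generic_tcopy(kernel_text):
    """Return kernel_text with the DGEMMITCOPY assignment pointing at the generic copy routine."""
    # Match only the DGEMMITCOPY variable itself; a longer name such as
    # DGEMMITCOPYOBJ does not match because "=" must follow the name.
    pattern = re.compile(r"^([ \t]*DGEMMITCOPY[ \t]*=[ \t]*)\S+", re.MULTILINE)
    # A function is used as the replacement so the path is inserted literally.
    return pattern.sub(lambda m: m.group(1) + GENERIC_TCOPY, kernel_text)
```

To apply it, read kernel/x86_64/KERNEL.SKYLAKEX, pass the text through the function, write the result back, and then do the complete rebuild mentioned above. Reverting is a matter of `git checkout` on that file.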

Believed to have been fixed by the 0.3.21 rewrite of the compiler capability test and subsequent logic in the getarch tool, plus the removal of my somewhat misguided workarounds for the originally observed symptoms - 0.3.19 from just around the time of the report would have been broken in this regard.

martin-frbg avatar Aug 14 '22 14:08 martin-frbg