test_extensions/test_sgemmt.c fails with SME on Apple M4
Building commit d23680b81d5179ce6ae1ca5546303b81646ecac1 with make -j DYNAMIC_ARCH=1 TARGET=VORTEX results in test failures on Apple M4:
TEST 1135/1522 sgemmt:c_api_rowmajor_upper_M_50_K_50_a_notrans_b_notrans [FAIL]
ERR: test_extensions/test_sgemmt.c:797 expected 0.000e+00, got 2.741e-01 (diff -2.741e-01, tol 1.000e-04)
TEST 1139/1522 sgemmt:c_api_rowmajor_upper_alpha_zero [FAIL]
ERR: test_extensions/test_sgemmt.c:888 expected 0.000e+00, got 8.187e-03 (diff -8.187e-03, tol 1.000e-04)
TEST 1140/1522 sgemmt:c_api_rowmajor_upper_beta_one [FAIL]
ERR: test_extensions/test_sgemmt.c:910 expected 0.000e+00, got 3.662e-01 (diff -3.662e-01, tol 1.000e-04)
TEST 1141/1522 sgemmt:c_api_rowmajor_lower_M_50_K_50_a_notrans_b_notrans [FAIL]
ERR: test_extensions/test_sgemmt.c:933 expected 0.000e+00, got 2.784e-01 (diff -2.784e-01, tol 1.000e-04)
TEST 1145/1522 sgemmt:c_api_rowmajor_lower_alpha_zero [FAIL]
ERR: test_extensions/test_sgemmt.c:1024 expected 0.000e+00, got 8.250e-03 (diff -8.250e-03, tol 1.000e-04)
TEST 1146/1522 sgemmt:c_api_rowmajor_lower_beta_one [FAIL]
ERR: test_extensions/test_sgemmt.c:1046 expected 0.000e+00, got 3.539e-01 (diff -3.539e-01, tol 1.000e-04)
These failures only occur with the SME SGEMM direct kernel. I no longer see the test failures if I disable SME support with the following patch:
diff --git a/interface/gemm.c b/interface/gemm.c
index 54e5604fd..cde3038e6 100644
--- a/interface/gemm.c
+++ b/interface/gemm.c
@@ -429,7 +429,7 @@ void CNAME(enum CBLAS_ORDER order, enum CBLAS_TRANSPOSE TransA, enum CBLAS_TRANS
#endif
#if defined(ARCH_ARM64) && (defined(USE_SGEMM_KERNEL_DIRECT)||defined(DYNAMIC_ARCH))
#if defined(DYNAMIC_ARCH)
- if (support_sme1())
+ if (false)
#endif
if (beta == 0 && alpha == 1.0 && order == CblasRowMajor && TransA == CblasNoTrans && TransB == CblasNoTrans) {
SGEMM_DIRECT(m, n, k, a, lda, b, ldb, c, ldc);
This was initially reported on the NumPy bug tracker.
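For anyone who wants to check the direct path outside the test suite, here is a minimal sketch of some helper functions, not taken from the OpenBLAS sources. The `repro_*` names and enums are hypothetical stand-ins, so the snippet builds without `cblas.h`. The predicate mirrors the condition quoted from interface/gemm.c in the patch above, and the naive loop gives a row-major reference result to compare a library call against, using the same 1e-4 tolerance the test output reports.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the CBLAS enums, so this builds without cblas.h. */
enum repro_order { REPRO_ROW_MAJOR = 101, REPRO_COL_MAJOR = 102 };
enum repro_trans { REPRO_NO_TRANS = 111, REPRO_TRANS = 112 };

/* Mirrors the condition guarding SGEMM_DIRECT in the quoted patch:
 * beta == 0, alpha == 1, row-major, neither operand transposed. */
int repro_takes_direct_path(float alpha, float beta,
                            enum repro_order order,
                            enum repro_trans trans_a,
                            enum repro_trans trans_b)
{
    return beta == 0.0f && alpha == 1.0f &&
           order == REPRO_ROW_MAJOR &&
           trans_a == REPRO_NO_TRANS && trans_b == REPRO_NO_TRANS;
}

/* Naive row-major reference: C = alpha * A * B + beta * C,
 * with A of size m x k, B of size k x n, C of size m x n. */
void repro_ref_sgemm(int m, int n, int k, float alpha,
                     const float *a, int lda,
                     const float *b, int ldb,
                     float beta, float *c, int ldc)
{
    for (int i = 0; i < m; i++) {
        for (int j = 0; j < n; j++) {
            float acc = 0.0f;
            for (int p = 0; p < k; p++)
                acc += a[i * lda + p] * b[p * ldb + j];
            /* With beta == 0 the old contents of C are ignored entirely. */
            float prev = (beta == 0.0f) ? 0.0f : beta * c[i * ldc + j];
            c[i * ldc + j] = alpha * acc + prev;
        }
    }
}

/* Largest absolute element-wise difference between two buffers. */
float repro_max_abs_diff(const float *x, const float *y, size_t count)
{
    float worst = 0.0f;
    for (size_t i = 0; i < count; i++) {
        float d = x[i] - y[i];
        if (d < 0.0f)
            d = -d;
        if (d > worst)
            worst = d;
    }
    return worst;
}
```

To use it, fill two 50x50 matrices, compute one result with `repro_ref_sgemm` and one with `cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, ...)` using alpha = 1 and beta = 0, then check that `repro_max_abs_diff` stays below 1e-4. On an affected build the difference should be far larger, as in the failures listed above.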
~Looks like the CBLAS SGEMM got broken recently for (some) row-major inputs (though not directly by #5407); #5380 from two weeks ago seems a likely candidate.~
Probably caused by my overly optimistic attempt to make the new ARMV9 kernel(s) available on the M4 without also creating a dedicated "M4" target. The M4 has a unique combination of SME without (non-streaming) SVE. The hack I committed is definitely too fragile and could lead to an (empty) non-SME stub getting called instead of the "direct" SGEMM kernel.
I have now separated ARMV9SME and a new VORTEXM4 target in a local fork, but the main issue probably is/was fairly trivial: the inadvertent use of register x18 in sgemm_direct_sme1_2VLx2VL.S, which happens to be reserved on macOS. I'll try to find time tomorrow to check what a sensible minimal patch should look like. Having an SME-only target that is separate from the less restrictive SVE2+SME "ARMV9SME" might be valuable enough in itself, even if a future M5 CPU happened to come with regular SVE support.
Can of worms... The actual problem with the sgemmt test appears to have been a few CPU registers that are documented to get trashed when switching in and out of streaming mode, but were neither saved nor marked as clobbered by the SME kernels.
~Is there a way to get access to an M4 machine to try to help out with this issue?~
@martin-frbg I have access to an M4 machine. Is there anything I can do to help?
Thanks, I hope I only need a few hours to finalize #5423 (things I had planned to do last weekend...)