bolt
bolt copied to clipboard
Old AVX test fails due to GPF/sigsegv
Summary
The avx sgemm test fails at n=14 on my system, apparently due to a general protection fault reading unaligned memory.
Problem Information
_mm256_load_ps
raises a general protection fault if an address is passed to it that is not aligned to 32 bytes.
Documentation: https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_load_ps&ig_expand=4279
System information
$ cat .git/refs/heads/master
e7726a4c165cc45ac117e9eabd8761013a26640e
$ uname -a
Linux DESKTOP-E3P7TC0 3.10.0-1160.62.1.el7.x86_64 #1 SMP Wed Mar 23 09:04:02 UTC 2022 x86_64 GNU/Linux
$ cat /etc/system-release
Red Hat Enterprise Linux Workstation release 7.9 (Maipo)
Diagnostic Information
$ gdb cpp/bolt-build/bolt
(gdb) run
Starting program: /shared/src/bolt/cpp/bolt-build/bolt
brute force sgemm test, n = 1...
brute force sgemm test, n = 2...
brute force sgemm test, n = 5...
brute force sgemm test, n = 14...
Program received signal SIGSEGV, Segmentation fault.
0x00000000005a1af9 in _mm256_load_ps (__P=0x886cf8) at /usr/local/lib/gcc/x86_64-pc-linux-gnu/9.4.1/include/avxintrin.h:874
874 return *(__m256 *)__P;
(gdb) up
#1 (anonymous namespace)::sgemm_colmajor_narrow_padded<1, 2> (A=0x8878e0, B=0x875d80, N=14, D=1, M=2, out=0x886cc0, add_to_output=false, A_col_stride=14, B_col_stride=1,
out_col_stride=14, nrows_per_chunk=512) at /home/user/src/bolt/cpp/src/utils/avx_utils.hpp:394
394 sums[mm] = _mm256_load_ps(out_ptr);
(gdb) p out_ptr
$1 = (float *) 0x886cf8
(gdb) bt
#0 0x00000000005a1af9 in _mm256_load_ps (__P=0x886cf8) at /usr/local/lib/gcc/x86_64-pc-linux-gnu/9.4.1/include/avxintrin.h:874
#1 (anonymous namespace)::sgemm_colmajor_narrow_padded<1, 2> (A=0x8878e0, B=0x875d80, N=14, D=1, M=2, out=0x886cc0, add_to_output=false, A_col_stride=14, B_col_stride=1,
out_col_stride=14, nrows_per_chunk=512) at /home/user/src/bolt/cpp/src/utils/avx_utils.hpp:394
#2 0x000000000059ff9e in sgemm_colmajor (A=0x8878e0, B=0x875d80, N=14, D=1, M=2, out=0x886cc0) at /home/user/src/bolt/cpp/src/utils/avx_utils.cpp:18
#3 0x00000000005ee949 in _test_sgemm_colmajor<-1, -1> (N=14, D=1, M=2, simple_entries=false) at /home/user/src/bolt/cpp/test/test_avx_utils.cpp:54
#4 0x00000000005ec3f5 in ____C_A_T_C_H____T_E_S_T____100 () at /home/user/src/bolt/cpp/test/test_avx_utils.cpp:155
#5 0x00000000005bb56e in Catch::FreeFunctionTestCase::invoke (this=0x86ef90) at /home/user/src/bolt/cpp/test/external/catch.hpp:5507
#6 0x00000000005aa337 in Catch::TestCase::invoke (this=0x889280) at /home/user/src/bolt/cpp/test/external/catch.hpp:6389
#7 0x00000000005b972b in Catch::RunContext::runCurrentTest (this=0x7fffffffd560, redirectedCout="", redirectedCerr="") at /home/user/src/bolt/cpp/test/external/catch.hpp:5131
#8 0x00000000005b8737 in Catch::RunContext::runTest (this=0x7fffffffd560, testCase=...) at /home/user/src/bolt/cpp/test/external/catch.hpp:5001
#9 0x00000000005ba095 in Catch::Runner::runTests (this=0x7fffffffd810) at /home/user/src/bolt/cpp/test/external/catch.hpp:5275
#10 0x00000000005bae1b in Catch::Session::run (this=0x7fffffffdb10) at /home/user/src/bolt/cpp/test/external/catch.hpp:5395
#11 0x00000000005bace8 in Catch::Session::run (this=0x7fffffffdb10, argc=1, argv=0x7fffffffdce8) at /home/user/src/bolt/cpp/test/external/catch.hpp:5378
#12 0x00000000005ae8bb in main (argc=1, argv=0x7fffffffdce8) at /home/user/src/bolt/cpp/test/main.cpp:22
$ valgrind cpp/bolt-build/bolt
==5087== Memcheck, a memory error detector
==5087== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==5087== Using Valgrind-3.18.0.GIT and LibVEX; rerun with -h for copyright info
==5087== Command: cpp/bolt-build/bolt
==5087==
brute force sgemm test, n = 1...
brute force sgemm test, n = 2...
brute force sgemm test, n = 5...
brute force sgemm test, n = 14...
==5087==
==5087== Process terminating with default action of signal 11 (SIGSEGV)
==5087== General Protection Fault
==5087== at 0x5A1AF9: _mm256_load_ps (avxintrin.h:874)
==5087== by 0x5A1AF9: void (anonymous namespace)::sgemm_colmajor_narrow_padded<1, 2>(float const*, float const*, int, int, int, float*, bool, int, int, int, int) (avx_utils.hpp:394)
==5087== by 0x59FF9D: sgemm_colmajor(float const*, float const*, int, int, int, float*) (avx_utils.cpp:18)
==5087== by 0x5EE948: void _test_sgemm_colmajor<-1, -1>(int, int, int, bool) (test_avx_utils.cpp:54)
==5087== by 0x5EC3F4: ____C_A_T_C_H____T_E_S_T____100() (test_avx_utils.cpp:155)
==5087== by 0x5BB56D: Catch::FreeFunctionTestCase::invoke() const (catch.hpp:5507)
==5087== by 0x5AA336: Catch::TestCase::invoke() const (catch.hpp:6389)
==5087== by 0x5B972A: Catch::RunContext::runCurrentTest(std::string&, std::string&) (catch.hpp:5131)
==5087== by 0x5B8736: Catch::RunContext::runTest(Catch::TestCase const&) (catch.hpp:5001)
==5087== by 0x5BA094: Catch::Runner::runTests() (catch.hpp:5275)
==5087== by 0x5BAE1A: Catch::Session::run() (catch.hpp:5395)
==5087== by 0x5BACE7: Catch::Session::run(int, char* const*) (catch.hpp:5378)
==5087== by 0x5AE8BA: main (main.cpp:22)
Work to Resolve
I pursued this just a little bit before changing tasks. Following the control flow, it looked to me like the unaligned memory arose from passing a matrix with a nonaligned stride greater than 8, to sgemm_colmajor
. I've come up with the below so far, but have not yet tested and debugged it and make frequent mistakes so likely something is wrong. The patch is pasted here from a tmux pane and then hand edited, so may need manual application.
diff --git a/cpp/test/test_avx_utils.cpp b/cpp/test/test_avx_utils.cpp
index 4ec4e00..34c318a 100644
--- a/cpp/test/test_avx_utils.cpp
+++ b/cpp/test/test_avx_utils.cpp
@@ -44,6 +44,8 @@ void _test_sgemm_colmajor(int N, int D, int M, bool simple_entries=false) {
B.setRandom();
}
C = (C.array() + -999).matrix(); // value we won't accidentally get
+ int aligned_rows = C.rows() - (C.rows() % (-32 / int(sizeof(float))));
+ C.resize(aligned_rows, C.cols());