segmentation fault with numpy on POWER9 (only) when using FlexiBLAS

Open boegel opened this issue 4 years ago • 13 comments

I'm seeing a Segmentation fault when running the numpy 1.20.3 tests when using FlexiBLAS 3.0.4 with OpenBLAS 0.3.15, but not when linking to OpenBLAS 0.3.15 directly, which tells me FlexiBLAS is somehow causing the segmentation fault...

I'm not seeing this problem on Intel (Haswell, Skylake X), AMD (Rome), or Arm (AWS Graviton2).

Here's a partial backtrace I obtained when running the numpy tests via gdb:

Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007ffff4887530 in dnrm2_k () from /home/centos/EasyBuild/software/OpenBLAS/0.3.15-GCC-10.3.0/lib/../lib64/libopenblas.so.0
Missing separate debuginfos, use: yum debuginfo-install libxcrypt-4.1.1-4.el8.ppc64le
(gdb) bt
#0  0x00007ffff4887530 in dnrm2_k () from /home/centos/EasyBuild/software/OpenBLAS/0.3.15-GCC-10.3.0/lib/../lib64/libopenblas.so.0
#1  0x00007ffff453d788 in dnrm2_ () from /home/centos/EasyBuild/software/OpenBLAS/0.3.15-GCC-10.3.0/lib/../lib64/libopenblas.so.0
#2  0x00007ffff62cfd9c in dnrm2_ () from /home/centos/EasyBuild/software/FlexiBLAS/3.0.4-GCC-10.3.0/lib64/libflexiblas.so.3
#3  0x00007ffff4d7816c in dgeev_ () from /home/centos/EasyBuild/software/OpenBLAS/0.3.15-GCC-10.3.0/lib/../lib64/libopenblas.so.0
#4  0x00007ffff639e8e4 in dgeev_ () from /home/centos/EasyBuild/software/FlexiBLAS/3.0.4-GCC-10.3.0/lib64/libflexiblas.so.3
#5  0x00007fff7364b334 in call_dgeev (params=0x7ffffffe63b0) at numpy/linalg/umath_linalg.c.src:2292
#6  DOUBLE_eig_wrapper (JOBVL=JOBVL@entry=78 'N', JOBVR=JOBVR@entry=86 'V', args=0x7fff50dad120, dimensions=<optimized out>, steps=<optimized out>) at numpy/linalg/umath_linalg.c.src:2292
#7  0x00007fff7364c02c in DOUBLE_eig (args=<optimized out>, dimensions=<optimized out>, steps=<optimized out>, __NPY_UNUSED_TAGGEDfunc=<optimized out>) at numpy/linalg/umath_linalg.c.src:2336
#8  0x00007ffff6a5d294 in PyUFunc_GeneralizedFunction (op=0x7ffffffe8200, kwds=0x0, args=0x7fff50dad0f0, ufunc=0x0) at numpy/core/src/umath/ufunc_object.c:2986
#9  PyUFunc_GenericFunction_int (ufunc=<optimized out>, ufunc@entry=0x7fff736c1130, args=args@entry=0x7fff50f88820, kwds=kwds@entry=0x7fff50e79c00, op=op@entry=0x7ffffffe8200)
    at numpy/core/src/umath/ufunc_object.c:3119
#10 0x00007ffff6a5f740 in ufunc_generic_call (ufunc=0x7fff736c1130, args=0x7fff50f88820, kwds=0x7fff50e79c00) at numpy/core/src/umath/ufunc_object.c:4747
...

This only happens when numpy is linked with FlexiBLAS:

$ ldd $(python -c "import numpy; print(numpy.core._multiarray_umath.__file__)") | grep blas
	libflexiblas.so.3 => /home/centos/EasyBuild/software/FlexiBLAS/3.0.4-GCC-10.3.0/lib64/libflexiblas.so.3 (0x0000200000570000)

Any ideas on what may be causing this segmentation fault?

I tried using ulimit -s unlimited (default is 8192 on that system), no change.

After export FLEXIBLAS=netlib to make FlexiBLAS use the fallback netlib backend, the segmentation fault doesn't happen either...

boegel avatar May 24 '21 15:05 boegel

Can you provide the backtrace with debug information? What does it look like in valgrind?

grisuthedragon avatar May 24 '21 21:05 grisuthedragon

Backtrace with debug info:

#0  dnrm2_k (n=2, x=<optimized out>, inc_x=1) at ../kernel/power/../arm/nrm2.c:69
#1  0x00007ffff453d788 in dnrm2_ (N=<optimized out>, x=<optimized out>, INCX=<optimized out>) at nrm2.c:61
#2  0x00007ffff62cf9fc in dnrm2_ (n=<optimized out>, x=<optimized out>, incx=<optimized out>) at /tmp/centos/FlexiBLAS/3.0.4/GCC-10.3.0/flexiblas-3.0.4/src/wrapper_blas_gnu.c:2899
#3  0x00007ffff4d788ec in dgeev (jobvl=..., jobvr=..., n=2, a=..., lda=<optimized out>, wr=..., wi=..., vl=..., ldvl=2, vr=..., ldvr=2, work=..., lwork=260, info=<optimized out>, _jobvl=140737323525740, _jobvr=8) at dgeev.f:490
#4  0x00007ffff639e594 in dgeev_ (jobvl=0x7ffffffe655c "NV", jobvr=0x7ffffffe655d "V", n=0x7ffffffe6548, a=0x7fff650fc3a0, lda=0x7ffffffe654c, wr=0x7fff650fc3c0, wi=0x7fff650fc3d0, vl=0x7fff650fc3e0, ldvl=0x7ffffffe6550, vr=0x7fff650fc3e0, ldvr=0x7ffffffe6554,
    work=0x7fff650458d0, lwork=0x7ffffffe6558, info=0x7ffffffe6560) at /tmp/centos/FlexiBLAS/3.0.4/GCC-10.3.0/flexiblas-3.0.4/src/lapack_interface/wrapper/dgeev.c:80
#5  0x00007fff7364b334 in call_dgeev (params=0x7ffffffe6500) at numpy/linalg/umath_linalg.c.src:2292
#6  DOUBLE_eig_wrapper (JOBVL=JOBVL@entry=78 'N', JOBVR=JOBVR@entry=86 'V', args=0x7fff5142d4a0, dimensions=<optimized out>, steps=<optimized out>) at numpy/linalg/umath_linalg.c.src:2292
#7  0x00007fff7364c02c in DOUBLE_eig (args=<optimized out>, dimensions=<optimized out>, steps=<optimized out>, __NPY_UNUSED_TAGGEDfunc=<optimized out>) at numpy/linalg/umath_linalg.c.src:2336
#8  0x00007ffff6a5d294 in PyUFunc_GeneralizedFunction (op=0x7ffffffe8270, kwds=0x0, args=0x7fff5142d470, ufunc=0x0) at numpy/core/src/umath/ufunc_object.c:2986
#9  PyUFunc_GenericFunction_int (ufunc=<optimized out>, ufunc@entry=0x7fff736c1130, args=args@entry=0x7fff5005aca0, kwds=kwds@entry=0x7fff50e7a700, op=op@entry=0x7ffffffe8270) at numpy/core/src/umath/ufunc_object.c:3119
#10 0x00007ffff6a5f740 in ufunc_generic_call (ufunc=0x7fff736c1130, args=0x7fff5005aca0, kwds=0x7fff50e7a700) at numpy/core/src/umath/ufunc_object.c:4747
...

I'll look into valgrind too.

boegel avatar May 25 '21 09:05 boegel

@grisuthedragon No segmentation fault when running via Valgrind, it seems (though a bunch of unrelated "Invalid read of size 4" cases in Python itself are reported). So that's a dead end, I'm afraid...

boegel avatar May 25 '21 09:05 boegel

That's weird. I'll try to compile FlexiBLAS + NumPy on my POWER system asap.

grisuthedragon avatar May 25 '21 09:05 grisuthedragon

To quickly trigger the segfault, you can use python -c "import numpy as np; np.linalg.test()".

boegel avatar May 25 '21 09:05 boegel

I tried this too on a real ppc machine and the minimal reproducer for "issues" I got is python -c "import numpy as np; np.linalg.test(verbose=3, extra_argv=['-k', 'TestEigvals and test_sq_cases'])" which either segfaults with a double free or fails the test (works with OpenBLAS directly)

I also see messages in stderr:

 ** On entry to DGEHRD parameter number  8 had an illegal value
 ** On entry to DGEHRD parameter number  8 had an illegal value
 ** On entry to DORGHR parameter number  8 had an illegal value
 ** On entry to DGEHRD parameter number  8 had an illegal value
 ** On entry to DGEHRD parameter number  8 had an illegal value
 ** On entry to DORGHR parameter number  8 had an illegal value
 ** On entry to DGEHRD parameter number  8 had an illegal value
 ** On entry to DGEHRD parameter number  8 had an illegal value
 ** On entry to DORGHR parameter number  8 had an illegal value
 ** On entry to ZGEHRD parameter number  5 had an illegal value
 ** On entry to ZHSEQR parameter number  7 had an illegal value

Those come from the numpy xerbla error handler, and I guess they are a good hint at the real problem.

Flamefire avatar May 26 '21 12:05 Flamefire

More minimal reproducer: python -c "from numpy import array, linalg; linalg.eigvals(array([[1., 2.], [3., 4.]]))"
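To compare backends without losing the shell to the crash, the one-liner above can be run in a child interpreter. This is only a minimal sketch: the helper names `run_isolated` and `describe` are made up for illustration, and it assumes a POSIX system, where `subprocess` reports a child killed by a signal as a negative return code. The `FLEXIBLAS=netlib` setting is the one already tried earlier in this thread.

```python
import os
import subprocess
import sys

# The one-liner from this thread; it needs numpy in the child interpreter.
REPRODUCER = (
    "from numpy import array, linalg; "
    "linalg.eigvals(array([[1., 2.], [3., 4.]]))"
)


def run_isolated(code, backend=None):
    """Run `code` in a fresh interpreter and return its exit status.

    `backend` (hypothetical parameter), if given, is exported as the
    FLEXIBLAS environment variable for the child process only.
    """
    env = dict(os.environ)
    if backend is not None:
        env["FLEXIBLAS"] = backend
    result = subprocess.run([sys.executable, "-c", code], env=env)
    return result.returncode


def describe(returncode):
    """Turn a subprocess return code into a short readable string."""
    if returncode < 0:
        # On POSIX, a negative value means the child died from that signal.
        return "killed by signal %d" % -returncode
    return "exit code %d" % returncode


if __name__ == "__main__":
    # Try the configured default backend first, then the netlib fallback.
    for backend in (None, "netlib"):
        label = backend or "default backend"
        print(label, "->", describe(run_isolated(REPRODUCER, backend)))
```

A segmentation fault in the child would show up as "killed by signal 11" on Linux, while a clean run reports "exit code 0".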

I suspect a stack overflow due to GCC misoptimizing OpenBLAS, which only becomes apparent with FlexiBLAS because FlexiBLAS uses the stack to save a register, which then gets overwritten by the bug. I reported this as https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100799

Flamefire avatar May 27 '21 11:05 Flamefire

@Flamefire Thanks for the work and for identifying where this behaviour comes from. Let's wait until the GCC developers react and see how they view this problem.

grisuthedragon avatar May 27 '21 12:05 grisuthedragon

The IBM compiler guys are looking into this. It does indeed seem to be a compiler issue present since GCC 7. So I'd say this can be closed, as nothing can be done here short of providing a better error message.

Flamefire avatar Jun 03 '21 08:06 Flamefire

@Flamefire Any updates on this?

boegel avatar Jan 09 '22 11:01 boegel

Small update here from our side: we've side-stepped this problem by compiling OpenBLAS with -fstack-protector-strong on POWER, see https://github.com/easybuilders/easybuild-easyconfigs/pull/15885 for more information

boegel avatar Oct 12 '22 09:10 boegel

The GCC developers determined this to be a bug in the usage related to the Fortran calling convention:

As described in (https://gcc.gnu.org/onlinedocs/gfortran/Argument-passing-conventions.html), since the first parameter to DGEBAL is of type CHARACTER, there is an extra hidden argument. Change the call to DGEBAL from dgebal (the flexiBLAS wrapper routine) to take an extra argument. This causes the compiler to allocate a parameter save area in dgebal's frame, as there are now 9 parameters but only 8 parameter registers.
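The arithmetic behind "9 parameters but only 8 parameter registers" can be spelled out with a small sketch. It is a simplified model, not a description of the ABI: it assumes the gfortran convention from the linked page (one hidden length argument per CHARACTER dummy argument) and takes the register count from the quoted explanation. The names `PARAM_REGISTERS`, `c_level_argument_count` and `exceeds_registers` are made up for illustration.

```python
# Register count as stated in the quoted explanation; a simplified model.
PARAM_REGISTERS = 8

# Dummy arguments of LAPACK's DGEBAL(JOB, N, A, LDA, ILO, IHI, SCALE, INFO).
DGEBAL = [
    ("JOB", "CHARACTER"),
    ("N", "INTEGER"),
    ("A", "DOUBLE PRECISION"),
    ("LDA", "INTEGER"),
    ("ILO", "INTEGER"),
    ("IHI", "INTEGER"),
    ("SCALE", "DOUBLE PRECISION"),
    ("INFO", "INTEGER"),
]


def c_level_argument_count(fortran_args):
    """Return (visible, hidden) argument counts for a Fortran routine.

    Assumption: each CHARACTER argument adds one hidden string-length
    argument, as the gfortran documentation describes.
    """
    visible = len(fortran_args)
    hidden = sum(1 for _name, ftype in fortran_args if ftype == "CHARACTER")
    return visible, hidden


def exceeds_registers(total_args):
    """True if the arguments no longer fit in the parameter registers."""
    return total_args > PARAM_REGISTERS


visible, hidden = c_level_argument_count(DGEBAL)
print("visible:", visible, "hidden:", hidden, "total:", visible + hidden)
print("exceeds registers:", exceeds_registers(visible + hidden))
```

For DGEBAL this gives 8 visible arguments plus 1 hidden length, so a wrapper that declares only the 8 visible ones stays within the register count, while the Fortran callee expects a ninth argument.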

Flamefire avatar Jun 20 '23 07:06 Flamefire

@Flamefire I know about these extra arguments, but for compatibility reasons in the early days of FlexiBLAS, we neglected them. Even using CBLAS/LAPACKE from the reference implementation can lead to this issue, since they "forget" about these additional parameters as well.

For FlexiBLAS I will do some tests and, if successful, integrate it in the next release.

grisuthedragon avatar Jun 20 '23 13:06 grisuthedragon