netlib
segfault on 2.2.0 in dgetrf on ubuntu x86_64
import dev.ludovic.netlib.LAPACK;
import org.netlib.util.intW;

class Main {
    public static void main(String[] args) {
        double[] arr = new double[400];
        int[] piv = new int[20];
        intW info = new intW(0);
        LAPACK.getInstance().dgetrf(20, 20, arr, 20, piv, info);
    }
}
reproduces in OpenJDK 64-Bit Server VM, Java 1.8.0_292 and OpenJDK 64-Bit Server VM, Java 16.0.1
There aren't any debug symbols and I'm no expert on assembly, but this is what I'm getting. The first instruction is the one that segfaults.
0x7fffd85a262d  mov (%rsi),%eax
--
0x7fffd85a262f  lea 0x30(%rbp),%rsi
0x7fffd85a2633  mov $0x10000,%eax
0x7fffd85a2638  and 0x4(%rsi),%eax
0x7fffd85a263b  cmp $0x10000,%eax
0x7fffd85a2641  jne 0x7fffd85a26ce
0x7fffd85a2647  mov $0xe0,%eax
0x7fffd85a264c  and 0x100(%rbp),%eax
0x7fffd85a2652  cmp $0xe0,%eax
0x7fffd85a2658  jne 0x7fffd85a26ce
0x7fffd85a265e  lea 0x10(%rbp),%rsi
0x7fffd85a2662  mov (%rsi),%eax
0x7fffd85a2664  cmp $0x50654,%eax
0x7fffd85a266a  je 0x7fffd85a26ce
0x7fffd85a2670  lea 0x188(%rbp),%rsi
0x7fffd85a2677  vmovdqu32 %zmm0,(%rsi)
0x7fffd85a267d  vmovdqu32 %zmm7,0x40(%rsi)
0x7fffd85a2684  vmovdqu32 %zmm8,0x80(%rsi)
0x7fffd85a268b  vmovdqu32 %zmm31,0xc0(%rsi)
also reproduces in version 2.0.0
fwiw, i get a segfault for any dimension >= 18, but not before
Also get a failure in sgetrf for dim >= 10
Hi @dlwh, let me try to reproduce that locally, it's the first time I've seen it.
I can reproduce with OpenBLAS, but not with Intel MKL. I also can only reproduce if OPENBLAS_NUM_THREADS is greater than 1. I'm now looking at how dgetrf_parallel (the function in which the SIGSEGV is triggered) is invoked, and why it triggers anything.
Here is what I'm observing. When dgetrf_ is called from Java, $rsp = 0x7ffff599d0e8. It then calls dgetrf_parallel a first time, which allocates arrays on the stack and moves $rsp down to 0x7ffff5918ef8 (541,168 bytes lower). dgetrf_parallel is then called recursively a second time, which allocates arrays on the stack again and moves $rsp down to 0x7ffff5894d80 (another 541,048 bytes). It then triggers a SIGSEGV when trying to store variables on the stack [1].
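As a quick sanity check on those numbers, here is a minimal Java sketch that recomputes how much stack each call consumes. The three $rsp values are copied from the gdb session above; the class and method names are made up for illustration only.

```java
public class StackFrameDelta {
    // $rsp values observed in gdb (copied from the description above).
    static final long RSP_AT_DGETRF = 0x7ffff599d0e8L;
    static final long RSP_AFTER_FIRST_CALL = 0x7ffff5918ef8L;
    static final long RSP_AFTER_SECOND_CALL = 0x7ffff5894d80L;

    // The stack grows down, so the bytes consumed are (before - after).
    static long bytesConsumed(long rspBefore, long rspAfter) {
        return rspBefore - rspAfter;
    }

    public static void main(String[] args) {
        long first = bytesConsumed(RSP_AT_DGETRF, RSP_AFTER_FIRST_CALL);
        long second = bytesConsumed(RSP_AFTER_FIRST_CALL, RSP_AFTER_SECOND_CALL);
        System.out.println("first call:  " + first + " bytes");            // 541168
        System.out.println("second call: " + second + " bytes");           // 541048
        System.out.println("total:       " + (first + second) + " bytes"); // 1082216
    }
}
```

The two nested calls together use a bit more than 1 MB of stack, before counting whatever the Java frames above them already occupy.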
When accessing the current thread's stack size and stack base, we can clearly see that this is indeed a stack overflow:
(gdb) p (Thread::_thr_current)->_stack_size
$3 = 1052672
(gdb) p (Thread::_thr_current)->_stack_base
$4 = (address) 0x7ffff59a5000 "\177ELF\002\001\001\003"
(0x7ffff59a5000 - 1052672 = 0x7ffff58a4000 is the lowest address of the stack, and it is larger than $rsp = 0x7ffff5894d80 on the last call to dgetrf_parallel, so $rsp has moved past the end of the stack)
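The same bounds check can be written as a small Java sketch. The stack base, stack size and $rsp are the values printed by gdb above; the class and method names are hypothetical, and guard pages are ignored to keep the arithmetic simple.

```java
public class StackBoundsCheck {
    // Values copied from the gdb output above.
    static final long STACK_BASE = 0x7ffff59a5000L;   // highest address of the thread stack
    static final long STACK_SIZE = 1052672L;          // _stack_size in bytes
    static final long RSP_AT_FAULT = 0x7ffff5894d80L; // $rsp on the last dgetrf_parallel call

    // The stack grows down from the base, so the lowest valid address is base - size.
    static long lowestValidAddress(long base, long size) {
        return base - size;
    }

    // True when the stack pointer has moved below the end of the stack.
    static boolean isOverflow(long rsp, long base, long size) {
        return rsp < lowestValidAddress(base, size);
    }

    public static void main(String[] args) {
        long limit = lowestValidAddress(STACK_BASE, STACK_SIZE);
        System.out.println("stack limit: 0x" + Long.toHexString(limit));                   // 0x7ffff58a4000
        System.out.println("overflow:    " + isOverflow(RSP_AT_FAULT, STACK_BASE, STACK_SIZE)); // true
        System.out.println("overshoot:   " + (limit - RSP_AT_FAULT) + " bytes");           // 62080
    }
}
```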
Now, onto figuring out why dgetrf_parallel allocates so much memory on the stack, and whether it's reproducible with calls to liblapack.so straight from C.
Also, when setting -Xss10M (set the stack size to 10 MB), I can't reproduce the issue.
[1]
0x00007fff2a145040 <+0>: lea 0x8(%rsp),%r10
0x00007fff2a145045 <+5>: and $0xffffffffffffff80,%rsp
0x00007fff2a145049 <+9>: mov %rdi,%rax
0x00007fff2a14504c <+12>: mov %rdx,%rsi
0x00007fff2a14504f <+15>: pushq -0x8(%r10)
0x00007fff2a145053 <+19>: push %rbp
0x00007fff2a145054 <+20>: mov %rsp,%rbp // $rbp = $rsp
0x00007fff2a145057 <+23>: push %r15
0x00007fff2a145059 <+25>: push %r14
0x00007fff2a14505b <+27>: push %r13
0x00007fff2a14505d <+29>: push %r12
0x00007fff2a14505f <+31>: push %r10
0x00007fff2a145061 <+33>: push %rbx
0x00007fff2a145062 <+34>: sub $0x840c0,%rsp // allocate stack frame of 0x840c0 = 540,864 bytes
=> 0x00007fff2a145069 <+41>: mov %rdi,-0x83fd0(%rbp) // $rbp[-0x83fd0] = $rdi // stack grows down so access with negative index is normal
@dlwh this issue is a repeat of a previously encountered issue with Breeze and netlib-java (so prior to my change). I opened an issue on OpenBLAS.
In the meantime, the workarounds are the following:
- Increase the size of the stack of Java threads with -Xss10M (set the Java threads' stack size to 10 Mbytes)
- Make sure OpenBLAS doesn't use the parallel implementation by defining the environment variable OPENBLAS_NUM_THREADS=1
- Compile a custom version of OpenBLAS that unconditionally defines USE_ALLOC_HEAP at https://github.com/xianyi/OpenBLAS/blob/develop/lapack/getrf/getrf_parallel.c#L49
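If changing the JVM flags isn't convenient, a variation on the first workaround (not something tested in this thread, so treat it as an assumption) is to run the LAPACK call on a dedicated thread created with a larger requested stack size. The sketch below uses the standard `Thread(ThreadGroup, Runnable, String, long)` constructor; the class name `LargeStackRunner` and the helper `runWithStack` are hypothetical, and the task body is a placeholder for the actual native call. Note that the JVM is allowed to treat the requested stack size as a hint.

```java
public class LargeStackRunner {
    // Runs the task on a new thread that requests the given stack size, then waits for it.
    // The stackSize argument is only a hint; some platforms may ignore it.
    static void runWithStack(Runnable task, long stackSizeBytes) {
        Thread worker = new Thread(null, task, "large-stack-worker", stackSizeBytes);
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException("interrupted while waiting for worker", e);
        }
    }

    public static void main(String[] args) {
        final boolean[] done = {false};
        // Placeholder task: the call into LAPACK (e.g. dgetrf) would go here.
        runWithStack(() -> done[0] = true, 10L * 1024 * 1024);
        System.out.println("task finished: " + done[0]);
    }
}
```

This keeps the default stack size for every other thread, at the cost of routing the native calls through one helper.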
I'm exploring the licensing implications of packaging a custom OpenBLAS in the library to avoid having to install it locally, similarly to numpy. That might also be a longer term solution for this specific issue.
Huh ok. Thanks! netlib-java stopped working on ubuntu 20.04 since they stopped shipping gfortran3 and I didn't think to try