superlu_dist
Error during factorization with OpenMP (`-Denable-openmp=ON`)
I am running into problems on an M1 Mac with OpenMP builds when `OMP_NUM_THREADS > 1`. The error I receive is

`** On entry to DGEMM parameter number 8 had an illegal value`
which is thrown from inside `dlook_ahead_update.c`. When running in a debugger, it looks like I am hitting a case where `lda < m` for DGEMM, which may indicate some kind of data corruption. This happens fairly far into the factorization, after a few successful iterations of the outer `while` loop. I also tried building without `#define ISORT`, and after fixing some compilation errors I get a segmentation fault regardless of the number of threads (this might be expected, given the source comment noting that `qsort` has a bug on macOS).
I am able to provide the matrix of interest if required, just let me know. Or perhaps this is a known limitation.
UPDATE 2/29: Testing the same problem on a Linux machine, I don't get any errors regardless of the number of threads.
How large is your matrix? Can you send it? I can give it a try on Mac.
Here is one example, with n = 504, in COO format:
I am using the `pdgssvx` driver (via MFEM). This example works on any number of MPI processes with 1 OpenMP thread, but fails with more than 1 thread.
Thanks for looking into this @xiaoyeli. Please let me know if you are able to reproduce the issue or if there's any more info from me that would be useful in debugging this. Thank you!