superlu_dist

Error during factorization with OpenMP (`-Denable-openmp=ON`)

Open sebastiangrimberg opened this issue 11 months ago • 3 comments

I am running into problems on an M1 Mac with OpenMP builds when OMP_NUM_THREADS > 1. The error I receive is:

 ** On entry to DGEMM  parameter number  8 had an illegal value

which is thrown from inside dlook_ahead_update.c. In a debugger, it looks like DGEMM is being called with lda < m (lda is DGEMM's 8th argument), perhaps indicating some data corruption. This occurs fairly far into the factorization, after a few successful iterations of the outer while loop. I also tried building without #define ISORT; after fixing some compilation errors, I get a segmentation fault regardless of the number of threads (this might be expected, given the source comment that qsort has a bug on macOS).
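For reference, the reference BLAS DGEMM reports parameter 8 as illegal when lda is smaller than max(1, number of rows of A as stored), which is m when transa is 'N' and k otherwise. Below is a minimal sketch of a debug guard that could be placed just before a suspect DGEMM call to trap the bad arguments earlier. The helper names dgemm_min_lda, dgemm_lda_ok and dgemm_check_lda are hypothetical and are not part of SuperLU_DIST or BLAS.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical helper (not part of SuperLU_DIST or BLAS):
 * smallest lda that DGEMM accepts for a column-major A.
 * A is stored as m-by-k when transa is 'N', and as k-by-m otherwise. */
int dgemm_min_lda(char transa, int m, int k)
{
    int nrowa = (toupper((unsigned char)transa) == 'N') ? m : k;
    return nrowa > 1 ? nrowa : 1;
}

/* Returns nonzero when lda satisfies lda >= max(1, nrowa),
 * i.e. when DGEMM would not flag parameter 8. */
int dgemm_lda_ok(char transa, int m, int k, int lda)
{
    return lda >= dgemm_min_lda(transa, m, k);
}

/* Debug guard: aborts at the call site instead of inside BLAS,
 * so the offending m, k and lda are visible in the debugger. */
void dgemm_check_lda(char transa, int m, int k, int lda)
{
    assert(dgemm_lda_ok(transa, m, k, lda));
}
```

Calling dgemm_check_lda with the same transa, m, k and lda that are about to be passed to DGEMM should stop the program in the frame where the inconsistent values were computed, which may help tell apart a race between threads from a plain indexing error.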

I am able to provide the matrix of interest if required, just let me know. Or perhaps this is a known limitation.

UPDATE 2/29: Testing on a Linux machine for the same problem, I don't get any errors regardless of the number of threads.

sebastiangrimberg, Feb 29 '24 00:02

How large is your matrix? Can you send it? I can give it a try on Mac.

xiaoyeli, Feb 29 '24 23:02

Here is one example with n = 504 in COO format:

mat_port_0.txt

I am using the pdgssvx driver (via MFEM). This example works on any number of MPI processes with 1 OpenMP thread, but fails with more than 1 thread.

sebastiangrimberg, Feb 29 '24 23:02

Thanks for looking into this @xiaoyeli. Please let me know if you are able to reproduce the issue or if there's any more info from me that would be useful in debugging this. Thank you!

sebastiangrimberg, Mar 01 '24 19:03