[Bug]: SIGSEGV when running dnn_face_recognition_ex on AMD Zen 5 (9950X) + Ubuntu 24
What Operating System(s) are you seeing this problem on?
Linux (x86-64)
dlib version
19.24
Python version
3.12
Compiler
gcc 12
Expected Behavior
Operating system: Ubuntu 24.04 LTS. CPU: AMD 9950X (Zen 5).
- Tried the official Ubuntu OpenBLAS packages (OpenMP and pthread variants).
- Tried locally compiled OpenBLAS builds:
  - make DYNAMIC_ARCH=1 TARGET=ZEN USE_OPENMP=0 NO_AFFINITY=1
  - make DYNAMIC_ARCH=1 TARGET=HASWELL USE_OPENMP=0 NO_AFFINITY=1
Expected: compilation and execution succeed.
Current Behavior
The crash backtrace:
(gdb) bt
#0 0x00007fffe6842d64 in sgemm_beta_COOPERLAKE () at /lib/x86_64-linux-gnu/libopenblas.so.0
#1 0x00007fffe46c34c5 in ??? () at /lib/x86_64-linux-gnu/libopenblas.so.0
#2 0x00007fffe484b23d in ??? () at /lib/x86_64-linux-gnu/libopenblas.so.0
#3 0x00007fffe484b498 in ??? () at /lib/x86_64-linux-gnu/libopenblas.so.0
#4 0x00007fffdf69caa4 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
#5 0x00007fffdf729c3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
The crash is similar across all of the configurations I tried.
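The `_COOPERLAKE` suffix in frame #0 suggests that the DYNAMIC_ARCH build selected its Cooper Lake kernels at runtime on this CPU. A minimal sketch for confirming which kernel family a given `libopenblas` picks, independent of dlib: it loads the library with `dlopen` and calls `openblas_get_corename()`, which OpenBLAS exports. The helper name `query_openblas_core` and the default soname are assumptions made for illustration, not part of dlib or of this report.

```cpp
#include <cassert>
#include <dlfcn.h>
#include <string>

// Hypothetical helper: returns the kernel family OpenBLAS selected at
// runtime, or a short diagnostic string if the library or the symbol
// cannot be loaded. The default soname is an assumption; adjust it to
// match the library that the crashing binary actually links against.
std::string query_openblas_core(const char* soname = "libopenblas.so.0") {
    assert(soname != nullptr);
    void* handle = dlopen(soname, RTLD_NOW | RTLD_LOCAL);
    if (handle == nullptr) {
        return "library not loaded";
    }
    // openblas_get_corename() takes no arguments and returns a C string.
    using corename_fn = char* (*)();
    auto fn = reinterpret_cast<corename_fn>(
        dlsym(handle, "openblas_get_corename"));
    if (fn == nullptr) {
        return "symbol not found";
    }
    const char* name = fn();
    // The handle is intentionally left open: OpenBLAS may already have
    // started worker threads when it was loaded.
    return name != nullptr ? std::string(name) : std::string("unknown");
}
```

Printing the result of `query_openblas_core()` from a small `main` shows the selected core. On older glibc versions the program may need to be linked with `-ldl`. If the reported core looks wrong for the CPU, OpenBLAS documents an `OPENBLAS_CORETYPE` environment variable that overrides the selection in DYNAMIC_ARCH builds; whether that changes this crash has not been tested here.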
Steps to Reproduce
[Reproduce] https://dlib.net/dnn_face_recognition_ex.cpp.html
Anything else?
The issue occurs only when dlib is built with CUDA enabled. There is no issue when dlib is built without CUDA.
Btw, this seems related to how dlib uses OpenBLAS, as the OpenBLAS folks verified that there was no issue in their local test on the same environment: https://github.com/OpenMathLib/OpenBLAS/issues/5243#issuecomment-2823673449
Dlib is just calling BLAS functions. It's not doing anything OpenBLAS-specific. And given that it's all worked for over a decade with many BLAS libraries, I'm doubtful it's somehow dlib. They are also just function calls. There isn't any special magic or anything.
So maybe your install of openblas is built wrong? I can't say.
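One way to separate the two possibilities is to call SGEMM directly, outside dlib. The sketch below is not dlib's code path. It is a plain reference implementation of `C = alpha * A * B + beta * C`, plus a wrapper that routes the same call through `cblas_sgemm` when built with the hypothetical macro `USE_OPENBLAS` (for example `-DUSE_OPENBLAS -lopenblas`). The function names `sgemm_reference` and `sgemm_checked` are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#ifdef USE_OPENBLAS  // hypothetical macro: build with -DUSE_OPENBLAS -lopenblas
#include <cblas.h>
#endif

// Plain row-major reference: C = alpha * A * B + beta * C,
// with A (m x k), B (k x n) and C (m x n).
void sgemm_reference(int m, int n, int k, float alpha,
                     const std::vector<float>& A,
                     const std::vector<float>& B,
                     float beta, std::vector<float>& C) {
    assert(A.size() == static_cast<std::size_t>(m) * k);
    assert(B.size() == static_cast<std::size_t>(k) * n);
    assert(C.size() == static_cast<std::size_t>(m) * n);
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p) {
                acc += A[i * k + p] * B[p * n + j];
            }
            C[i * n + j] = alpha * acc + beta * C[i * n + j];
        }
    }
}

// Same operation, sent to the BLAS library when USE_OPENBLAS is defined
// and to the reference loop otherwise.
void sgemm_checked(int m, int n, int k, float alpha,
                   const std::vector<float>& A,
                   const std::vector<float>& B,
                   float beta, std::vector<float>& C) {
#ifdef USE_OPENBLAS
    assert(A.size() == static_cast<std::size_t>(m) * k);
    assert(B.size() == static_cast<std::size_t>(k) * n);
    assert(C.size() == static_cast<std::size_t>(m) * n);
    // Row-major, no transposes: lda = k, ldb = n, ldc = n.
    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                m, n, k, alpha, A.data(), k, B.data(), n,
                beta, C.data(), n);
#else
    sgemm_reference(m, n, k, alpha, A, B, beta, C);
#endif
}
```

Calling `sgemm_checked` with host-allocated buffers of a few hundred rows and columns, once with `USE_OPENBLAS` and once without, and comparing the outputs gives a quick check of the OpenBLAS install on its own. If that runs cleanly, the pointers or sizes that reach BLAS in the CUDA-enabled build would be the next thing to inspect. That is a suggestion for narrowing the problem down, not a diagnosis.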
Hi Davis, I also ran into a similar issue on ARMv8:
Thread 3 "sentry_robot" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xffff99fcf840 (LWP 7071)]
0x0000ffffe5da9ac0 in sgemm_beta_ARMV8 () from /lib/aarch64-linux-gnu/libopenblas.so.0
(gdb) bt
#0 0x0000ffffe5da9ac0 in sgemm_beta_ARMV8 () at /lib/aarch64-linux-gnu/libopenblas.so.0
#1 0x0000ffffe5cc136c in () at /lib/aarch64-linux-gnu/libopenblas.so.0
I've tried all of these OpenBLAS variants:
- OpenBLAS binaries from the official Ubuntu packages, version 0.3.29 (Ubuntu 24.1)
- OpenBLAS binaries from the official Ubuntu packages, version 0.3.26 (the Ubuntu 24 LTS default)
- OpenBLAS compiled from source
And different threading backends:
- OpenBLAS with OpenMP
- OpenBLAS with pthread
I also tested dlib + CUDA on ARMv8 (Jetson AGX Orin 64GB), and it seems to have a similar issue:
Thread 3 "sentry_robot" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xffff99fcf840 (LWP 7071)]
0x0000ffffe5da9ac0 in sgemm_beta_ARMV8 () from /lib/aarch64-linux-gnu/libopenblas.so.0
(gdb) bt
#0 0x0000ffffe5da9ac0 in sgemm_beta_ARMV8 () at /lib/aarch64-linux-gnu/libopenblas.so.0
#1 0x0000ffffe5cc136c in () at /lib/aarch64-linux-gnu/libopenblas.so.0
The reason I suspect it might be related to dlib is that the same product logic works fine with dlib built without CUDA, but crashes with CUDA enabled.
Warning: this issue has been inactive for 35 days and will be automatically closed on 2025-06-11 if there is no further activity.
If you are waiting for a response but haven't received one it's possible your question is somehow inappropriate. E.g. it is off topic, you didn't follow the issue submission instructions, or your question is easily answerable by reading the FAQ, dlib's official compilation instructions, dlib's API documentation, or a Google search.
Warning: this issue has been inactive for 42 days and will be automatically closed on 2025-06-11 if there is no further activity.
Notice: this issue has been closed because it has been inactive for 45 days. You may reopen this issue if it has been closed in error.