Segfault when preloading scuda client libscuda_12.2.so even though the server is running
A segmentation fault always occurs when preloading the client at ../libscuda_12.2.so, whether the server is local or remote. SCUDA_SERVER is set to the correct IP, as the output below shows.
root@cuda-gpu-worker6-scuda:/home/ubuntu/scuda/deploy# ./start.sh torch ../libscuda_12.2.so
Connecting to SCUDA server at: localhost:14833
Using scuda binary at path: ../libscuda_12.2.so
Running torch example...
./start.sh: line 23: 348232 Segmentation fault (core dumped) LD_PRELOAD="$libscuda_path" python3 -c "import torch; print('CUDA Available:', torch.cuda.is_available())"
Seeing the same segfault here. Fresh build from a git clone today (4/19/2025), on Debian 11.10 with cmake 3.18.4, kernel 5.10.0-32-amd64 #1 SMP Debian 5.10.223-1, CUDA dev libs 12.6-whatever, etc.
Runtime output:
~/scuda$ LD_PRELOAD=./libscuda_12.6.so strace -f -p nvidia-smi
Segmentation fault
Note, we don't even get to a single system call, as the binary just segfaults before anything useful can happen.
Kernel log:
[4237863.147664] libscuda_12.6.s[929557]: segfault at 2 ip 0000000000000002 sp 00007ffd9679f7b8 error 14 in libscuda_12.6.so[7f87d2764000+31000]
[4237863.147671] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd8.
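For anyone trying to read the kernel line above: on x86-64 Linux the `error` field is the page-fault error code printed in hexadecimal, so `error 14` is 0x14. The sketch below decodes it; the function name `decode_segfault` and the returned keys are made up for illustration. Here the faulting address equals the instruction pointer (both 0x2) and the fault is a user-mode instruction fetch, which is consistent with the process jumping through an invalid function pointer. It does not say where in the library that pointer comes from.

```python
import re

# Matches the "segfault at ..." line the x86 Linux kernel writes to dmesg.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
)


def decode_segfault(line):
    """Decode a kernel segfault line (hypothetical helper, x86 only)."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a segfault line")
    err = int(m["err"], 16)  # the kernel prints this field in hex
    return {
        "addr": int(m["addr"], 16),
        "ip": int(m["ip"], 16),
        "page_present": bool(err & 0x01),       # bit 0: protection fault vs. missing page
        "write": bool(err & 0x02),              # bit 1: write access
        "user_mode": bool(err & 0x04),          # bit 2: fault happened in user mode
        "instruction_fetch": bool(err & 0x10),  # bit 4: fault on instruction fetch
    }


line = (
    "libscuda_12.6.s[929557]: segfault at 2 ip 0000000000000002 "
    "sp 00007ffd9679f7b8 error 14 in libscuda_12.6.so[7f87d2764000+31000]"
)
info = decode_segfault(line)
print(info)
```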
Server side seems to run, actually listens:
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:14833 0.0.0.0:* LISTEN 624644/./server_12.
So... that seems fine.
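To rule out the network side from the client machine as well, a plain TCP connect to the server port is enough. This is only a rough sketch: the helper name `port_accepts_connections` is invented here, and the host and port are the `localhost:14833` values from the output earlier in the thread. It checks that something accepts connections, not that it speaks the SCUDA protocol.

```python
import socket


def port_accepts_connections(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds (hypothetical helper)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


if __name__ == "__main__":
    print(port_accepts_connections("localhost", 14833))
```

If this prints True and the preload still segfaults, the crash is most likely happening on the client side before any connection is attempted.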
Help?
Encountered the same problem, has anyone solved this problem?
Reinstalling the system can solve this problem.