
Segfault when preloading scuda client libscuda_12.2.so even though the server is running

Open msbroy opened this issue 9 months ago • 5 comments

There is always a segmentation fault when preloading the client library ../libscuda_12.2.so, whether the server is local or remote. SCUDA_SERVER is set to the correct IP, as the output below shows.

root@cuda-gpu-worker6-scuda:/home/ubuntu/scuda/deploy# ./start.sh torch ../libscuda_12.2.so
Connecting to SCUDA server at: localhost:14833
Using scuda binary at path: ../libscuda_12.2.so
Running torch example...
./start.sh: line 23: 348232 Segmentation fault (core dumped) LD_PRELOAD="$libscuda_path" python3 -c "import torch; print('CUDA Available:', torch.cuda.is_available())"
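To reproduce this outside of start.sh and confirm that the process is being killed by a signal rather than failing with a Python exception, a minimal sketch along these lines may help. The helper names `describe_exit` and `run_with_preload` are made up for illustration and are not part of scuda; it only assumes a POSIX system, where Python's subprocess module reports death by signal as a negative return code.

```python
import os
import signal
import subprocess


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a readable description."""
    if returncode < 0:
        # On POSIX, subprocess reports death-by-signal as a negative code.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with code {returncode}"


def run_with_preload(lib_path: str, argv: list[str]) -> str:
    """Run argv with LD_PRELOAD set to lib_path and describe how it ended."""
    # Copy the current environment and add LD_PRELOAD for the child only.
    env = dict(os.environ, LD_PRELOAD=lib_path)
    result = subprocess.run(argv, env=env)
    return describe_exit(result.returncode)
```

For the case reported here, the call would be something like `run_with_preload("../libscuda_12.2.so", ["python3", "-c", "import torch; print(torch.cuda.is_available())"])`. A result of "killed by SIGSEGV" would match the output above.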

msbroy, Apr 01 '25 12:04

Seeing the same segfault here. Fresh build from a git clone today (4/19/2025) on Debian 11.10, with cmake 3.18.4, kernel 5.10.0-32-amd64 #1 SMP Debian 5.10.223-1, CUDA dev libs 12.6-whatever, etc.

Runtime output:

~/scuda$ LD_PRELOAD=./libscuda_12.6.so strace -f -p nvidia-smi
Segmentation fault

Note, we don't even get to a single system call, as the binary just segfaults before anything useful can happen.

Kernel log:

[4237863.147664] libscuda_12.6.s[929557]: segfault at 2 ip 0000000000000002 sp 00007ffd9679f7b8 error 14 in libscuda_12.6.so[7f87d2764000+31000]
[4237863.147671] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd8.
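For anyone reading the kernel line above: on x86, the `error` field is the page-fault error code, and Linux prints it in hexadecimal, so `error 14` means 0x14. A rough sketch of decoding it follows. The bit meanings are the standard x86 page-fault error code bits (only the low five are handled), and the name `decode_pf_error` is made up for illustration.

```python
# x86 page-fault error code bits (low five bits only).
# Each entry: (mask, meaning if set, meaning if clear or None).
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set in page table", None),
    (0x10, "instruction fetch", None),
]


def decode_pf_error(hex_code: str) -> list[str]:
    """Decode the hex 'error' field of a Linux segfault log line."""
    code = int(hex_code, 16)
    out = []
    for mask, if_set, if_clear in PF_BITS:
        if code & mask:
            out.append(if_set)
        elif if_clear is not None:
            out.append(if_clear)
    return out


print(decode_pf_error("14"))
# ['page not present', 'read access', 'user mode', 'instruction fetch']
```

Read that way, the log suggests a user-mode instruction fetch from an unmapped page, with `ip` equal to 0x2. That would be consistent with a call through an invalid function pointer during library load, though this is an inference from the log line and not a confirmed diagnosis.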

Server side seems to run, actually listens:

Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:14833 0.0.0.0:* LISTEN 624644/./server_12.

So... that seems fine.
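A LISTEN entry shows the socket is bound, but a quick connection test from the client machine can also rule out firewall or routing problems. A minimal sketch, where `can_connect` is a hypothetical helper and not a scuda tool. It only checks that a TCP connection can be opened and says nothing about the scuda protocol itself.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and resolution failures.
        return False
```

With the address from the first report, this would be `can_connect("localhost", 14833)`. If it returns True and the client still segfaults, the network path is probably not the cause.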

Help?

tkapela, Apr 19 '25 20:04

Encountered the same problem, has anyone solved this problem?

James-Leong, May 06 '25 02:05

Encountered the same problem, has anyone solved this problem?

LW945, May 28 '25 13:05

Encountered the same problem, has anyone solved this problem?

Reinstalling the system can solve this problem.

LW945, May 29 '25 05:05

Encountered the same problem, has anyone solved this problem?

liding1992, Jun 25 '25 09:06