perftest
ib_write_bw with --use_cuda leads to a system deadlock
Client (mlx5 NIC) runs: ./ib_write_bw -d mlx5_0 -i 1 --use_cuda=0 server_ip_address -a
Server (mlx5 NIC) runs: ./ib_write_bw -d mlx5_0 -i 1 --use_cuda=0
When pressing Ctrl+C to kill the process, the whole system crashes and reports a system deadlock.
This does not happen if we don't use the --use_cuda parameter.
Can you copy the crash dump here?
It seems the system crashed before it could write the core dump files. Maybe the reason is that ib_write_bw does not release its GPU resources when there is a problem (for example, an RNR error), and CUDA and the kernel then do not release these resources either, which leads to the system crash.
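To make the hypothesis above concrete, here is a minimal sketch of the usual pattern for exiting cleanly on Ctrl+C: the signal handler only sets a flag, the traffic loop notices it, and teardown then runs in a fixed order. This is not perftest's actual code. The names `run_until_stopped` and `release_resources` are hypothetical, and the teardown order in the comments is an assumption about what a GPU-backed RDMA test would need, not something confirmed in this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Set by the SIGINT handler and polled by the traffic loop. */
static volatile sig_atomic_t stop_requested = 0;

/* Handler does only async-signal-safe work: it records the request. */
static void on_sigint(int signo)
{
    (void)signo;
    stop_requested = 1;
}

/* Install the SIGINT handler. Returns 0 on success, -1 on failure. */
static int install_sigint_handler(void)
{
    struct sigaction sa;
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGINT, &sa, NULL);
}

/* Hypothetical stand-in for the traffic loop: runs until max_iters
 * iterations are done or Ctrl+C is seen. Returns iterations completed. */
static long run_until_stopped(long max_iters)
{
    long done = 0;
    while (done < max_iters && !stop_requested)
        done++; /* a real loop would post and poll work requests here */
    return done;
}

/* Hypothetical teardown, called from normal context after the loop exits.
 * Assumed order: stop and destroy the queue pairs, deregister the memory
 * region, and only then free the GPU buffer that the region covered. */
static int release_resources(void)
{
    return 0; /* placeholder: no real RDMA or CUDA calls in this sketch */
}
```

A caller would run `install_sigint_handler()`, then `run_until_stopped(...)`, then `release_resources()`. The point of the design is that nothing is freed from inside the signal handler, so the GPU buffer is never released while the NIC may still be accessing it.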
I tried to reproduce it with loopback, and it didn't reproduce. I pressed Ctrl+C both while passing traffic and while allocating the GPU buffer. Can you tell me at what exact point you tried to kill the process?
Closing the issue. Please re-open if it reproduces.