perftest

ib_write_bw --use_cuda leads to system deadlock

Open antonywei opened this issue 4 years ago • 2 comments

Client (mlx5 NIC) runs: ./ib_write_bw -d mlx5_0 -i 1 --use_cuda=0 server_ip_address -a

Server (mlx5 NIC) runs: ./ib_write_bw -d mlx5_0 -i 1 --use_cuda=0

When pressing Ctrl+C to kill the process, the whole system crashes and reports a system deadlock.

This does not happen if we don't use the --use_cuda parameter.

antonywei avatar Jan 13 '21 03:01 antonywei

Can you copy the crash dump here?

drossetti avatar May 12 '21 17:05 drossetti

It seems the system crashed before writing the core dump files. Maybe the reason is that ib_write_bw does not release its GPU resources when there are problems (for example, an RNR error), and CUDA and the kernel did not release these resources either, which led to the system crash.

antonywei avatar May 16 '21 13:05 antonywei

I tried to reproduce it with loopback, and it didn't reproduce. I pressed Ctrl+C while passing traffic and also while allocating the GPU buffer. Can you tell me at exactly what point you killed the process?
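
Since the result may depend on when the interrupt arrives, one way to make the timing repeatable is to deliver SIGINT after a fixed delay instead of pressing Ctrl+C by hand. This is only a minimal sketch, assuming the `timeout` utility from coreutils is installed; `interrupt_after` is a hypothetical helper name, not part of perftest, and the 5-second delay in the example is arbitrary. It approximates a keyboard interrupt and may not match terminal behavior exactly.

```shell
# Hypothetical helper (not part of perftest): run a command and send it
# SIGINT, the signal Ctrl+C generates, after DELAY seconds.
# Usage: interrupt_after DELAY COMMAND [ARGS...]
interrupt_after() {
    delay="$1"
    shift
    # timeout exits with status 124 when the time limit was reached.
    timeout -s INT "$delay" "$@"
}

# Example with the client command from the report (delay chosen arbitrarily):
# interrupt_after 5 ./ib_write_bw -d mlx5_0 -i 1 --use_cuda=0 server_ip_address -a
```

Varying the delay (for example, a short delay to hit GPU buffer allocation and a longer one to hit the traffic phase) would give a rough answer to when the interrupt has to land for the problem to appear.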

sshaulnv avatar Nov 10 '22 12:11 sshaulnv

Closing the issue. Please re-open if it reproduces.

HassanKhadour avatar Nov 30 '22 11:11 HassanKhadour