py-R-FCN-multiGPU
GPU memory not released after interrupting the training script
Hi @bharatsingh430, I ran into a problem where GPU memory is not released after I interrupt the training script. In detail: I was training on 2 GPUs, [0,1]. After pressing Ctrl+C to stop the training script, I ran nvidia-smi to check GPU usage and found that only GPU 1 had released its memory, while GPU 0 still held the allocated memory. Even after waiting a long time, the problem remained. What might cause this, and how can I fix it? PS: I tried killing the Python process, but that did not work. Waiting for your help. Thank you very much!
You can run ps aux | grep caffe to get the PIDs, then run kill -9 on each PID. That should work.
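The same steps can be scripted. Below is a minimal Python 3 sketch, assuming a Linux machine where `ps aux` prints the usual 11 columns (PID second, command last). The helper names `find_pids` and `kill_all` are hypothetical and not part of py-R-FCN-multiGPU or Caffe.

```python
import os
import signal


def find_pids(ps_output, keyword):
    """Return PIDs from `ps aux` text whose command column contains keyword.

    Assumes the standard 11-column `ps aux` layout:
    USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
    """
    pids = []
    for line in ps_output.splitlines():
        # Split into at most 11 fields so the command keeps its spaces.
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # header line or malformed row
        if keyword in fields[10]:
            pids.append(int(fields[1]))
    return pids


def kill_all(pids, sig=signal.SIGKILL):
    """Send sig (SIGKILL by default, like `kill -9`) to every PID given."""
    for pid in pids:
        if pid == os.getpid():
            continue  # never signal this script itself
        try:
            os.kill(pid, sig)
        except ProcessLookupError:
            pass  # the process exited before we got to it
```

A possible way to use it: capture the listing with `subprocess.check_output(["ps", "aux"], universal_newlines=True)`, pass the text to `find_pids(text, "caffe")`, review the returned PIDs, and only then call `kill_all` on them.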
@bharatsingh430 It doesn't always work. This time the processes were killed by following your instructions, but the GPU memory is still occupied and GPU utilization shows 100%. How can I clear the occupied GPU memory? Thank you!
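To see which PIDs the driver still associates with GPU memory, nvidia-smi can be queried in CSV form. The sketch below (Python 3) assumes a driver whose nvidia-smi supports `--query-compute-apps`; the function names are hypothetical. If the list comes back empty while memory is still shown as used, the holder is probably not an ordinary running process, and this script cannot free the memory by itself.

```python
import subprocess


def parse_compute_apps(csv_text):
    """Parse 'pid, used_memory' CSV rows into (pid, used_memory) tuples."""
    apps = []
    for line in csv_text.strip().splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip headers or unexpected rows
        apps.append((int(parts[0]), parts[1]))
    return apps


def gpu_compute_apps():
    """Ask nvidia-smi which compute processes currently hold GPU memory.

    Assumption: nvidia-smi is on PATH and accepts --query-compute-apps.
    """
    out = subprocess.check_output(
        ["nvidia-smi",
         "--query-compute-apps=pid,used_memory",
         "--format=csv,noheader"],
        universal_newlines=True,
    )
    return parse_compute_apps(out)
```

Calling `gpu_compute_apps()` on the training machine returns the PIDs to inspect before deciding whether to kill anything.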
I ran into the same problem: the process became a zombie process whose parent process is "init", so I had to reboot my machine.
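A zombie process has already exited, so kill -9 has no effect on it; it stays in the process table until its parent reaps it. A small Python 3 sketch for spotting such processes is shown here. It assumes the text comes from `ps -eo pid,ppid,stat,comm` on Linux, where a state code starting with `Z` marks a zombie; `find_zombies` is a hypothetical helper name.

```python
def find_zombies(ps_output):
    """Return (pid, ppid, command) for each zombie in `ps -eo pid,ppid,stat,comm` text."""
    zombies = []
    for line in ps_output.splitlines():
        # Four columns; the command may contain spaces (e.g. "python <defunct>").
        fields = line.split(None, 3)
        if len(fields) < 4 or not fields[0].isdigit():
            continue  # header line or malformed row
        pid, ppid, stat, comm = fields
        if stat.startswith("Z"):
            zombies.append((int(pid), int(ppid), comm.strip()))
    return zombies
```

A parent PID of 1 in the result matches the situation described above, where the zombie's parent is init. In that case there is no user process left to terminate, which is consistent with the reboot reported here.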