aws-virtual-gpu-device-plugin Pod keeps restarting when two containers share GPU

Open parth-chudasama opened this issue 2 years ago • 0 comments

I am trying to run NVIDIA Triton containers for model inference. However, when more than one container is allocated to the same node, one of the containers either (1) fails to load the model onto the GPU or (2) keeps restarting.

Any suggestions on how this can be solved?

Aug 09 '22 06:08 parth-chudasama
