aws-virtual-gpu-device-plugin

Pod keeps restarting when two containers share GPU

Open parth-chudasama opened this issue 2 years ago • 0 comments

I am trying to run NVIDIA Triton containers for model inference. However, when more than one container is allocated to the same node, one of the containers either (1) fails to load the model onto the GPU or (2) keeps restarting.

Any suggestions on how this can be solved?

parth-chudasama, Aug 09 '22 06:08