nvidia-docker
Use MPS on Kubernetes
I'm trying to use the MPS service on Kubernetes with nvidia-docker.
Docker version: 19.03.13
NVIDIA driver: 495.44
CUDA: 11.5
Image: NGC tensorflow:21.11
I have started nvidia-cuda-mps-control on the host machine, and hostIPC and hostPID are enabled when the container starts.
The process inside the container can find the nvidia-cuda-mps-control process, but the per-client memory limit does not take effect, no matter whether I set
export CUDA_MPS_PINNED_DEVICE_MEM_LIMIT="0=1G,1=512MB"
or use set_default_device_pinned_mem_limit.
How can I make MPS work correctly across multiple containers?
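For reference, here is a minimal sketch of the kind of setup described above, assuming the default MPS pipe directory /tmp/nvidia-mps and a plain docker run rather than a Kubernetes pod spec. The image tag, the 1G/512MB limits, and the script name are placeholder values, not a verified configuration:

```bash
# On the host: start the MPS control daemon and set a default
# per-client pinned device memory limit (requires CUDA >= 11.5).
nvidia-cuda-mps-control -d
echo "set_default_device_pinned_mem_limit 0 1G"    | nvidia-cuda-mps-control
echo "set_default_device_pinned_mem_limit 1 512MB" | nvidia-cuda-mps-control

# Launch the container sharing the host's IPC/PID namespaces and the
# MPS pipe directory, and pass the per-client limit for this workload.
# Image tag, limit values, and script name are placeholders.
docker run --rm --gpus all \
  --ipc=host --pid=host \
  -v /tmp/nvidia-mps:/tmp/nvidia-mps \
  -e CUDA_MPS_PINNED_DEVICE_MEM_LIMIT="0=1G,1=512MB" \
  nvcr.io/nvidia/tensorflow:21.11-tf2-py3 \
  python my_training_script.py
```

One thing to check with the limit not taking effect: as far as I understand, CUDA_MPS_PINNED_DEVICE_MEM_LIMIT has to be present in the client process's environment before it creates its CUDA context, and set_default_device_pinned_mem_limit only applies to clients that connect after the command is issued.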
We do not officially support MPS in nvidia-docker or kubernetes. Some users have been able to get it to work in the past, but there is no supported way to do it at the moment.
That said, we do plan to add official support for MPS in the next few months, as part of an overall improved "GPU sharing initiative" that will unify the experience for GPU sharing through CUDA multiplexing, MPS, and/or MIG.
You can use this project for now: https://github.com/awslabs/aws-virtual-gpu-device-plugin
I added support for per-client memory restrictions; see my fork's README. It only works with CUDA >= 11.5: https://github.com/ghokun/aws-virtual-gpu-device-plugin
@klueska that would be great! Is there any ticket or other resource where we can follow the roadmap/progress of this "GPU sharing initiative"?
@klueska Is there any good news about this project? I'm looking forward to it ~ 😄