nvidia-docker
Use MPS on Kubernetes
I'm trying to use the MPS service on Kubernetes with nvidia-docker.
Docker version: 19.03.13
NVIDIA driver: 495.44
CUDA: 11.5
Image: NGC tensorflow:21.11
I have started nvidia-cuda-mps-control on the host machine, and hostIPC and hostPID are enabled when the container starts.
The process inside the container can find the nvidia-cuda-mps-control process, but the per-client memory limit does not take effect, no matter whether I set
export CUDA_MPS_PINNED_DEVICE_MEM_LIMIT="0=1G,1=512MB"
or use set_default_device_pinned_mem_limit.
How can I make MPS work correctly across multiple containers?
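For reference, here is a minimal sketch of the kind of setup described above, assuming the default MPS pipe directory /tmp/nvidia-mps and a plain docker run rather than a Kubernetes pod spec. The image tag, the 1G/512MB limits, and the script name are placeholder values, not a verified configuration:

```bash
# On the host: start the MPS control daemon and set a default
# per-client pinned device memory limit (requires CUDA >= 11.5).
nvidia-cuda-mps-control -d
echo "set_default_device_pinned_mem_limit 0 1G"    | nvidia-cuda-mps-control
echo "set_default_device_pinned_mem_limit 1 512MB" | nvidia-cuda-mps-control

# Launch the container sharing the host's IPC/PID namespaces and the
# MPS pipe directory, and pass the per-client limit for this workload.
# Image tag, limit values, and script name are placeholders.
docker run --rm --gpus all \
  --ipc=host --pid=host \
  -v /tmp/nvidia-mps:/tmp/nvidia-mps \
  -e CUDA_MPS_PINNED_DEVICE_MEM_LIMIT="0=1G,1=512MB" \
  nvcr.io/nvidia/tensorflow:21.11-tf2-py3 \
  python my_training_script.py
```

One thing to check with the limit not taking effect: as far as I understand, CUDA_MPS_PINNED_DEVICE_MEM_LIMIT has to be present in the client process's environment before it creates its CUDA context, and set_default_device_pinned_mem_limit only applies to clients that connect after the command is issued.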
We do not officially support MPS in nvidia-docker or kubernetes. Some users have been able to get it to work in the past, but there is no supported way to do it at the moment.
That said, we do plan to add official support for MPS in the next few months, as part of an overall improved "GPU sharing initiative" that will unify the experience for GPU sharing through CUDA multiplexing, MPS, and/or MIG.
You can use this project for now: https://github.com/awslabs/aws-virtual-gpu-device-plugin
I added support for per-client memory restrictions; see my fork's README. It only works with CUDA >= 11.5: https://github.com/ghokun/aws-virtual-gpu-device-plugin
@klueska that would be great! Is there any ticket or other resource where we can follow the roadmap/progress of this "GPU sharing initiative"?
@klueska Is there any good news about this project? I'm looking forward to it ~ 😄