aws-virtual-gpu-device-plugin
Are there any plans to support CUDA_MPS_PINNED_DEVICE_MEM_LIMIT?
Present Status
I understand the current system configuration as follows:
- Currently, the share of GPU threads used by a Pod seems to be controlled by CUDA_MPS_ACTIVE_THREAD_PERCENTAGE.
- The GPU memory used by the pod is not limited by this plugin.
- Therefore, we need to pass the per_process_gpu_memory_fraction option to TensorFlow to control memory usage.
My Suggestion
- A new environment variable, CUDA_MPS_PINNED_DEVICE_MEM_LIMIT, was added in CUDA 11.5.
- CUDA_MPS_PINNED_DEVICE_MEM_LIMIT can control the memory usage of each MPS client.
- I think it would be more convenient if this plugin supported this new feature.
- Because:
  - It would no longer have to rely on TensorFlow.
  - Other GPU libraries could be used as well.
pluginapi.AllocateRequest does not contain information about the current pod/container, so I think it is not trivial to add the CUDA_MPS_PINNED_DEVICE_MEM_LIMIT env variable to this plugin.
Limitations:
- Custom resources can be integers only.
- No information / annotations / labels about the allocated container/pod are available inside the device plugin. :(
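For reference, this is roughly what the plugin has to work with at allocation time. The sketch below uses the v1beta1 device plugin API types; the vgpuPlugin type and the env value are only illustrative assumptions, not the actual plugin code, and the import path may differ between Kubernetes versions:

```go
package plugin

import (
	"context"
	"log"

	pluginapi "k8s.io/kubelet/pkg/apis/deviceplugin/v1beta1"
)

// vgpuPlugin stands in for the plugin's gRPC server type (name chosen for
// illustration only).
type vgpuPlugin struct{}

// Allocate receives nothing but the device IDs picked for each container.
// There is no pod name, namespace, annotation, or label in the request, so a
// per-pod memory limit cannot be looked up from pod metadata here.
func (p *vgpuPlugin) Allocate(ctx context.Context, req *pluginapi.AllocateRequest) (*pluginapi.AllocateResponse, error) {
	resp := &pluginapi.AllocateResponse{}
	for _, cr := range req.ContainerRequests {
		log.Printf("allocating devices %v (no pod/container identity available)", cr.DevicesIDs)
		resp.ContainerResponses = append(resp.ContainerResponses, &pluginapi.ContainerAllocateResponse{
			// Envs is the main per-container knob the plugin gets: whatever
			// goes in here ends up in the container's environment.
			Envs: map[string]string{
				"CUDA_MPS_ACTIVE_THREAD_PERCENTAGE": "10", // assumed example value
			},
		})
	}
	return resp, nil
}
```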
I am working on a fork to support it though (hopefully)
Currently I have this:
- Added CUDA_MPS_ACTIVE_THREAD_PERCENTAGE at the client level so each container can have a different share of SM units.
- Added CUDA_MPS_PINNED_DEVICE_MEM_LIMIT as an env variable on the container to limit GPU memory usage.
- Metrics for each container about GPU memory usage (see the NVML sketch after the example below).
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nvidia-device-query
spec:
  hostIPC: true
  containers:
    - name: nvidia-device-query
      image: ghcr.io/kuartis/nvidia-device-query:1.0.0
      command: ["/bin/sh", "-ec", "while :; do echo '.'; sleep 5 ; done"]
      env:
        - name: CUDA_MPS_PINNED_DEVICE_MEM_LIMIT
          value: 0=2G  # limit device 0 to 2 GiB for this MPS client
      resources:
        limits:
          k8s.kuartis.com/vgpu: '1'
      volumeMounts:
        - name: nvidia-mps
          mountPath: /tmp/nvidia-mps
  volumes:
    - name: nvidia-mps
      hostPath:
        path: /tmp/nvidia-mps
```
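On the metrics side, per-process GPU memory can be read via NVML. A minimal sketch, assuming the github.com/NVIDIA/go-nvml bindings and a single GPU; mapping each PID back to its pod/container (which the fork has to do) is left out:

```go
package plugin

import (
	"fmt"

	"github.com/NVIDIA/go-nvml/pkg/nvml"
)

// listGPUMemoryUsage prints the GPU memory used by every compute process on
// device 0. Attributing each PID to a pod/container (e.g. via its cgroup) is
// omitted from this sketch.
func listGPUMemoryUsage() error {
	if ret := nvml.Init(); ret != nvml.SUCCESS {
		return fmt.Errorf("nvml init failed: %v", ret)
	}
	defer nvml.Shutdown()

	device, ret := nvml.DeviceGetHandleByIndex(0)
	if ret != nvml.SUCCESS {
		return fmt.Errorf("get device failed: %v", ret)
	}
	procs, ret := device.GetComputeRunningProcesses()
	if ret != nvml.SUCCESS {
		return fmt.Errorf("list processes failed: %v", ret)
	}
	for _, p := range procs {
		fmt.Printf("pid=%d usedGpuMemory=%d bytes\n", p.Pid, p.UsedGpuMemory)
	}
	return nil
}
```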
What I plan is to create a new resource definition inside the same plugin and make both Allocate methods talk to each other via channels.
```yaml
resources:
  limits:
    k8s.kuartis.com/vgpu: '1'
    k8s.kuartis.com/vgpu-mem: '1024'  # this will set the correct env variable for the container
```
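A minimal sketch of that idea, assuming one vgpu-mem "device" per MiB and made-up type and field names (this is not the fork's actual code): the vgpu-mem Allocate passes the requested budget over a channel, and the vgpu Allocate turns it into the MPS env variable.

```go
package plugin

import (
	"context"
	"fmt"

	pluginapi "k8s.io/kubelet/pkg/apis/deviceplugin/v1beta1"
)

type memRequest struct {
	mebibytes int
}

// vgpuMemServer serves k8s.kuartis.com/vgpu-mem: each advertised "device"
// stands for 1 MiB, so the number of allocated IDs is the memory budget.
type vgpuMemServer struct {
	memCh chan memRequest // shared with vgpuServer
}

func (s *vgpuMemServer) Allocate(ctx context.Context, req *pluginapi.AllocateRequest) (*pluginapi.AllocateResponse, error) {
	resp := &pluginapi.AllocateResponse{}
	for _, cr := range req.ContainerRequests {
		s.memCh <- memRequest{mebibytes: len(cr.DevicesIDs)}
		resp.ContainerResponses = append(resp.ContainerResponses, &pluginapi.ContainerAllocateResponse{})
	}
	return resp, nil
}

// vgpuServer serves k8s.kuartis.com/vgpu and consumes the budget sent by
// vgpuMemServer, exposing it as CUDA_MPS_PINNED_DEVICE_MEM_LIMIT.
type vgpuServer struct {
	memCh chan memRequest
}

func (s *vgpuServer) Allocate(ctx context.Context, req *pluginapi.AllocateRequest) (*pluginapi.AllocateResponse, error) {
	resp := &pluginapi.AllocateResponse{}
	for range req.ContainerRequests {
		mem := <-s.memCh
		resp.ContainerResponses = append(resp.ContainerResponses, &pluginapi.ContainerAllocateResponse{
			Envs: map[string]string{
				"CUDA_MPS_PINNED_DEVICE_MEM_LIMIT": fmt.Sprintf("0=%dM", mem.mebibytes),
			},
		})
	}
	return resp, nil
}
```

One caveat with this pairing: the API gives no pod identity, so matching the two Allocate calls for the same container relies on the kubelet issuing them close together, which is exactly the limitation discussed above.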
Here is the link: https://github.com/kuartis/kuartis-virtual-gpu-device-plugin
Thank you for providing an answer to my question.
If the CUDA version in each pod is 11.5 or higher, your repository can limit the memory without relying on TensorFlow, right?

> What I plan is to create a new resource definition inside the same plugin and make both Allocate methods talk to each other via channels.
Amazing. I'm looking forward to using this :)
Yes, it does limit the memory usage of the container. It even OOMs if you set the limit too low.