
Are there any plans to support CUDA_MPS_PINNED_DEVICE_MEM_LIMIT?

Open t-ibayashi-safie opened this issue 2 years ago • 3 comments

Present Status

I understand the current system configuration as follows:

  • Currently, the number of GPU threads a Pod can use appears to be controlled by CUDA_MPS_ACTIVE_THREAD_PERCENTAGE.
  • The memory used by the pod is not limited by this plugin.
  • Therefore, we need to pass the per_process_gpu_memory_fraction argument to TensorFlow to control memory usage.

My Suggestion

  • A new environment variable, CUDA_MPS_PINNED_DEVICE_MEM_LIMIT, was added in CUDA 11.5.
  • CUDA_MPS_PINNED_DEVICE_MEM_LIMIT can limit the memory usage of each MPS client.
  • I think it would be more convenient if this plugin supported this new feature (a rough sketch of what that could look like follows this list).
  • Because:
    • Memory limits would no longer have to rely on TensorFlow.
    • Other GPU libraries could be used as well.
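As an illustration of what this could look like on the plugin side, here is a minimal Go sketch (not this plugin's actual code) of an Allocate handler that injects the variable. The package name, import path, and the hard-coded 0=2G limit are assumptions for the example.

// Minimal sketch, not this plugin's actual code: inject the MPS memory limit
// into every allocated container via the v1beta1 device plugin API.
package sketch

import (
	"context"

	pluginapi "k8s.io/kubelet/pkg/apis/deviceplugin/v1beta1"
)

type vgpuPlugin struct{}

func (p *vgpuPlugin) Allocate(ctx context.Context, req *pluginapi.AllocateRequest) (*pluginapi.AllocateResponse, error) {
	resp := &pluginapi.AllocateResponse{}
	for range req.ContainerRequests {
		resp.ContainerResponses = append(resp.ContainerResponses, &pluginapi.ContainerAllocateResponse{
			Envs: map[string]string{
				// limit device 0 to 2 GiB for every MPS client started in this container
				"CUDA_MPS_PINNED_DEVICE_MEM_LIMIT": "0=2G",
			},
		})
	}
	return resp, nil
}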

t-ibayashi-safie avatar Mar 24 '22 03:03 t-ibayashi-safie

pluginapi.AllocateRequest does not contain information about the current pod/container, so I think it is not trivial to add the CUDA_MPS_PINNED_DEVICE_MEM_LIMIT env variable to this plugin.

Limitations:

  • Custom resources can be integers only.
  • No information / annotations / labels about the allocated container/pod are available inside the device plugin (see the API excerpt below). :(
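For reference, a paraphrased excerpt of the request types from the v1beta1 device plugin API shows the gap: Allocate only receives opaque device IDs.

// Paraphrased excerpt of the v1beta1 device plugin API request types, shown
// only to illustrate the limitation: the kubelet passes device IDs and nothing
// about the pod or container they are being allocated to.
package sketch

// AllocateRequest carries one entry per container in the pod being admitted.
type AllocateRequest struct {
	ContainerRequests []*ContainerAllocateRequest
}

// ContainerAllocateRequest lists only the device IDs chosen by the kubelet;
// there are no pod names, annotations, or labels to look a memory limit up from.
type ContainerAllocateRequest struct {
	DevicesIDs []string
}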

I am working on a fork to support it, though (hopefully).

Currently I have this:

  • Added CUDA_MPS_ACTIVE_THREAD_PERCENTAGE at the client level, so each container can get a different share of SM units.
  • Added CUDA_MPS_PINNED_DEVICE_MEM_LIMIT as an env variable on the container to limit GPU memory usage.
  • Metrics on memory usage for each container.

apiVersion: v1
kind: Pod
metadata:
  name: nvidia-device-query
spec:
  hostIPC: true
  containers:
    - name: nvidia-device-query
      image: ghcr.io/kuartis/nvidia-device-query:1.0.0
      command: ["/bin/sh", "-ec", "while :; do echo '.'; sleep 5 ; done"]
      env:
        - name: CUDA_MPS_PINNED_DEVICE_MEM_LIMIT
          value: 0=2G
      resources:
        limits:
          k8s.kuartis.com/vgpu: '1'
      volumeMounts:
        - name: nvidia-mps
          mountPath: /tmp/nvidia-mps
  volumes:
    - name: nvidia-mps
      hostPath:
        path: /tmp/nvidia-mps

What I plan is to create a new resource definition inside the same plugin and make both Allocate methods talk to each other via channels, roughly as sketched after the snippet below:

      resources:
        limits:
          k8s.kuartis.com/vgpu: '1'
          k8s.kuartis.com/vgpu-mem: '1024' # This will set the correct env variable for the container
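A hypothetical Go sketch of the channel idea (not the fork's actual code) is below; how the two Allocate calls are matched to the same container is glossed over here, and that correlation is the tricky part.

// Hypothetical sketch, not the fork's actual code: two device plugin servers in
// the same binary, one per extended resource, coordinating over a channel.
package sketch

import (
	"context"
	"fmt"

	pluginapi "k8s.io/kubelet/pkg/apis/deviceplugin/v1beta1"
)

type vgpuMemServer struct {
	memLimits chan<- string // publishes limits for the vgpu server to consume
}

type vgpuServer struct {
	memLimits <-chan string
}

// Allocate for k8s.kuartis.com/vgpu-mem: the number of allocated "devices"
// encodes the requested limit in MiB, which is forwarded over the channel.
func (s *vgpuMemServer) Allocate(ctx context.Context, req *pluginapi.AllocateRequest) (*pluginapi.AllocateResponse, error) {
	resp := &pluginapi.AllocateResponse{}
	for _, c := range req.ContainerRequests {
		s.memLimits <- fmt.Sprintf("0=%dM", len(c.DevicesIDs)) // e.g. 1024 IDs -> "0=1024M"
		resp.ContainerResponses = append(resp.ContainerResponses, &pluginapi.ContainerAllocateResponse{})
	}
	return resp, nil
}

// Allocate for k8s.kuartis.com/vgpu: picks up a published limit, if any, and
// sets the MPS memory limit env var on the container.
func (s *vgpuServer) Allocate(ctx context.Context, req *pluginapi.AllocateRequest) (*pluginapi.AllocateResponse, error) {
	resp := &pluginapi.AllocateResponse{}
	for range req.ContainerRequests {
		envs := map[string]string{}
		select {
		case limit := <-s.memLimits:
			envs["CUDA_MPS_PINNED_DEVICE_MEM_LIMIT"] = limit
		default: // no vgpu-mem request for this container
		}
		resp.ContainerResponses = append(resp.ContainerResponses, &pluginapi.ContainerAllocateResponse{Envs: envs})
	}
	return resp, nil
}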

Here is the link: https://github.com/kuartis/kuartis-virtual-gpu-device-plugin

ghokun avatar Mar 24 '22 07:03 ghokun

Thank you for providing an answer to my question.

If the CUDA version in each pod is 11.5 or higher, your repository can limit the memory without relying on TensorFlow, right?

What I plan is to create a new resource definition inside same plugin and make both Allocate methods to talk each other via channels.

Amazing. I'm looking forward to using this :)

t-ibayashi-safie avatar Mar 25 '22 02:03 t-ibayashi-safie

If the CUDA version in each pod is 11.5 or higher, your repository can limit the memory without relying on TensorFlow, right?

Yes, it does limit the memory usage of the container. It even OOMs if you set the limit too low.

ghokun avatar Mar 25 '22 05:03 ghokun