
When using CUDA MPS to enable GPU sharing, the pod occupies all GPU memory.


I have already enabled GPU sharing using CUDA MPS, but when I deploy a pod with YAML it occupies all of the GPU memory. Is my way of requesting GPU resources wrong?

The way I request GPU resources is as follows:

resources:
  limits:
    nvidia.com/gpu: 1
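
For reference, a minimal pod spec sketch of the request (the pod name and image here are just placeholders, not the actual workload):

apiVersion: v1
kind: Pod
metadata:
  name: mps-test            # placeholder name
spec:
  containers:
  - name: cuda-container
    image: nvcr.io/nvidia/cuda:12.2.0-base-ubuntu22.04   # placeholder image
    command: ["sleep", "infinity"]
    resources:
      limits:
        nvidia.com/gpu: 1   # one replica of the MPS-shared GPU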

ysz-github avatar Mar 02 '24 14:03 ysz-github

How did you set up MPS?

klueska avatar Mar 02 '24 14:03 klueska

How did you set up MPS?

I haven't set MPS in the YAML; I just requested GPU resources the same way as in time-slicing mode. How should I set it up? Thank you!

ysz-github avatar Mar 02 '24 14:03 ysz-github

How did you set up MPS?

The settings to enable CUDA MPS are as follows:

version: v1
flags:
  migStrategy: "none"
  failOnInitError: true
  nvidiaDriverRoot: "/"
  plugin:
    passDeviceSpecs: false
    deviceListStrategy: "envvar"
    deviceIDStrategy: "uuid"
  gfd:
    oneshot: false
    noTimestamp: false
    outputFile: /etc/kubernetes/node-feature-discovery/features.d/gfd
    sleepInterval: 60s
sharing:
  mps:
    resources:
    - name: nvidia.com/gpu
      replicas: 10
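
The plugin reads this as a config file; one way to supply it is via a ConfigMap that the Helm chart is pointed at. A sketch, where the ConfigMap name, namespace, and key are only examples:

apiVersion: v1
kind: ConfigMap
metadata:
  name: nvidia-device-plugin-configs   # example name
  namespace: nvidia-device-plugin      # example namespace
data:
  mps-10: |                            # example config name/key
    version: v1
    sharing:
      mps:
        resources:
        - name: nvidia.com/gpu
          replicas: 10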

ysz-github avatar Mar 02 '24 14:03 ysz-github

@ysz-github do you have an example application / podspec that you're using to confirm this?

Could you also please confirm your driver version? We are investigating an issue where setting the device memory limits by UUID is not having the desired effect.

elezar avatar Mar 04 '24 09:03 elezar

I have the same issue using MPS with a CUDA process in Docker; the driver is 535.129.03 and the nvdp version is 0.15.0-rc.1.

aphrodite1028 avatar Mar 18 '24 03:03 aphrodite1028

There is a known issue with 0.15.0-rc.1 where memory limits were not correctly applied. This will be addressed in v0.15.0-rc.2 which we will release soon.

elezar avatar Mar 18 '24 07:03 elezar

There is a known issue with 0.15.0-rc.1 where memory limits were not correctly applied. This will be addressed in v0.15.0-rc.2 which we will release soon.

OK, understood, thanks for your reply!

aphrodite1028 avatar Mar 18 '24 08:03 aphrodite1028

@aphrodite1028 @ysz-github we have just released https://github.com/NVIDIA/k8s-device-plugin/releases/tag/v0.15.0-rc.2 which should address this issue. Please let us know if you're still experiencing problems.

elezar avatar Mar 18 '24 11:03 elezar

@aphrodite1028 @ysz-github we have just released https://github.com/NVIDIA/k8s-device-plugin/releases/tag/v0.15.0-rc.2 which should address this issue. Please let us know if you're still experiencing problems.

I found https://github.com/NVIDIA/k8s-device-plugin/blob/main/cmd/mps-control-daemon/mps/daemon.go#L77-L85 here.

If I do not set the CUDA_VISIBLE_DEVICES env var and just start nvidia-cuda-mps-control -d and nvidia-cuda-mps-control myself, then limiting device memory fails and nvidia-cuda-mps-server is not found in the container. If I set it up again manually, ignoring the mps-control-daemon DaemonSet config, it works on the host machine, but I get a segmentation fault in the container.

How do I set the device memory limit for a client in a container?

The driver version is 535.129.03 and the GPU is an RTX A6000.

Also, when I deploy with Helm in k8s, I get an error like "linux mounts: path /run/nvidia/mps is mounted on /run but it is not a shared mount" when mountPropagation is set:

        volumeMounts:
        - mountPath: /mps
          mountPropagation: Bidirectional
          name: mps-root
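
For reference, a sketch of the volume definition this mount pairs with, assuming mps-root is backed by a hostPath under /run/nvidia/mps (the names are taken from the snippet above). Bidirectional propagation also requires that the parent mount on the host (/run here) is a shared mount, which is what the error message is complaining about:

        volumes:
        - name: mps-root
          hostPath:
            path: /run/nvidia/mps      # assumed host path backing the mount
            type: DirectoryOrCreate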

aphrodite1028 avatar Mar 19 '24 03:03 aphrodite1028

@aphrodite1028 You shouldn't need to do anything special in your user container. The system starts the MPS server for all GPUs on the machine, and your client will be forced to make use of it.

These lines set the upper limit on the pinned device memory and thread percentage consumable by the client. https://github.com/NVIDIA/k8s-device-plugin/blob/main/cmd/mps-control-daemon/mps/daemon.go#L111-L122

You can manually adjust the pinned memory limit and thread percentage to something smaller than this using the envvars when you start your container (but you can't set them to something larger).
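
For example, a sketch of such an override in a container spec, assuming CUDA_MPS_PINNED_DEVICE_MEM_LIMIT and CUDA_MPS_ACTIVE_THREAD_PERCENTAGE are the client envvars in question and the values stay at or below what the control daemon already configures:

    env:
    - name: CUDA_MPS_ACTIVE_THREAD_PERCENTAGE
      value: "10"                  # example: cap this client at 10% of the SMs
    - name: CUDA_MPS_PINNED_DEVICE_MEM_LIMIT
      value: "0=2G"                # example: cap pinned memory on device 0 at 2G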

klueska avatar Mar 26 '24 11:03 klueska

@aphrodite1028 You shouldn't need to do anything special in your user container. The system starts the MPS server for all GPUs on the machine, and your client will be forced to make use of it.

These lines set the upper limit on the pinned device memory and thread percentage consumable by the client. https://github.com/NVIDIA/k8s-device-plugin/blob/main/cmd/mps-control-daemon/mps/daemon.go#L111-L122

You can manually adjust the pinned memory limit and thread percentage to something smaller than this using the envvars when you start your container (but you can't set them to something larger).

Thanks for your reply.

Does the MPS pinned device memory limit have a driver version requirement? Looking at man nvidia-cuda-mps-control on driver 470, I could not find the set_default_device_pinned_mem_limit command.

aphrodite1028 avatar Mar 27 '24 07:03 aphrodite1028