When I want to use MPS in Kubernetes, I need to specify --mps-root.

Open zbk2012 opened this issue 1 year ago • 3 comments

#################### logs:

using mps requires --mps-root to be specified.

#################### The contents of the nvidia-device-plugin.yml file are as follows:

...
env:
- name: CONFIG_FILE
  value: "/data/system-yaml/a100-mps.yaml"
...

#################### The contents of the /data/system-yaml/a100-mps.yaml file are as follows:

version: v1
sharing:
mps:
resources:
- name: nvidia.com/gpu
replicas: 2

#################### I have added the following content to the nvidia-device-plugin.yml file:

...
env:
- name: CONFIG_FILE
  value: "/data/system-yaml/a100-mps.yaml"
- name: MPS_ROOT
  value: "/run/nvidia/mps"
...

The container successfully started, but no GPU was found and there is nothing in the /run/nvidia/mps directory.

How should MPS_ROOT be set?

zbk2012 avatar Jul 11 '24 14:07 zbk2012

Hi @zbk2012. From your example, it seems that your config file is not properly indented. You are probably looking for something like this instead:

version: v1
sharing:
  mps:
    resources:
    - name: nvidia.com/gpu
      replicas: 2

This should also be confirmed by your device plugin logs.
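
A quick way to tell whether the nesting survived a copy and paste is to compare the columns at which the keys start. The following is only a rough sketch using the Python standard library: `key_indents` and `is_nested` are hypothetical helper names, not part of the device plugin, and the check is an indentation heuristic rather than a real YAML parser.

```python
import re

# Hypothetical helpers for illustration only; not part of k8s-device-plugin.

def key_indents(yaml_text):
    """Map each mapping key to the column where it starts (first occurrence wins)."""
    indents = {}
    for line in yaml_text.splitlines():
        # Leading spaces, an optional "- " list marker, then "key:".
        m = re.match(r"^( *(?:- +)?)([A-Za-z_][\w.-]*):", line)
        if m:
            indents.setdefault(m.group(2), len(m.group(1)))
    return indents

def is_nested(yaml_text, chain=("sharing", "mps", "resources", "replicas")):
    """True when each key in `chain` starts further right than the key before it."""
    cols = key_indents(yaml_text)
    if any(key not in cols for key in chain):
        return False
    return all(cols[a] < cols[b] for a, b in zip(chain, chain[1:]))

# The config as it appeared in the issue text (indentation lost).
FLAT = """version: v1
sharing:
mps:
resources:
- name: nvidia.com/gpu
replicas: 2
"""

# The config with the nesting suggested in the reply.
INDENTED = """version: v1
sharing:
  mps:
    resources:
    - name: nvidia.com/gpu
      replicas: 2
"""

print(is_nested(FLAT))      # False
print(is_nested(INDENTED))  # True
```

Running the same check against the file that is actually mounted into the container (here `/data/system-yaml/a100-mps.yaml`) shows whether the problem is in the file itself or only in the pasted copy.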

elezar avatar Jul 17 '24 13:07 elezar

Hi @zbk2012. From your example, it seems that your config file is not properly indented. You are probably looking for something like this instead:

version: v1
sharing:
  mps:
    resources:
    - name: nvidia.com/gpu
      replicas: 2

This should also be confirmed by your device plugin logs.

Oh, I'm sorry, the indentation was lost when I copied the file into the issue. The indentation in the actual config file is correct.

zbk2012 avatar Jul 17 '24 13:07 zbk2012

@zbk2012 could you provide the logs for GFD and the device plugin? For example, I use the following to deploy the plugin:

helm upgrade nvidia -i deployments/helm/nvidia-device-plugin \
    --namespace nvidia-device-plugin \
    --create-namespace \
    --set runtimeClassName=nvidia \
    --set config.name=nvidia-plugin-configs \
    --set nvidiaDriverRoot=/ \
    --set gfd.enabled=true

Where the config is created from:

cat << EOF > dp-mps-config.yaml
version: v1
flags:
  migStrategy: "none"
  failOnInitError: true
  nvidiaDriverRoot: "/"
  plugin:
    passDeviceSpecs: false
    deviceListStrategy:
    - envvar
    deviceIDStrategy: uuid
sharing:
  mps:
    renameByDefault: false
    resources:
    - name: nvidia.com/gpu
      replicas: 4
EOF

by running:

kubectl create cm -n nvidia-device-plugin nvidia-plugin-configs \
    --from-file=config=dp-mps-config.yaml

elezar avatar Aug 08 '24 15:08 elezar

This issue is stale because it has been open 90 days with no activity. This issue will be closed in 30 days unless new comments are made or the stale label is removed.

github-actions[bot] avatar Nov 07 '24 04:11 github-actions[bot]

This issue was automatically closed due to inactivity.

github-actions[bot] avatar Dec 07 '24 04:12 github-actions[bot]