When I try to use MPS in Kubernetes, the device plugin reports that --mps-root must be specified.
####################
logs:
using mps requires --mps-root to be specified.
####################
The contents of the nvidia-device-plugin.yml file are as follows:
...
env:
- name: CONFIG_FILE
value: "/data/system-yaml/a100-mps.yaml"
...
####################
The contents of the /data/system-yaml/a100-mps.yaml file are as follows:
version: v1
sharing:
mps:
resources:
- name: nvidia.com/gpu
replicas: 2
####################
I have added the following content to the nvidia-device-plugin.yml file:
...
env:
- name: CONFIG_FILE
value: "/data/system-yaml/a100-mps.yaml"
- name: MPS_ROOT
value: "/run/nvidia/mps"
...
The container started successfully, but no GPU was found and the /run/nvidia/mps directory is empty.
How should MPS_ROOT be set?
Hi @zbk2012. From your example, it seems that your config file is not properly indented. You are probably looking for something like this instead:
version: v1
sharing:
  mps:
    resources:
    - name: nvidia.com/gpu
      replicas: 2
This should also be confirmed by your device plugin logs.
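As a quick local check before looking at the logs, the nesting can be tested directly on the file. The following is only a minimal sketch: check_mps_nesting is a hypothetical helper (not part of the device plugin), and it only looks at leading whitespace with awk rather than parsing the YAML.

```shell
#!/bin/sh
# Sketch: succeed only if an "mps:" key is indented under a top-level "sharing:" key.
# check_mps_nesting is a hypothetical helper name; this is a whitespace check,
# not a real YAML parser.
check_mps_nesting() {
  awk '
    /^sharing:[[:space:]]*$/ { in_sharing = 1; next }  # entered the top-level sharing block
    /^[^[:space:]#]/         { in_sharing = 0 }        # any other unindented key ends it
    in_sharing && /^[[:space:]]+mps:[[:space:]]*$/ { found = 1 }
    END { exit !found }                                 # exit status 0 only when nested mps was seen
  ' "$1"
}
```

Calling check_mps_nesting /data/system-yaml/a100-mps.yaml returns exit status 0 when mps is nested under sharing, and a non-zero status for a flattened file like the one pasted in the original post.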
Oh, I'm sorry, the indentation was missing when copying. The indentation in the config file is correct.
@zbk2012 could you provide the logs for GFD and the device plugin? For example, I use the following to deploy the plugin:
helm upgrade nvidia -i deployments/helm/nvidia-device-plugin \
    --namespace nvidia-device-plugin \
    --create-namespace \
    --set runtimeClassName=nvidia \
    --set config.name=nvidia-plugin-configs \
    --set nvidiaDriverRoot=/ \
    --set gfd.enabled=true
Where the config is created from:
cat << EOF > dp-mps-config.yaml
version: v1
flags:
  migStrategy: "none"
  failOnInitError: true
  nvidiaDriverRoot: "/"
  plugin:
    passDeviceSpecs: false
    deviceListStrategy:
    - envvar
    deviceIDStrategy: uuid
sharing:
  mps:
    renameByDefault: false
    resources:
    - name: nvidia.com/gpu
      replicas: 4
EOF
by running:
kubectl create cm -n nvidia-device-plugin nvidia-plugin-configs \
    --from-file=config=dp-mps-config.yaml
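To confirm that the ConfigMap actually holds the indented YAML, the stored content can be printed back. This is a sketch only: show_plugin_config is a hypothetical helper, and it assumes the namespace, ConfigMap name and the "config" key from the kubectl create cm command above.

```shell
#!/bin/sh
# Sketch: print the device plugin config stored in the ConfigMap.
# show_plugin_config is a hypothetical helper; the namespace, ConfigMap name
# and the "config" key are taken from the kubectl create cm command above.
show_plugin_config() {
  kubectl get configmap -n nvidia-device-plugin nvidia-plugin-configs \
    -o jsonpath='{.data.config}'
}
```

Running show_plugin_config should print the same content as dp-mps-config.yaml, indentation included; if the sharing section comes back flattened, the file was damaged before the ConfigMap was created.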
This issue is stale because it has been open 90 days with no activity. This issue will be closed in 30 days unless new comments are made or the stale label is removed.
This issue was automatically closed due to inactivity.