Change GFD repository image V0.15.0 Helm

Open YFrendo opened this issue 1 year ago • 8 comments

I don't see any option in the Helm chart to change the repository of the GFD image.

This would be useful on a company network where the cluster doesn't have any access to the internet.

YFrendo avatar May 28 '24 16:05 YFrendo

We do have https://github.com/NVIDIA/k8s-device-plugin/blob/main/deployments/helm/nvidia-device-plugin/values.yaml#L48 , isn't this what you're looking for?

ArangoGutierrez avatar May 28 '24 16:05 ArangoGutierrez

Yeah, it changes the k8s-device-plugin repository but not the GFD repository; the deployment tries to pull the GFD pod image from the default repository.

YFrendo avatar May 28 '24 17:05 YFrendo

They share the same image (https://github.com/NVIDIA/k8s-device-plugin/blob/main/deployments/container/Dockerfile.ubuntu#L72), and the GFD daemonset uses it here: https://github.com/NVIDIA/k8s-device-plugin/blob/main/deployments/helm/nvidia-device-plugin/templates/daemonset-gfd.yml#L134

ArangoGutierrez avatar May 28 '24 17:05 ArangoGutierrez
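
For reference, the override discussed above could look roughly like the following. This is a minimal sketch, not the maintainers' documented procedure: the mirror host `registry.example.internal`, the release name `nvdp`, and the Helm repo alias `nvdp` are assumptions made for illustration. Only the `image.repository` key comes from the linked `values.yaml`. The script prints the command by default and only executes it when `APPLY=1` is set.

```shell
#!/bin/sh
# Assumption: registry.example.internal is a hypothetical internal mirror
# that already hosts a copy of the k8s-device-plugin image.
MIRROR="registry.example.internal/nvidia/k8s-device-plugin"

# Print the command by default; set APPLY=1 to actually execute it.
run() {
  if [ "${APPLY:-0}" = "1" ]; then
    "$@"
  else
    echo "$@"
  fi
}

# image.repository is shared by the device plugin and GFD daemonsets,
# because both run from the same container image.
# Note: it does not change the NFD image, which comes from a separate subchart.
install_device_plugin() {
  run helm upgrade -i nvdp nvdp/nvidia-device-plugin \
    --version 0.15.0 \
    --namespace nvidia-device-plugin --create-namespace \
    --set image.repository="$MIRROR" \
    --set gfd.enabled=true
}

install_device_plugin
```

Running the script as-is only echoes the `helm` command, which makes it easy to review before applying it to a cluster.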

Thanks! I will try this tomorrow, but I think we can close this issue!

YFrendo avatar May 28 '24 17:05 YFrendo

I'll close it once it works for you :) , not before

ArangoGutierrez avatar May 28 '24 17:05 ArangoGutierrez

Actually, I'm working with @YFrendo to deploy this plugin on a brand-new air-gapped k8s GPU infrastructure, and I did override this setting to point to our mirrored image hub, which worked for the k8s-device-plugin. This seems to be working.

The issue actually relates to the node feature discovery (NFD) component, which is deployed from a separate Helm chart located here: https://github.com/NVIDIA/k8s-device-plugin/tree/main/deployments/helm/nvidia-device-plugin/charts When I try to enable GFD from the k8s-device-plugin Helm chart (here: https://github.com/NVIDIA/k8s-device-plugin/blob/925be6d97361359803eb6502d15fa3e69dbe6e2b/deployments/helm/nvidia-device-plugin/values.yaml#L106C3-L106C17), the created pods try to pull an image from registry.k8s.io/nfd/node-feature-discovery:v0.15.3 or something like that, and so far I haven't managed to override that image location, neither from the k8s-device-plugin chart itself nor when I use the separate Helm chart and override it with the appropriate value (the values template is inside https://github.com/NVIDIA/k8s-device-plugin/blob/main/deployments/helm/nvidia-device-plugin/charts/node-feature-discovery-chart-0.15.3.tgz).

Archimonde666 avatar May 28 '24 17:05 Archimonde666

@YFrendo if you are able to install NFD separately, you could pass --set nfd.enabled=false when installing the device plugin and/or GFD. This should disable the internal NFD dependency.

elezar avatar May 28 '24 18:05 elezar

> @YFrendo if you are able to install NFD separately, you could pass --set nfd.enabled=false when installing the device plugin and/or GFD. This should disable the internal NFD dependency.

This is the solution: to get it working in a restricted environment, you have to install NFD separately first.

Everything works for us now!

But maybe this should be more explicit in the documentation (or an nfd.image option could be added to the chart). Also, nfd.enabled could be added to the Helm chart example values!

https://github.com/NVIDIA/k8s-device-plugin/blob/v0.15.0/deployments/helm/nvidia-device-plugin/values.yaml

Thanks for your support!

YFrendo avatar Jun 01 '24 17:06 YFrendo
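
The resolution described above (install NFD separately, then disable the bundled NFD dependency) can be sketched as follows. Several details here are assumptions rather than something confirmed in this thread: the mirror host `registry.example.internal`, the Helm repo aliases `nfd` and `nvdp`, the release and namespace names, the NFD chart's `image.repository` key, and the reading of `--set nfd.enabled` as `nfd.enabled=false`. In an air-gapped cluster the chart references would likely point to locally mirrored charts instead. The script prints the commands by default and only executes them when `APPLY=1` is set.

```shell
#!/bin/sh
# Assumptions (hypothetical): an internal mirror hosting both images,
# and Helm repos registered under the aliases "nfd" and "nvdp".
NFD_IMAGE="registry.example.internal/nfd/node-feature-discovery"
PLUGIN_IMAGE="registry.example.internal/nvidia/k8s-device-plugin"

# Print the command by default; set APPLY=1 to actually execute it.
run() {
  if [ "${APPLY:-0}" = "1" ]; then
    "$@"
  else
    echo "$@"
  fi
}

# Step 1: install NFD on its own, pointing it at the mirrored image.
# Assumption: the NFD chart exposes image.repository for this purpose.
install_nfd() {
  run helm upgrade -i nfd nfd/node-feature-discovery \
    --version 0.15.3 \
    --namespace node-feature-discovery --create-namespace \
    --set image.repository="$NFD_IMAGE"
}

# Step 2: install the device plugin with GFD enabled, but with the
# bundled NFD dependency disabled so no registry.k8s.io image is pulled.
install_device_plugin() {
  run helm upgrade -i nvdp nvdp/nvidia-device-plugin \
    --version 0.15.0 \
    --namespace nvidia-device-plugin --create-namespace \
    --set image.repository="$PLUGIN_IMAGE" \
    --set gfd.enabled=true \
    --set nfd.enabled=false
}

install_nfd
install_device_plugin
```

Keeping the two installs as separate releases means the NFD image location is controlled entirely by the NFD chart's own values, which sidesteps the override problem reported earlier in the thread.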

This issue is stale because it has been open 90 days with no activity. This issue will be closed in 30 days unless new comments are made or the stale label is removed.

github-actions[bot] avatar Aug 31 '24 04:08 github-actions[bot]

This issue was automatically closed due to inactivity.

github-actions[bot] avatar Sep 30 '24 04:09 github-actions[bot]