Is there a configuration for using nvidia gpu in kubespray?
The NVIDIA container runtime does not seem to be working. It looks like kubespray has a configuration item for NVIDIA GPUs, but is it possible to set something via an Ansible variable?
https://github.com/kubernetes-sigs/kubespray/blob/master/roles/kubernetes-apps/container_engine_accelerator/nvidia_gpu/tasks/main.yml
Hi @misupopo, could you explain the issue in more detail?
The NVIDIA container runtime does not seem to be working.
What is the actual error message you are facing?
It looks like kubespray has a configuration item for NVIDIA GPUs, but is it possible to set something via an Ansible variable?
What does "something" mean? What kind of configuration item do you need to specify?
Hi, we tested this week, and apparently the nvidia_gpu tasks are outdated. They don't work.
Kubespray doesn't change the runtime plugin for Docker, containerd, or any other runtime. GPU nodes must use the nvidia runtime plugin, which comes with the nvidia-container-toolkit package (for Docker, the nvidia-docker2 package). The toolkit depends on nvidia-driver.
What kubespray needs to do on the GPU nodes:
1 - Install nvidia-driver
2 - Install nvidia-container-toolkit (for Docker, nvidia-docker2)
3 - Deploy the nvidia-k8s-device-plugin
A rough sketch of these steps, done by hand, is shown below.
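For reference, a minimal sketch of those steps done manually on a GPU node (the distribution, driver series, and repository setup below are assumptions, not what Kubespray would do; the NVIDIA install guides are authoritative):

# 1 - Install the NVIDIA driver (Ubuntu shown; the driver series is only an example)
sudo apt-get update && sudo apt-get install -y nvidia-driver-535

# 2 - Install nvidia-container-toolkit and register the nvidia runtime with containerd
#     (assumes the NVIDIA container toolkit package repository is already configured)
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=containerd
sudo systemctl restart containerd

# 3 - Deploy the NVIDIA device plugin, e.g. via its Helm chart (see the chart example further down)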
However, NVIDIA has an operator (the GPU Operator) that installs the packages, labels the nodes, and does more. The above solution is simple and low-cost.
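For comparison, the operator route is usually a single Helm install (the chart and repo are as published by NVIDIA; the release name and namespace below are arbitrary choices):

helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
helm install gpu-operator nvidia/gpu-operator \
  --namespace gpu-operator --create-namespace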
We are currently working on this and will submit a PR for it.
cc @Dentrax @developer-guy @necatican
I'm not sure Kubespray is the best place to do that, especially once you get into dependencies on the NVIDIA drivers and which driver and setup should be used depending on whether you run bare metal, vGPU, or MIG. What I have done is install the components separately from Kubespray as much as possible.
Installing the container toolkit is just a yum repo and a few packages: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#step-2-install-nvidia-container-toolkit
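Roughly, following that guide (the repository URL and steps reflect the guide at the time and may have changed; treat this as a sketch for RHEL/CentOS-family nodes):

# Add the NVIDIA container toolkit yum repo, per the linked install guide
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.repo \
  | sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo

# Install the toolkit itself
sudo yum install -y nvidia-container-toolkit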
The part that needs to integrate with Kubespray is this in your inventory, for containerd config:
is_gpu_node: # insert True/False or some node-dependent logic
containerd_runtimes_nvidia:
  - name: nvidia
    type: "io.containerd.runc.v1"
    engine: ""
    root: ""
    options:
      systemdCgroup: "true"
      BinaryName: "\"/usr/bin/nvidia-container-runtime\""
containerd_additional_runtimes: "{{ containerd_runtimes_nvidia if is_gpu_node else [] }}"
# TODO FIXME simplify this after https://github.com/kubernetes-sigs/kubespray/pull/9026
# For nvidia device plugin, see https://github.com/NVIDIA/k8s-device-plugin#configure-containerd
containerd_default_runtime: "{{ 'nvidia' if is_gpu_node else 'runc' }}"
Then you just run kubespray, and we add the nvidia-device-plugin Helm chart (with GFD enabled) on top.
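As a sketch, that last step could look like this (gfd.enabled is a value of the nvidia-device-plugin chart; the release name and namespace are arbitrary):

helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
helm repo update
helm upgrade --install nvdp nvdp/nvidia-device-plugin \
  --namespace nvidia-device-plugin --create-namespace \
  --set gfd.enabled=true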
I would suggest any app installations on top of Kubespray should be done based on https://github.com/kubernetes-sigs/kubespray/pull/8347 via helm charts instead of static YAML manifests.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
In response to this:
/close not-planned
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.