
Nvidia variants for bare metal

Open wokalski opened this issue 2 years ago • 3 comments

What I'd like: I would like to use Bottlerocket on premises on machines with NVIDIA GPUs, so I'd like a variant that includes the NVIDIA drivers for bare metal.

Any alternatives you've considered: Support or instructions for building the nvidia-driver Docker image for Bottlerocket.

wokalski, Dec 10 '23 15:12

Hello @wokalski! Thanks for cutting this issue! Have you looked at how we build our NVIDIA AWS variants? https://github.com/bottlerocket-os/bottlerocket/blob/develop/variants/aws-k8s-1.28-nvidia/Cargo.toml

I haven't tried this, since I don't have bare-metal NVIDIA hardware to test on, but building a custom metal variant along the same lines may work without much extra difficulty. The key piece is to add the NVIDIA packages:

    "nvidia-container-toolkit",
    "nvidia-k8s-device-plugin",
    "kmod-6.1-nvidia-tesla-535",

along with their dependencies. You may also need to adjust the size of the image (see https://github.com/bottlerocket-os/bottlerocket/blob/develop/variants/aws-k8s-1.28-nvidia/Cargo.toml#L13) so that the drivers fit.
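As a rough sketch of the image-size adjustment mentioned above: the AWS NVIDIA variant sets the OS image size in its variant metadata. Assuming the metal variant reads the same `image-layout` table and key, the change in `variants/metal-k8s-1.28/Cargo.toml` might look like the following. The key name and the value are assumptions for illustration and have not been tested on bare metal.

```toml
# variants/metal-k8s-1.28/Cargo.toml (sketch, untested on bare metal)
# Assumption: the metal variant honors the same image-layout key as the
# aws-k8s-1.28-nvidia variant. The size below is illustrative; choose one
# large enough to hold the NVIDIA driver packages.
[package.metadata.build-variant.image-layout]
os-image-size-gib = 4
```

If the build fails because the root filesystem runs out of space, raising this value is the first thing to try.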

Can you see if this works for you?

yeazelm, Dec 11 '23 16:12

I had seen it, but I didn't dig deeper, so I didn't know whether those packages were somehow AWS- or VM-specific.

I'll do my best to check it out and report back!

wokalski, Dec 11 '23 19:12

I took a bit of time to try this out and can confirm the images build just fine for metal:

$ git diff
diff --git a/variants/metal-k8s-1.28/Cargo.toml b/variants/metal-k8s-1.28/Cargo.toml
index d299e025..b7d50ced 100644
--- a/variants/metal-k8s-1.28/Cargo.toml
+++ b/variants/metal-k8s-1.28/Cargo.toml
@@ -36,6 +36,10 @@ included-packages = [
     "cni",
     "cni-plugins",
     "kubelet-1.28",
+    # nvidia
+    "nvidia-container-toolkit",
+    "nvidia-k8s-device-plugin",
+    "kmod-6.1-nvidia-tesla-535",
 ]

 [lib]
@@ -50,3 +54,7 @@ aws-iam-authenticator = { path = "../../packages/aws-iam-authenticator" }
 cni = { path = "../../packages/cni" }
 cni-plugins = { path = "../../packages/cni-plugins" }
 kubernetes-1_28 = { path = "../../packages/kubernetes-1.28" }
+# nvidia
+nvidia-container-toolkit = { path = "../../packages/nvidia-container-toolkit" }
+nvidia-k8s-device-plugin = { path = "../../packages/nvidia-k8s-device-plugin" }
+kmod-6_1-nvidia = { path = "../../packages/kmod-6.1-nvidia" }

I don't have hardware to prove this works, but the resulting image did contain all the drivers and the additional toolkits/plugins, and they did come up and attempt to find the hardware (which I don't have).

yeazelm, Dec 17 '23 18:12