nvidia driver sysext and nvidia runtime sysext are not compatible

Open yuhuyoyo opened this issue 4 months ago • 2 comments

Description

When the nvidia driver sysext is enabled together with the nvidia runtime sysext, nvidia.service fails because nvidia-smi is not found.

Impact

The driver and the runtime can't both be installed via sysext. The Docker container can't access the GPU.

Environment and steps to reproduce

  1. Set-up:

     - path: /etc/extensions/nvidia_runtime.raw
       contents:
         source: https://github.com/flatcar/sysext-bakery/releases/download/latest/nvidia_runtime-v1.16.2-x86-64.raw
         verification:
           hash: sha256-9833c758cd872ec990f02dfa4aafab827d73248092cb86a9926867411b9bd019
       mode: 0755
     - path: /etc/flatcar/enabled-sysext.conf
       contents:
         inline: |
           nvidia-drivers-570

  2. Task: install a newer version of the NVIDIA driver (the default, 535, is too old) together with the runtime for containers.
  3. Action(s): boot
  4. Error: nvidia.service failed because nvidia-smi was not found.

Expected behavior

Both the driver and the runtime are installed.

Additional information

NA

yuhuyoyo avatar Aug 20 '25 15:08 yuhuyoyo

Hi @yuhuyoyo,

Thanks for raising this issue. nvidia_runtime-v1.16.2-x86-64.raw is an old release and does not support coexisting with the pre-built sysext images (i.e. it does not include https://github.com/flatcar/sysext-bakery/pull/153).

I would recommend updating your configuration to this:

---
# config.yaml
# butane < config.yaml > config.json
variant: flatcar
version: 1.0.0

storage:
  files:
  - path: /etc/flatcar/enabled-sysext.conf
    contents:
      inline: |
        nvidia-drivers-570
  - path: /etc/extensions/nvidia-runtime.raw
    contents:
      source: https://extensions.flatcar.org/extensions/nvidia-runtime-v1.17.5-x86-64.raw

(Note: the source URL is now https://extensions.flatcar.org, which is the recommended way to consume artifacts from the sysext-bakery.)
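
After booting with a configuration like the one above, a quick sanity check along these lines may help confirm that the driver tooling is in place. This is only a rough sketch: `run_check` is a hypothetical helper written for this example, the `nvidia.service` unit name is taken from the error report above, and each step is skipped when its tool is missing.

```shell
#!/bin/sh
# Post-boot sanity checks (illustrative sketch, not an official Flatcar tool).

# run_check LABEL COMMAND [ARGS...]
# Runs COMMAND quietly if it is available and prints OK, FAIL, or SKIP.
run_check() {
    label=$1
    shift
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "SKIP: $label ($1 not found)"
        return 0
    fi
    if "$@" >/dev/null 2>&1; then
        echo "OK: $label"
    else
        echo "FAIL: $label"
    fi
}

# systemd-sysext reports which extension images are currently merged.
run_check "systemd-sysext status runs" systemd-sysext status
# The unit that failed in the original report.
run_check "nvidia.service is active" systemctl is-active --quiet nvidia.service
# The binary that was reported as missing.
run_check "nvidia-smi runs" nvidia-smi
```

The helper hides command output, so run any failing command directly to see the details. If these pass, a container-level check such as `docker run --rm --gpus all ubuntu nvidia-smi` (assuming Docker has been set up to use the NVIDIA runtime) should show whether containers can reach the GPU.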

Let us know how it goes.

tormath1 avatar Aug 21 '25 07:08 tormath1

I can confirm the documentation works for my home lab setup: ESXi with 3 k3s nodes (1 master and 2 worker nodes equipped with 2080 Ti GPUs).

It dramatically reduced the setup time for my tiny GPU cluster to an hour (including other basics such as deploying in-cluster monitoring components).

Just be careful with the kernel version when using an alpha release. That said, it's easy to roll back with Flatcar.

guhuajun avatar Oct 13 '25 05:10 guhuajun