NVIDIA driver sysext and NVIDIA runtime sysext are not compatible
Description
When both the NVIDIA driver sysext and the NVIDIA runtime sysext are enabled, nvidia.service fails because nvidia-smi is not found.
Impact
The driver and the runtime can't both be installed via sysext, so Docker containers can't access the GPU.
Environment and steps to reproduce
- Set-up:
  - path: /etc/extensions/nvidia_runtime.raw
    contents:
      source: https://github.com/flatcar/sysext-bakery/releases/download/latest/nvidia_runtime-v1.16.2-x86-64.raw
      verification:
        hash: sha256-9833c758cd872ec990f02dfa4aafab827d73248092cb86a9926867411b9bd019
    mode: 0755
  - path: /etc/flatcar/enabled-sysext.conf
    contents:
      inline: |
        nvidia-drivers-570
- Task: install a newer version of the NVIDIA driver (the default, 535, is too old) together with the container runtime.
- Action(s): boot
- Error: nvidia.service failed. nvidia-smi not found.
Expected behavior
Both the driver and the runtime are installed and working.
Additional information
NA
Hi @yuhuyoyo,
Thanks for raising this issue. nvidia_runtime-v1.16.2-x86-64.raw is an old release and does not support coexisting with the pre-built sysext images (i.e. it does not include this change: https://github.com/flatcar/sysext-bakery/pull/153).
I would recommend updating your configuration to this:
---
# config.yaml
# butane < config.yaml > config.json
variant: flatcar
version: 1.0.0
storage:
  files:
    - path: /etc/flatcar/enabled-sysext.conf
      contents:
        inline: |
          nvidia-drivers-570
    - path: /etc/extensions/nvidia-runtime.raw
      contents:
        source: https://extensions.flatcar.org/extensions/nvidia-runtime-v1.17.5-x86-64.raw
(Note: the source URL is now https://extensions.flatcar.org, which is the recommended way to consume artifacts from the sysext-bakery.)
Let us know how it goes.
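To check the result after booting with a configuration like this, a few commands run on the node may help. This is only a rough sketch: `report_cmd` and `check_gpu_stack` are hypothetical helper names made up for illustration, not part of Flatcar or the sysext-bakery, and the sketch assumes the unit is still called nvidia.service as in the report.

```shell
#!/bin/sh
# Sketch of post-boot checks for the driver and runtime sysexts.
# report_cmd and check_gpu_stack are hypothetical helpers (assumptions).

# Print whether a command is available on PATH; return 1 if it is not.
report_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Run the checks one by one; meant to be called manually on the node.
check_gpu_stack() {
    # List the currently merged system extensions.
    report_cmd systemd-sysext && systemd-sysext status
    # Show the state of the unit that failed in the original report.
    report_cmd systemctl && systemctl --no-pager status nvidia.service
    # nvidia-smi is the binary that was reported as not found.
    report_cmd nvidia-smi && nvidia-smi
}
```

Defining the functions has no side effects; calling `check_gpu_stack` on the node prints a "missing: ..." line for any tool that is not on the PATH, which should make it easier to tell whether the driver sysext or the runtime sysext is the part that did not load.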
I can confirm the documentation works for my home lab setup: ESXi with 3 k3s nodes (1 master and 2 worker nodes equipped with 2080 Ti GPUs).
It dramatically reduced the setup time for my tiny GPU cluster to an hour (including other basic stuff like deploying in-cluster monitoring components).
Just be careful about the kernel version when using the alpha release. That said, rolling back is easy with Flatcar.