gvisor
Add support for NVIDIA driver version 535.161.08
Description
I want to use gVisor with my A100 GPUs. When I follow the instructions at https://gvisor.dev/docs/user_guide/gpu/ and run:
runsc nvproxy list-supported-drivers
I see:
535.161.07
550.54.14
550.54.15
535.104.12
535.129.03
535.154.05
Unfortunately, 535.161.07 is one patch release behind 535.161.08, the version my GPUs are currently running. If I run:
sudo runsc --nvproxy --debug --debug-log=/tmp/runsc.log run -bundle / my_container
and then cat the debug log at /tmp/runsc.log, I see:
I0701 16:17:05.097581 2939814 nvproxy.go:35] NVIDIA driver version: 535.161.08
W0701 16:17:05.097621 2939814 util.go:64] FATAL ERROR: creating loader: registering filesystems: registering nvproxy driver: unsupported Nvidia driver version: 535.161.08
creating loader: registering filesystems: registering nvproxy driver: unsupported Nvidia driver version: 535.161.08
unable to read from the sync descriptor: 0, error EOF
In other words, the difference between .08 and .07 actually matters.
Presumably the solution is similar to https://github.com/google/gvisor/pull/10181/files
Is this feature related to a specific bug?
No response
Do you have a specific solution in mind?
Presumably the solution is similar to https://github.com/google/gvisor/pull/10181/files
Note that if the two driver versions are ABI-equivalent, you can set the --nvproxy-driver-version flag to an NVIDIA driver version that gVisor does support; this overrides the version-detection code.
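The following Go sketch illustrates why a one-patch difference is fatal and how an override gets around it. It is a minimal model under stated assumptions, not gVisor's actual nvproxy code. The helper `resolveDriverVersion` and the `supportedDrivers` map are hypothetical. The version list is copied from the `runsc nvproxy list-supported-drivers` output in this issue. The `override` parameter stands in for the `--nvproxy-driver-version` flag. The sketch assumes that lookup is an exact match on the full version string, which matches the behavior reported in the log.

```go
package main

import (
	"fmt"
	"os"
)

// supportedDrivers is a hypothetical stand-in for nvproxy's table of
// supported versions; the entries are the ones listed in this issue.
var supportedDrivers = map[string]bool{
	"535.104.12": true,
	"535.129.03": true,
	"535.154.05": true,
	"535.161.07": true,
	"550.54.14":  true,
	"550.54.15":  true,
}

// resolveDriverVersion (hypothetical) prefers an explicit override, as the
// --nvproxy-driver-version flag does, over the detected host version, and
// accepts only an exact match against the supported set.
func resolveDriverVersion(detected, override string) (string, error) {
	version := detected
	if override != "" {
		version = override
	}
	if !supportedDrivers[version] {
		return "", fmt.Errorf("unsupported Nvidia driver version: %s", version)
	}
	return version, nil
}

func main() {
	// Without an override, 535.161.08 is rejected even though 535.161.07
	// is supported, because the whole version string must match.
	if _, err := resolveDriverVersion("535.161.08", ""); err != nil {
		fmt.Println("error:", err)
	}

	// With an override, the supported version's ABI is used instead.
	v, err := resolveDriverVersion("535.161.08", "535.161.07")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("using driver ABI for", v)
}
```

In practice, this corresponds to adding --nvproxy-driver-version=535.161.07 to the runsc invocation shown earlier. The override is only safe if the real 535.161.08 driver is ABI-compatible with 535.161.07, which this sketch cannot check.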
As per https://gvisor.dev/docs/user_guide/gpu/#driver-versions, our policy is to add support only for driver versions used by COS, which is used in GKE.
We do have support for 535.161.07. Assuming no breaking changes have occurred between 535.161.07 and 535.161.08, you could try setting runsc flag --nvproxy-driver-version=535.161.07 as Etienne suggested.