usePrecompiled and new versions
Hi
I'm running a cluster with GPU Operator configured to use precompiled drivers on Ubuntu 22.04 nodes. It works most of the time, but sometimes the image it tries to pull to install the driver does not exist (yet).
I ran into this problem again today. Yesterday, I installed it multiple times on my dev cluster without any issue. Today, installation fails because the image it's trying to use (535-5.15.0-118-generic-ubuntu22.04) does not exist. However, image version 535-5.15.0-117-generic-ubuntu22.04 does exist and was pushed a few days ago to NVIDIA's repository.
Would it be possible for the GPU Operator to stick to the previous image as long as the new one does not exist in the repository?
thanks
@easyrider14 the precompiled image tag that gets used depends solely on the kernel version installed on your GPU node. I would recommend first verifying that an image tag exists at https://catalog.ngc.nvidia.com/orgs/nvidia/containers/driver/tags for your kernel before upgrading to it. Note that there is a delay between when new Ubuntu kernels are released and when precompiled driver packages become available in the Ubuntu package repositories.
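For anyone who wants to script that check, here is a rough sketch in Python. It assumes the tag format is `<driver branch>-<kernel release>-<OS>`, which is inferred only from the two tags quoted in this thread. The function names are hypothetical and are not part of the GPU Operator. The sketch does not query the registry; you pass in the tags you see on the catalog page.

```python
import re


def precompiled_tag(driver_branch: str, kernel_release: str, os_tag: str) -> str:
    """Build a tag in the form seen in this thread (assumed format),
    e.g. 535-5.15.0-118-generic-ubuntu22.04."""
    return f"{driver_branch}-{kernel_release}-{os_tag}"


def kernel_sort_key(kernel_release: str) -> tuple:
    """Turn '5.15.0-118-generic' into (5, 15, 0, 118) so kernels compare numerically."""
    return tuple(int(n) for n in re.findall(r"\d+", kernel_release))


def newest_supported_kernel(available_tags, driver_branch, os_tag, flavor="generic"):
    """Return the newest kernel release that has a published tag, or None.

    available_tags is whatever list of tags you collected from the catalog page.
    """
    prefix = f"{driver_branch}-"
    suffix = f"-{os_tag}"
    kernels = [
        tag[len(prefix):-len(suffix)]
        for tag in available_tags
        if tag.startswith(prefix) and tag.endswith(suffix)
    ]
    kernels = [k for k in kernels if k.endswith(f"-{flavor}")]
    return max(kernels, key=kernel_sort_key, default=None)


if __name__ == "__main__":
    # Tags copied by hand from the catalog page (only the one confirmed in this thread).
    published = {"535-5.15.0-117-generic-ubuntu22.04"}
    wanted = precompiled_tag("535", "5.15.0-118-generic", "ubuntu22.04")
    if wanted not in published:
        print(f"{wanted} is not published yet")
        print("newest kernel with an image:",
              newest_supported_kernel(published, "535", "ubuntu22.04"))
```

If the tag for the new kernel is missing, one option is to hold the node on the kernel returned by `newest_supported_kernel` until the matching image is published, since the image is tied to the running kernel.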
This issue is stale because it has been open 90 days with no activity. This issue will be closed in 30 days unless new comments are made or the stale label is removed. To skip these checks, apply the "lifecycle/frozen" label.
This issue has been addressed in https://github.com/NVIDIA/gpu-operator/issues/910#issuecomment-2302811974 and there have been no further updates to this issue.
To keep the issue tracker clean and focused on current and actionable topics, I am going to close this issue.
Feel free to open a new issue if you have any further questions.