gpu-operator

Driver upgrade documentation is misleading on OpenShift

Open empovit opened this issue 1 week ago • 0 comments

Problem

The GPU Driver Upgrades documentation states:

Upgrade the driver by changing the driver.version value in the cluster policy

This works on Kubernetes (Helm) but fails on OpenShift (OLM).

Behavior

On OpenShift, change only driver.version:

spec:
  driver:
    version: "570.172.08"

The above configuration results in:

  • Invalid driver image path: /:570.172.08-rhel9.6 (the repository and image parts are empty)
  • Image pull fails
  • Driver pods fail to start
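
The shape of the invalid path suggests the image reference is assembled as repository/image:version-os. The sketch below reproduces that string under this assumption. The function name driver_image_path and the os_tag parameter are hypothetical, and this is not the operator's actual code:

```python
def driver_image_path(repository: str, image: str, version: str, os_tag: str) -> str:
    """Join the parts as <repository>/<image>:<version>-<os_tag>.

    Assumption: this layout is inferred from the failing path in this issue,
    not taken from the GPU Operator source.
    """
    return f"{repository}/{image}:{version}-{os_tag}"


# Only the version is set, as in the failing ClusterPolicy above.
broken = driver_image_path("", "", "570.172.08", "rhel9.6")

# All three fields are set.
fixed = driver_image_path("nvcr.io/nvidia", "driver", "570.172.08", "rhel9.6")

print(broken)  # /:570.172.08-rhel9.6
print(fixed)   # nvcr.io/nvidia/driver:570.172.08-rhel9.6
```

With an empty repository and image, only the separators and the version tag remain, which matches the path reported in the error.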

Required workaround:

Provide values for all driver image properties:

spec:
  driver:
    repository: nvcr.io/nvidia
    image: driver
    version: "570.172.08"
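
For anyone applying the workaround from a script, the same three fields can be expressed as a JSON merge patch. This is a minimal sketch; the helper name driver_patch is hypothetical, and the name of the ClusterPolicy resource to patch depends on the installation:

```python
import json


def driver_patch(repository: str, image: str, version: str) -> dict:
    """Build a merge-patch body that sets all three driver image fields."""
    return {
        "spec": {
            "driver": {
                "repository": repository,
                "image": image,
                "version": version,
            }
        }
    }


# Values taken from the workaround above.
patch = json.dumps(driver_patch("nvcr.io/nvidia", "driver", "570.172.08"))
print(patch)
```

The printed JSON could then be passed to oc patch with --type merge -p against the cluster's ClusterPolicy resource.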

Differences

  • Helm: Populates default repository and image values from chart into ClusterPolicy
  • OLM: ClusterPolicy has no defaults; operator relies on static CSV environment variables

Request

I need your input before suggesting a solution. It looks like either:

  1. The code must be fixed: Provide default values for OLM deployments to match Helm behavior
  2. The docs must be fixed: Document that on OpenShift all three fields (repository, image, version) are required
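
To make the two options concrete, here is a rough sketch of each. Both helper names are hypothetical, and the default values are assumed to be the ones from the workaround above rather than anything the operator ships today:

```python
# Assumption: these defaults mirror the workaround values in this issue.
HYPOTHETICAL_DEFAULTS = {"repository": "nvcr.io/nvidia", "image": "driver"}

REQUIRED_FIELDS = ("repository", "image", "version")


def apply_driver_defaults(driver_spec: dict, defaults: dict = HYPOTHETICAL_DEFAULTS) -> dict:
    """Option 1: fill missing or empty fields from defaults, leaving user values intact."""
    merged = dict(driver_spec)
    for key, value in defaults.items():
        if not merged.get(key):
            merged[key] = value
    return merged


def missing_driver_fields(driver_spec: dict) -> list:
    """Option 2: report which required fields the user still has to provide."""
    return [key for key in REQUIRED_FIELDS if not driver_spec.get(key)]


spec = {"version": "570.172.08"}
print(missing_driver_fields(spec))  # ['repository', 'image']
print(apply_driver_defaults(spec))
```

Option 1 would let a version-only change work on OLM as it does with Helm. Option 2 keeps the current behavior but makes the requirement explicit.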

Environment

  • OpenShift with GPU Operator installed via OLM

empovit, Dec 18 '25 11:12