gpu-operator

[fix]: avoid setting an empty value for migManager config's name

Open • rahulait opened this pull request 1 week ago • 3 comments

Description

This PR skips rendering the migManager config's name when it is set to an empty value. According to the CRD specification, its default value is default-mig-parted-config.

Currently, helm template renders it as:

$ helm template nvidiagpu  -n gpu-operator --create-namespace nvidia/gpu-operator --version v25.10.1 -f values.yaml | grep -A8 migManager
  migManager:
    enabled: true
    repository: nvcr.io/nvidia/cloud-native
    image: k8s-mig-manager
    version: "v0.13.1"
    imagePullPolicy: IfNotPresent
    config:
      name:                    <---- value is empty here
      default: "all-disabled"

When Argo CD is used to install gpu-operator via Helm, this sometimes causes sync diffs on subsequent syncs: the applied value is updated to the CRD default, while the newly rendered value is empty (null).

[Screenshot 2025-12-17 at 3:56:52 PM]

This PR omits the name key when its value is empty. Because the key is no longer rendered, Argo CD does not detect a changed value on subsequent syncs.
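Helm templates are evaluated with Go's text/template package, extended with Sprig functions, so the idea behind the fix can be sketched in plain Go. The template fragment, the Config struct, and renderConfig below are hypothetical illustrations modelled on the rendered output above. They are not the chart's actual template or this PR's diff. An if guard drops the name key when the value is empty, and the {{- trim marker removes the preceding newline so that no blank line is left behind.

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"text/template"
)

// Config is a hypothetical stand-in for .Values.migManager.config.
type Config struct {
	Name    string // may be empty; the key should then be omitted
	Default string // e.g. "all-disabled"
}

// Hypothetical fragment: the `if` guard skips the name line when Name is "",
// because text/template treats an empty string as false.
const migConfigTmpl = `config:
{{- if .Name }}
  name: {{ printf "%q" .Name }}
{{- end }}
  default: {{ printf "%q" .Default }}
`

var migConfig = template.Must(template.New("migConfig").Parse(migConfigTmpl))

// renderConfig renders the fragment for one set of values.
func renderConfig(c Config) (string, error) {
	var b strings.Builder
	err := migConfig.Execute(&b, c)
	return b.String(), err
}

func main() {
	for _, c := range []Config{
		{Name: "", Default: "all-disabled"},
		{Name: "custom-mig-config", Default: "all-disabled"},
	} {
		out, err := renderConfig(c)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		fmt.Print(out)
	}
}
```

When Name is empty, only the default key is rendered under config:. When Name is non-empty, it is emitted as a quoted name: line.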

Checklist

  • [x] No secrets, sensitive information, or unrelated changes
  • [x] Lint checks passing (make lint)
  • [x] Generated assets in-sync (make validate-generated-assets)
  • [x] Go mod artifacts in-sync (make validate-modules)

Testing

  • [ ] Unit tests (make coverage)
  • [x] Manual cluster testing (describe below)
  • [ ] N/A or Other (docs, CI config, etc.)

Test details: I manually tested the change by rendering the updated Helm chart, and the empty name key is now skipped. Example:

helm template nvidiagpu  -n gpu-operator --create-namespace deployments/gpu-operator/ -f values.yaml | grep -A8 migManager
  migManager:
    enabled: true
    repository: nvcr.io/nvidia/cloud-native
    image: k8s-mig-manager
    version: "v0.13.1"
    imagePullPolicy: IfNotPresent
    config:
      default: "all-disabled"
    gpuClientsConfig:

rahulait, Dec 18 '25 05:12

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

copy-pr-bot[bot], Dec 18 '25 05:12

/ok to test 690c9ae89f40bce875e751629a2d74101d6fc482

rahulait, Dec 18 '25 05:12

How about defaulting to default-mig-parted-config instead? Since that is also the CRD default, the outcome is the same. Falling back to the default value is more reader-friendly anyway, since most users aren't going to read the CRD spec.

tariq1890, Dec 18 '25 18:12
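The suggestion to fall back to the default can be sketched with Go's text/template as well. In a Helm chart, this would typically use Sprig's default function, for example piping .Values.migManager.config.name through default "default-mig-parted-config". Plain text/template has no default function, so the sketch below substitutes the built-in or function, which returns its first non-empty argument or otherwise its last argument. The template fragment, the values struct, and renderWithDefault are hypothetical illustrations, not the chart's code.

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"text/template"
)

// crdDefaultName is the default that the PR description cites from the CRD spec.
const crdDefaultName = "default-mig-parted-config"

// Hypothetical fragment. `or` stands in for Sprig's `default` here: it yields
// .Name when non-empty and .Fallback otherwise, so the key is always rendered.
const migConfigTmpl = `config:
  name: {{ printf "%q" (or .Name .Fallback) }}
  default: {{ printf "%q" .Default }}
`

// values is a hypothetical stand-in for the relevant .Values fields.
type values struct {
	Name     string // user-supplied name; may be empty
	Fallback string // used when Name is empty
	Default  string // e.g. "all-disabled"
}

var migConfig = template.Must(template.New("migConfig").Parse(migConfigTmpl))

// renderWithDefault renders the fragment, falling back to the CRD default name.
func renderWithDefault(name, def string) (string, error) {
	var b strings.Builder
	err := migConfig.Execute(&b, values{Name: name, Fallback: crdDefaultName, Default: def})
	return b.String(), err
}

func main() {
	for _, name := range []string{"", "custom-mig-config"} {
		out, err := renderWithDefault(name, "all-disabled")
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		fmt.Print(out)
	}
}
```

With this approach, the rendered manifest always carries the same name the API server would otherwise fill in, so Argo CD should see no drift on that field. The trade-off is that the chart repeats the CRD's default value, and the two must be kept in sync.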