ClusterPolicy fails with "missing required field defaultRuntime" when installed via ArgoCD
Describe the bug
When installing GPU Operator via ArgoCD, the ClusterPolicy creation fails with the following error:
ClusterPolicy.spec.operator missing required field "defaultRuntime"
However, the same chart installs successfully when using helm install directly.
Environment
- GPU Operator Version: [e.g., v25.3.4]
- Kubernetes Version: [e.g., v1.33.5]
- Installation Method: ArgoCD (Helm chart)
- Container Runtime: containerd
Root Cause Analysis
1. Missing Template Rendering
The Helm chart template (templates/clusterpolicy.yaml) does not render the defaultRuntime field from values:
# Current template
spec:
  operator:
    {{- if .Values.operator.runtimeClass }}
    runtimeClass: {{ .Values.operator.runtimeClass }}
    {{- end }}
    {{- if .Values.operator.defaultGPUMode }}
    defaultGPUMode: {{ .Values.operator.defaultGPUMode }}
    {{- end }}
    # ❌ No defaultRuntime rendering!
Verification:
helm template gpu-operator nvidia/gpu-operator --version v24.9.0 | grep -A 20 "kind: ClusterPolicy"
# Result: No defaultRuntime field in the rendered manifest
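The grep check above can also be scripted. The following is a minimal sketch, assuming the output of helm template has been saved as text; the helper names and the SAMPLE manifest are hypothetical and only illustrate the check. It uses plain string matching rather than a YAML parser, so it matches the key at any nesting level.

```python
def find_clusterpolicy_docs(rendered: str) -> list[str]:
    """Split multi-document YAML text on '---' lines and keep ClusterPolicy docs."""
    docs, current = [], []
    for line in rendered.splitlines():
        if line.strip() == "---":
            docs.append("\n".join(current))
            current = []
        else:
            current.append(line)
    docs.append("\n".join(current))
    return [
        d for d in docs
        if any(l.strip() == "kind: ClusterPolicy" for l in d.splitlines())
    ]


def has_field(doc: str, field: str) -> bool:
    """Return True if any non-comment line in the document starts with 'field:'."""
    return any(l.strip().startswith(field + ":") for l in doc.splitlines())


# Hypothetical rendered output, shortened for illustration only.
SAMPLE = """\
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: gpu-operator
---
apiVersion: nvidia.com/v1
kind: ClusterPolicy
metadata:
  name: cluster-policy
spec:
  operator:
    runtimeClass: nvidia
"""

for doc in find_clusterpolicy_docs(SAMPLE):
    print("defaultRuntime rendered:", has_field(doc, "defaultRuntime"))
```

To run it against a real chart, replace SAMPLE with the contents of a file produced by the helm template command shown above.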
2. CRD Schema
The CRD defines defaultRuntime with a default value:
defaultRuntime:
  type: string
  default: docker
  enum:
    - docker
    - crio
    - containerd
And it appears to be required (either explicitly or implicitly through schema validation).
3. Why Helm Install Works But ArgoCD Fails
Helm Direct Install (Client-Side Apply):
- Helm renders manifest without defaultRuntime
- kubectl applies using client-side apply
- API Server performs defaulting before/during required validation
- Default value docker is applied automatically
- ✅ Success
ArgoCD Install (Server-Side Apply):
- ArgoCD renders manifest without defaultRuntime
- ArgoCD applies using server-side apply (default behavior)
- Server-side apply performs stricter validation
- Required field check happens before defaulting can occur
- ❌ Fails with "missing required field"
This is a known Kubernetes behavior where server-side apply is more strict about required fields than client-side apply.
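The ordering hypothesis can be illustrated with a toy model. This is only a sketch of the claim made in this report, not a description of how the API server or server-side apply is actually implemented. SCHEMA, apply_defaults and validate are hypothetical, and the "required" entry is an assumption, since the report only says the field appears to be required.

```python
# Toy schema mirroring the CRD excerpt above; "required" is an assumption.
SCHEMA = {
    "required": ["defaultRuntime"],
    "properties": {
        "defaultRuntime": {
            "type": "string",
            "default": "docker",
            "enum": ["docker", "crio", "containerd"],
        },
    },
}


def apply_defaults(obj: dict, schema: dict) -> dict:
    """Return a copy of obj with schema defaults filled in for absent fields."""
    out = dict(obj)
    for name, prop in schema["properties"].items():
        if name not in out and "default" in prop:
            out[name] = prop["default"]
    return out


def validate(obj: dict, schema: dict) -> list[str]:
    """Return a list of validation errors (an empty list means valid)."""
    errors = [
        f'missing required field "{name}"'
        for name in schema["required"]
        if name not in obj
    ]
    for name, prop in schema["properties"].items():
        if name in obj and "enum" in prop and obj[name] not in prop["enum"]:
            errors.append(f'unsupported value for "{name}": {obj[name]!r}')
    return errors


operator_spec = {}  # what the chart renders: no defaultRuntime

# Ordering A: defaulting runs first, so the required check passes.
default_then_validate = validate(apply_defaults(operator_spec, SCHEMA), SCHEMA)

# Ordering B: the required check runs on the object as submitted.
validate_then_default = validate(operator_spec, SCHEMA)

print(default_then_validate)
print(validate_then_default)
```

If the failure really is an ordering problem, the two results above should differ in the same way the two install paths do.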
Steps to Reproduce
- Install ArgoCD in a cluster
- Create an ArgoCD Application:
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: gpu-operator
  namespace: argocd
spec:
  project: default
  source:
    chart: gpu-operator
    repoURL: https://helm.ngc.nvidia.com/nvidia
    targetRevision: v24.9.0
    helm:
      values: |
        driver:
          enabled: true
  destination:
    server: https://kubernetes.default.svc
    namespace: gpu-operator
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
- Sync the application
- Observe the error:
ClusterPolicy.spec.operator missing required field "defaultRuntime"
Current Workaround
The obvious workaround is to set the value explicitly in the ArgoCD Application:
helm:
  values: |
    operator:
      defaultRuntime: containerd
However, this doesn't actually work because the template doesn't render it!
Alternative workaround - disable server-side apply:
syncPolicy:
  syncOptions:
    - ServerSideApply=false
Proposed Solution
Fix 1: Add defaultRuntime to Template (Recommended)
Update templates/clusterpolicy.yaml:
spec:
  operator:
    {{- if .Values.operator.defaultRuntime }}
    defaultRuntime: {{ .Values.operator.defaultRuntime }}
    {{- end }}
    {{- if .Values.operator.runtimeClass }}
    runtimeClass: {{ .Values.operator.runtimeClass }}
    {{- end }}
And ensure values.yaml has a default:
operator:
  defaultRuntime: docker # or detect from cluster
Fix 2: Remove Required Constraint from CRD
If defaulting should handle this, consider making the field optional in the CRD and relying on the default value.
Fix 3: Add Mutating Webhook
Implement a mutating admission webhook to inject the default value before validation occurs.
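As a rough sketch of what such a webhook could return, the function below builds an admission.k8s.io/v1 AdmissionReview response carrying a base64-encoded JSONPatch that adds spec.operator.defaultRuntime when it is absent. The function name and the choice of "docker" as the injected value are assumptions for illustration. The HTTP server, TLS setup and MutatingWebhookConfiguration are omitted.

```python
import base64
import json


def build_admission_response(review: dict, default_runtime: str = "docker") -> dict:
    """Build an AdmissionReview response that injects defaultRuntime if missing."""
    request = review["request"]
    obj = request["object"]
    patch = []

    if "spec" not in obj:
        # No spec at all: add the whole subtree in one operation.
        patch.append({
            "op": "add",
            "path": "/spec",
            "value": {"operator": {"defaultRuntime": default_runtime}},
        })
    elif "operator" not in obj["spec"]:
        patch.append({
            "op": "add",
            "path": "/spec/operator",
            "value": {"defaultRuntime": default_runtime},
        })
    elif "defaultRuntime" not in obj["spec"]["operator"]:
        patch.append({
            "op": "add",
            "path": "/spec/operator/defaultRuntime",
            "value": default_runtime,
        })

    response = {"uid": request["uid"], "allowed": True}
    if patch:
        # The patch must be a base64-encoded JSON document.
        response["patchType"] = "JSONPatch"
        response["patch"] = base64.b64encode(json.dumps(patch).encode()).decode()

    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }


# Hypothetical incoming review for a ClusterPolicy without defaultRuntime.
example = {"request": {"uid": "abc", "object": {"spec": {"operator": {}}}}}
print(json.dumps(build_admission_response(example), indent=2))
```

A webhook adds an extra component to deploy and keep available, which is one reason the template fix is listed as the recommended option.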
Expected Behavior
GPU Operator should install successfully via ArgoCD without requiring users to:
- Explicitly set defaultRuntime in values (when template doesn't render it)
- Disable server-side apply
- Use workarounds
Additional Context
This issue affects all GitOps tools that use server-side apply by default (ArgoCD, Flux, etc.).
The combination of:
- CRD with required+default fields
- Helm template not rendering the field
- Server-side apply's strict validation
creates an incompatibility that only manifests in GitOps scenarios.
Related Issues
- Similar issues have been reported in the Kubernetes community regarding server-side apply strictness with required+default fields
- kubernetes/kubernetes#108008
- kubernetes/kubernetes#99003
Suggested Priority
High - This breaks GPU Operator installation for all ArgoCD/GitOps users, which is a common deployment pattern in production environments.
Thanks @taejune for reporting this. We'll have a look into it.
@taejune Can you share the Helm and ArgoCD versions used?
@taejune We tried reproducing it with v25.10.1 and the example you shared above, but it's working fine on our end.
We are also seeing that defaultRuntime is getting set correctly to default value.
$ k get clusterpolicy cluster-policy -o yaml | grep -B1 defaultRuntime
  operator:
    defaultRuntime: docker
We tested this on ArgoCD v3.2.1.
Can you share more details of your environment where you are hitting this issue?