pkg/kube: remove managed fields before SSA
Server-side apply patches cannot contain managed field data; we must zero out this field before sending the patch.
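A minimal sketch of the idea, assuming the object being patched implements metav1.Object; clearManagedFields is a hypothetical helper name for illustration, not the exact function added in this PR:

package kube

import (
    "k8s.io/apimachinery/pkg/api/meta"
    "k8s.io/apimachinery/pkg/runtime"
)

// clearManagedFields zeroes out metadata.managedFields so the object can be
// sent in a server-side apply patch (hypothetical helper for illustration).
func clearManagedFields(obj runtime.Object) error {
    accessor, err := meta.Accessor(obj)
    if err != nil {
        return err
    }
    // The API server rejects apply requests whose body carries
    // metadata.managedFields ("metadata.managedFields must be nil"),
    // so zero the field out before marshalling the apply patch.
    accessor.SetManagedFields(nil)
    return nil
}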
$ helm status --namespace hypershift hypershift --show-desc
NAME: hypershift
LAST DEPLOYED: Tue Oct 7 08:28:25 2025
NAMESPACE: hypershift
STATUS: failed
REVISION: 1
DESCRIPTION: Release "hypershift" failed: metadata.managedFields must be nil && metadata.managedFields must be nil && metadata.managedFields must be nil && metadata.managedFields must be nil && metadata.managedFields must be nil
TEST SUITE: None
On install (my logger, Helm code):
[08:09:10.093] DEBUG: Error patching resource {
"error": "metadata.managedFields must be nil",
"gvk": "apiextensions.k8s.io/v1, Kind=CustomResourceDefinition",
"name": "routes.route.openshift.io",
"namespace": ""
}
Should we add a non-regression test?
@robertsirc sure thing! Do y'all already have an end-to-end setup somewhere? We need a k8s API server running to reproduce the issue, I don't think there's any unit or integration level test that will catch this.
Is this because the chart is trying to set managedFields directly? (Or can you describe how to reproduce, please, so I can understand how this is occurring?)
@gjenkins8 no, this is just on a Helm upgrade using v4 + SSA on a chart created by a v3 install/upgrade; naively I would expect any codepath that triggers the patching flow from v4 to hit this, since the managed fields are always provided.
@robertsirc sure thing! Do y'all already have an end-to-end setup somewhere? We need a k8s API server running to reproduce the issue, I don't think there's any unit or integration level test that will catch this.
If I may, could we mock this up in the TestPatchResourceServerSide test and test the negative case there?
I think you would be much better off testing the positive case ("a real server accepts this SSA request"), since there are a lot of rough edges; testing individual rough edges in a unit test by checking for the negative is going to give you poor coverage until you accumulate enough of them to be useful, and some of the field-manager conflict bits will be impossible to test in that manner.
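For reference, a hedged sketch of what such a positive check could look like, assuming an API server reachable via the default kubeconfig; the ConfigMap manifest, field manager name, and test name are illustrative, not Helm's actual fixtures:

package kube_test

import (
    "context"
    "testing"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/runtime/schema"
    "k8s.io/apimachinery/pkg/types"
    "k8s.io/client-go/dynamic"
    "k8s.io/client-go/tools/clientcmd"
)

func TestServerAcceptsSSAPatch(t *testing.T) {
    cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
    if err != nil {
        t.Skipf("no kubeconfig available: %v", err)
    }
    client, err := dynamic.NewForConfig(cfg)
    if err != nil {
        t.Fatal(err)
    }
    // The apply body must not carry metadata.managedFields; the server
    // rejects SSA requests that do.
    manifest := []byte(`{"apiVersion":"v1","kind":"ConfigMap","metadata":{"name":"ssa-check","namespace":"default"},"data":{"k":"v"}}`)
    gvr := schema.GroupVersionResource{Version: "v1", Resource: "configmaps"}
    force := true
    _, err = client.Resource(gvr).Namespace("default").Patch(
        context.Background(), "ssa-check", types.ApplyPatchType, manifest,
        metav1.PatchOptions{FieldManager: "helm-ssa-test", Force: &force})
    if err != nil {
        t.Fatalf("server rejected SSA patch: %v", err)
    }
}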
@gjenkins8 no, this is just on a Helm upgrade using v4 + SSA on a chart created by a v3 install/upgrade; naively I would expect any codepath that triggers the patching flow from v4 to hit this, since the managed fields are always provided.
In this case, it is expected that CSA managed fields are removed by:
// https://github.com/helm/helm/blob/main/pkg/kube/client.go#L738-L739
if updateOptions.upgradeClientSideFieldManager {
    patched, err := upgradeClientSideFieldManager(original, updateOptions.dryRun, updateOptions.fieldValidationDirective)
    ...
}

// https://github.com/helm/helm/blob/main/pkg/kube/client.go#L1042-L1045
patchData, err := csaupgrade.UpgradeManagedFieldsPatch(
    info.Object,
    sets.New(fieldManagerName),
    fieldManagerName)
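A hedged sketch of how that patch ends up on the wire, assuming a resource.Info and resource.Helper from k8s.io/cli-runtime; upgradeCSAFields and the surrounding plumbing are illustrative, not the exact Helm code path:

package kube

import (
    "k8s.io/apimachinery/pkg/types"
    "k8s.io/apimachinery/pkg/util/sets"
    "k8s.io/cli-runtime/pkg/resource"
    "k8s.io/client-go/util/csaupgrade"
)

// upgradeCSAFields migrates client-side-apply managed-field entries to the
// SSA field manager (hypothetical helper for illustration).
func upgradeCSAFields(info *resource.Info, helper *resource.Helper, fieldManagerName string) error {
    // Build a JSON patch that rewrites managed-field entries owned by the
    // client-side-apply manager so they belong to the SSA manager instead.
    patchData, err := csaupgrade.UpgradeManagedFieldsPatch(
        info.Object, sets.New(fieldManagerName), fieldManagerName)
    if err != nil {
        return err
    }
    // An empty patch means there was nothing to migrate.
    if len(patchData) == 0 {
        return nil
    }
    obj, err := helper.Patch(info.Namespace, info.Name, types.JSONPatchType, patchData, nil)
    if err != nil {
        return err
    }
    return info.Refresh(obj, true)
}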
Can you describe fully how to reproduce the issue, please? A naive helm3 -> helm4 upgrade works for me.
Hmm, sorry - looks like this was more likely adoption of existing CRDs or something. Notably, whatever weird state I ended up in was due to the other bug I found (failure to load old releases due to a bug in time encoding, recently fixed), so it's not clear if an end user would end up in this state. Here's a trace I found in my scrollback from after I fixed the managed fields issue:
[08:56:52.469] DEBUG: preparing upgrade {
"name": "hypershift"
}
[08:56:52.469] DEBUG: getting last revision {
"name": "hypershift"
}
[08:56:52.469] DEBUG: getting release history {
"name": "hypershift"
}
[08:56:52.670] DEBUG: getting deployed releases {
"name": "hypershift"
}
[08:56:52.914] DEBUG: number of dependencies in the chart {
"dependencies": 0
}
metadata: map[APIVersion:v2 Annotations:map[] AppVersion: Condition: Dependencies:[] Deprecated:false Description:A Helm chart to install the Hypershift Operator and deps for ARO Home: Icon: Keywords:[] KubeVersion: Maintainers:[] Name:aro-hcp-hypershift-operator Sources:[] Tags: Type:application Version:0.1.0]
[08:56:52.915] DEBUG: determined release apply method {
"previous_release_apply_method": "ssa",
"server_side_apply": true
}
[08:56:53.002] DEBUG: performing update {
"name": "hypershift"
}
[08:56:53.109] DEBUG: creating upgraded release {
"name": "hypershift"
}
[08:56:53.109] DEBUG: creating release {
"key": "sh.helm.release.v1.hypershift.v2"
}
[08:56:54.048] DEBUG: using server-side apply for resource update {
"dryRun": false,
"fieldValidationDirective": "Strict",
"forceConflicts": true,
"upgradeClientSideFieldManager": false
}
[08:56:54.048] DEBUG: checking resources for changes {
"resources": 3
}
[08:56:55.559] DEBUG: Patched resource {
"gvk": "/v1, Kind=ServiceAccount",
"name": "hypershift-installer",
"namespace": "hypershift"
}
[08:56:55.671] DEBUG: Patched resource {
"gvk": "/v1, Kind=Secret",
"name": "pull-secret",
"namespace": "hypershift"
}
[08:56:58.995] DEBUG: Patched resource {
"gvk": "rbac.authorization.k8s.io/v1, Kind=ClusterRoleBinding",
"name": "hypershift-installer-cluster-role",
"namespace": ""
}
[08:56:58.995] DEBUG: deleting resource {
"kind": "CustomResourceDefinition",
"name": "podmonitors.monitoring.coreos.com",
"namespace": ""
}
[08:57:00.145] DEBUG: deleting resource {
"kind": "CustomResourceDefinition",
"name": "prometheusrules.monitoring.coreos.com",
"namespace": ""
}
[08:57:00.217] DEBUG: deleting resource {
"kind": "CustomResourceDefinition",
"name": "routes.route.openshift.io",
"namespace": ""
}
[08:57:00.299] DEBUG: deleting resource {
"kind": "CustomResourceDefinition",
"name": "servicemonitors.monitoring.coreos.com",
"namespace": ""
}
[08:57:00.392] DEBUG: waiting for resources {
"count": 3,
"timeout": 300000000000
}
I'll go out on a limb and say that deleting CRDs as part of an upgrade likely shows we're well outside of normal execution flow here... Since I don't have a nice reproducer, we might as well close this out.
We've reproduced the CRD deletion ....
We are deploying this chart. Running install the first time while opting into CRDs here, then running an upgrade while omitting CRDs here is sufficient to reproduce the issue.
Looks like at least this one was my error: https://github.com/helm/helm/pull/31413
We are deploying this chart. Running install the first time while opting into CRDs here, then running an upgrade while omitting CRDs here is sufficient to reproduce the issue.
ah, thanks!
I wonder if this is related to SSA? The above looks like Helm is (incorrectly) deleting CRDs when "SkipCRDs" is set, seemingly because they are omitted from the list of resources to be "updated", and the update reconciliation object translates that into a delete operation: https://github.com/helm/helm/blob/main/pkg/kube/client.go#L555
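An illustrative sketch (not Helm's actual reconciliation code) of how diffing the previous release's manifest against the new one turns "omitted CRDs" into deletions:

package main

import "fmt"

type resourceKey struct {
    Kind, Namespace, Name string
}

// toDelete returns resources present in the previous release but absent from
// the new manifest; a naive diff like this schedules skipped CRDs for deletion.
func toDelete(original, target []resourceKey) []resourceKey {
    wanted := make(map[resourceKey]bool, len(target))
    for _, r := range target {
        wanted[r] = true
    }
    var deletions []resourceKey
    for _, r := range original {
        // Anything missing from the new list is marked for deletion, even if
        // it was only omitted because CRDs were skipped on this upgrade.
        if !wanted[r] {
            deletions = append(deletions, r)
        }
    }
    return deletions
}

func main() {
    original := []resourceKey{
        {"CustomResourceDefinition", "", "routes.route.openshift.io"},
        {"ServiceAccount", "hypershift", "hypershift-installer"},
    }
    target := []resourceKey{
        {"ServiceAccount", "hypershift", "hypershift-installer"},
    }
    fmt.Println(toDelete(original, target))
}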
I wonder if this is related to SSA?
Could you confirm this is unrelated to SSA, please? (So I know my understanding is correct; I want/need to fix any SSA issues.)
@gjenkins8 that particular CRD issue was from my use of IncludeCRDs... extremely misleading name; I sent another PR for that. I will get you a reproducer log trace for this particular issue (invalid managed fields) shortly.
@gjenkins8 that particular CRD issue was from my use of IncludeCRDs... extremely misleading name; I sent another PR for that. I will get you a reproducer log trace for this particular issue (invalid managed fields) shortly.
Thanks! Please do. I want/need to get fixes in before next week, but I'm unsure whether there is an issue here.