ApplicationSet resources experience data corruption
Checklist:
- [x] I've searched in the docs and FAQ for my answer: https://bit.ly/argocd-faq.
- [x] I've included steps to reproduce the bug.
- [ ] I've pasted the output of `argocd version`.
Describe the bug
ApplicationSet resources experience data corruption where:
- `syncPolicy` becomes an empty object: `syncPolicy: {}`
- `generators` contain misplaced `template` objects with `metadata` and `spec` fields (that look like they should be at the root level)
Example (excerpt):
spec:
  syncPolicy: {} # Should contain preserveResourcesOnDeletion: false
  generators:
  - pullRequest:
      # Normal pullRequest config
      template: # This should NOT be here - belongs at root level
        metadata: {...}
        spec: {...}
This seems to happen randomly. ApplicationSets look correct after initial deployment, then get corrupted later without apparent cause. It does not happen on all (349) ApplicationSets, but only a subset (~60). We were not able to identify a pattern. These ~60 affected ApplicationSets are some of our "preview" environments, of which there are 74, so some remain unaffected.
Affected ApplicationSets Pattern
60+ ApplicationSets are affected, all are using:
- `pullRequest` generators (both standalone and in matrix combinations)
- Various generator combinations:
  - Direct `pullRequest` generators
  - `matrix` with `pullRequest` + `git`
  - `matrix` with `list` + `pullRequest`
- All use `goTemplate: true`
- All have `preserveResourcesOnDeletion: false`
But: there are other ApplicationSets that use these combinations and are not affected.
Inspecting the affected ApplicationSet resources in cluster, specifically the managedFields section, we could see that at least the generators field is managed by the application-set-controller:
- apiVersion: argoproj.io/v1alpha1
  fieldsType: FieldsV1
  fieldsV1:
    f:spec:
      f:generators: {}
      f:template:
        f:spec:
          f:source:
            f:directory:
              f:jsonnet: {}
  manager: argocd-applicationset-controller
  operation: Update
  time: "2025-09-03T15:54:28Z"
We have been debugging this issue for several days now, including extensive vibe-coding sessions to identify patterns, but we still have no idea what might be causing it.
To Reproduce
- ApplicationSets deploy successfully initially
- Corruption occurs spontaneously (during controller reconciliation cycles?)
- No user action seems to trigger the corruption
- Pattern affects only some ApplicationSets, not all
Version
We're not using the argocd CLI since we don't usually have direct access to ArgoCD.
The version running is v3.0.11+240a183, deployed to Kubernetes using the community Helm chart.
Logs
No relevant logs on the ApplicationSet controller.
@dsiebel Thank you for reporting this.
What you're experiencing looks rather strange; a few questions regarding your configuration:
Do you mean that your initial manifests do not have `spec.generators[0].pullRequest.template` set, and then you see it having been filled with `spec` and `metadata` from the root level of the same ApplicationSet? Or some other data?
It would be great if you could post some complete examples of the manifests before and after corruption. It would help to get an idea of what is happening. Please don't forget to edit out sensitive information.
Some other questions:
- Did you start experiencing the issue after an upgrade of ArgoCD version?
- Are you managing your ApplicationSets with ArgoCD, Helm, or anything else?
- Have you considered using audit logs to understand what is changing the ApplicationSet manifests?
- Does the corruption happen when ApplicationSet controller is disabled?
@dudinea Thanks for getting back to me!
Here are some example ApplicationSets (shortened for readability) that are affected by this:
Raw YAML
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: <redacted>
  namespace: argocd
spec:
  goTemplate: true
  syncPolicy:
    preserveResourcesOnDeletion: false
  generators:
  - pullRequest:
      github:
        # GitHub PR config with appSecretName and labels
  template:
    metadata:
      name: <redacted>-{{.number}}
    spec:
      project: <redacted>
      source:
        directory:
          include: '{*.yml,*.yaml}'
        repoURL: <redacted>
        targetRevision: prod # using branch tracking
        path: manifests/<redacted>/{{.number}}
      destination:
        name: <redacted>
      syncPolicy:
        automated:
          prune: true
          selfHeal: false
          allowEmpty: true
In-cluster manifest (after corruption)
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: <redacted>
  namespace: argocd
spec:
  generators:
  - pullRequest:
      github:
        # GitHub PR config with appSecretName and labels
      template: # <-- this shouldn't be here!
        metadata: {}
        spec:
          destination: {}
          project: ""
  goTemplate: true
  syncPolicy: {} # <-- this is now empty
  template:
    metadata:
      name: <redacted>-{{.number}}
    spec:
      destination:
        name: <redacted>
      project: <redacted>
      source:
        directory:
          include: '{*.yml,*.yaml}'
          jsonnet: {}
        path: manifests/<redacted>/{{.number}}
        repoURL: https://github.com/<redacted>
        targetRevision: prod # using branch tracking
      syncPolicy:
        automated:
          allowEmpty: true
          prune: true
status:
  conditions:
  - lastTransitionTime: "2025-09-03T09:12:13Z"
    message: Successfully generated parameters for all Applications
    reason: ApplicationSetUpToDate
    status: "False"
    type: ErrorOccurred
  - lastTransitionTime: "2025-09-03T09:12:13Z"
    message: Successfully generated parameters for all Applications
    reason: ParametersGenerated
    status: "True"
    type: ParametersGenerated
  - lastTransitionTime: "2025-09-03T09:12:13Z"
    message: ApplicationSet up to date
    reason: ApplicationSetUpToDate
    status: "True"
    type: ResourcesUpToDate
  resources:
  - group: argoproj.io
    health:
      lastTransitionTime: "2025-09-03T07:12:50Z"
      status: Healthy
    kind: Application
    name: <redacted>-13034
    namespace: argocd
    status: Synced
    version: v1alpha1
Regarding your questions / remarks:
Did you start experiencing the issue after an upgrade of ArgoCD version?
We are not quite sure; we started to notice it after our upgrade to 3.0.11. But we also tried switching to server-side apply right around the same time and reverted it, since these exact fields were causing conflicts.
Are you managing your ApplicationSets with ArgoCD, helm, anything?
We are using Terraform to apply the raw manifests, using the alekc/kubectl provider.
The main reason for doing so is the dependencies that we can easily get via Terraform, like secrets, cluster credentials, etc.
Have you considered using audit logs to understand what is changing the ApplicationSet manifests?
Yes, but I couldn't get them to work.
We watched the Kubernetes events and fieldManager section as an alternative.
Does the corruption happen when ApplicationSet controller is disabled?
That is an excellent point; I hadn't thought of that yet. The problem here might be that the corruption only happens after "some time" or "some event", so we'd have to take it down for an unknown period of time, which would block the entire company. We have not yet reproduced this issue in a lab / staging environment.
In the meantime I found one issue that sounds very similar, at least for the template part being in the wrong place:
https://github.com/argoproj/argo-cd/issues/18535
Maybe there's a correlation.
Does the corruption happen when ApplicationSet controller is disabled?
By now the application-set-controller has been disabled for 36h and there are no corrupted ApplicationSets so far. The deployment was scaled down Friday 22:00 CEST, so outside of office hours to not impact the daily business. But I think it's a strong indication that the corruption is caused by the ApplicationSet controller. We will keep it disabled for another 24h.
We narrowed the cause of the issue down to the Webhook API of the application-set-controller.
We left the application-set-controller disabled for an entire weekend (72h+) and nothing happened. Before scaling it up again, we disabled the ApplicationSet webhooks for the PR generator (we use this to cut down the start-up time for preview environments).
The application-set-controller was running for another 4h without any data corruption.
I then manually sent a single pull_request event to the ApplicationSet webhook API (/api/webhook), and the data corruption happened a few seconds later.
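For illustration, a minimal sketch of sending such an event (the Argo CD hostname, repository name, and payload fields below are placeholders, and a GitHub-style pull_request webhook identified by the X-GitHub-Event header is assumed):

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

func main() {
	// Minimal GitHub-style pull_request payload; real payloads carry many more fields.
	payload := []byte(`{"action": "opened", "number": 1,
		"repository": {"full_name": "example-org/example-repo"}}`)

	req, err := http.NewRequest(http.MethodPost,
		"https://argocd.example.com/api/webhook", bytes.NewReader(payload))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Content-Type", "application/json")
	// The webhook endpoint dispatches on the event type carried in this header.
	req.Header.Set("X-GitHub-Event", "pull_request")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("webhook response:", resp.Status)
}
```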
I already went through the code a bit, but I couldn't find a specific place that might be responsible for this.
FYI: we just finished upgrading to the latest ArgoCD v3.1.5, and the issue still exists in that one.
We could confirm that it has to do with the way ApplicationSets are being updated by the Webhook handler: https://github.com/argoproj/argo-cd/blob/master/applicationset/webhook/webhook.go#L610-L620
SyncPolicy and generators.*.Template are actually being sent to the kube API in this "broken" form, because the webhook handler sends the entire ApplicationSet struct to the API server, including all the default fields like an empty SyncPolicy and generators.*.Template struct.
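To illustrate the flow described above, here is a simplified sketch (not the exact upstream code; import paths, the client variable, and surrounding wiring are assumptions): the handler fetches the ApplicationSet, sets the refresh annotation, and writes the whole object back with Update, which carries every defaulted field along.

```go
// Assumed imports: k8s.io/client-go/util/retry, sigs.k8s.io/controller-runtime/pkg/client,
// and Argo CD's v1alpha1 API types / common constants.

err := retry.RetryOnConflict(retry.DefaultRetry, func() error {
	var appSet v1alpha1.ApplicationSet
	if err := c.Get(ctx, client.ObjectKey{Namespace: namespace, Name: name}, &appSet); err != nil {
		return err
	}
	if appSet.Annotations == nil {
		appSet.Annotations = map[string]string{}
	}
	appSet.Annotations[common.AnnotationApplicationSetRefresh] = "true"
	// Full-object update: zero-valued fields such as an empty syncPolicy and
	// the per-generator template struct are serialized and sent as well.
	return c.Update(ctx, &appSet)
})
if err != nil {
	// handle the conflict/update error
}
```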
As far as we can tell, the only thing that is being patched in is the argocd.argoproj.io/application-set-refresh annotation. This could also be done using a partial Metadata Patch like so:
// import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
// import "sigs.k8s.io/controller-runtime/pkg/client"

// Patch only the refresh annotation instead of updating the whole object.
err := c.Patch(context.Background(), &metav1.PartialObjectMetadata{
	TypeMeta: metav1.TypeMeta{
		Kind:       "ApplicationSet",
		APIVersion: "argoproj.io/v1alpha1",
	},
	ObjectMeta: metav1.ObjectMeta{
		Name:      appSet.Name,
		Namespace: appSet.Namespace,
		Annotations: map[string]string{
			common.AnnotationApplicationSetRefresh: "true",
		},
	},
}, client.Merge)
if err != nil {
	// handle the error
}
This potentially makes the retryOnConflict and the Get obsolete as well.
Just to clarify: Semantically, there is no real issue here.
What we initially perceived as an issue is just the default values being rendered and applied to the cluster. They do show up as a recurring diff in kubectl diff, though, and potentially also in ArgoCD when using e.g. App-of-AppSets to manage ApplicationSets.
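To make the serialization side of that concrete, here is a generic encoding/json sketch with stand-in types (these are not Argo CD's actual definitions): `omitempty` drops a `false` boolean and has no effect on non-pointer struct fields, which would be enough to turn a configured syncPolicy into `{}` and to materialize an empty per-generator template when the full struct is written back.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Stand-in types, only to demonstrate the serialization behaviour.
type SyncPolicy struct {
	// omitempty drops `false`, so preserveResourcesOnDeletion: false disappears.
	PreserveResourcesOnDeletion bool `json:"preserveResourcesOnDeletion,omitempty"`
}

type Template struct {
	Metadata map[string]string `json:"metadata,omitempty"`
	Spec     map[string]string `json:"spec,omitempty"`
}

type Generator struct {
	// omitempty has no effect on a non-pointer struct: an unset template
	// still serializes as an empty object.
	Template Template `json:"template,omitempty"`
}

type Spec struct {
	SyncPolicy SyncPolicy  `json:"syncPolicy,omitempty"`
	Generators []Generator `json:"generators,omitempty"`
}

func main() {
	spec := Spec{
		SyncPolicy: SyncPolicy{PreserveResourcesOnDeletion: false},
		Generators: []Generator{{}},
	}
	out, _ := json.Marshal(spec)
	fmt.Println(string(out))
	// Prints: {"syncPolicy":{},"generators":[{"template":{}}]}
}
```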
@dudinea I created a small draft PR to discuss the proposed fix: https://github.com/argoproj/argo-cd/pull/24586
@dudinea, @crenshaw-dev Any chance you could have another look here and in the Draft PR? Anything missing to move this forward? Feedback is much appreciated!
@dudinea, @crenshaw-dev (apologies for the repeated direct ping) It's been almost two months and this is still very much an issue for us. What can I do to move this forward?
@dsiebel thank you for your PR and sorry for the delay, I somehow missed your first ping. I'll try to take a look at it tomorrow.
Hi @dsiebel! Please see my comment in the PR. One more time sorry for the delays
Hi @dudinea! Thanks for getting back to me! And no worries, I only managed to check in on this every few weeks myself. I replied in the PR.
We added this to our helm values as a workaround for now:
# ? https://github.com/argoproj/argo-cd/issues/24378 - ignoring for all generators and sub-generators
resource.customizations.ignoreDifferences.argoproj.io_ApplicationSet: |
  jqPathExpressions:
  - .spec.generators[]?.[]?.template
  - .spec.generators[]?.[]?.generators[]?.[]?.template