argo-cd-helmfile
context deadline exceeded / unknown sync
Hello
Sometimes in ArgoCD I can see this message in apps:
/usr/local/bin/helmfile --helm-binary /usr/local/bin/helm --no-color --allow-no-matching-release --namespace chaos-testing repos
Adding repo chaos-mesh https://charts.chaos-mesh.org
in ./helmfile.yaml: command "/usr/local/bin/helm" exited with non-zero status:
PATH:
  /usr/local/bin/helm
ARGS:
  0: /usr/local/bin/helm (19 bytes)
  1: repo (4 bytes)
  2: add (3 bytes)
  3: chaos-mesh (10 bytes)
  4: https://charts.chaos-mesh.org (29 bytes)
  5: --force-update (14 bytes)
ERROR:
  exit status 1
EXIT STATUS
  1
STDERR:
  Error: context deadline exceeded
COMBINED OUTPUT:
  Error: context deadline exceeded
The apps show an unknown sync status, and after a few minutes (1-3) they are synced and back to normal.
It doesn't matter which app; it can be any of them.
What causes this to happen?
That's a good question :( I usually only see that when my git endpoint goes down, otherwise I don't really see that error.
How many helm repos are you loading up?
I have around 50 helm repos and 60 apps
It usually happens when I'm changing values or adding new charts. When the system is static, it doesn't occur that much.
Hello,
Could be that helm/helmfile build/template takes a long time, so I think you can fix this with:
https://github.com/argoproj/argo-cd/blob/master/docs/user-guide/config-management-plugins.md#using-a-cmp
"!!! important If your sidecar CMP command runs too long, the command will be killed, and the UI will show an error. The CMP server respects the timeouts set by the server.repo.server.timeout.seconds and controller.repo.server.timeout.seconds items in argocd-cm. Increase their values from the default of 60s.
Each CMP command will also independently timeout on the ARGOCD_EXEC_TIMEOUT set for the CMP sidecar. The default
is 90s. So if you increase the repo server timeout greater than 90s, be sure to set ARGOCD_EXEC_TIMEOUT on the
sidecar."
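The interaction between the two limits in the quoted note can be sketched as follows. This is only an illustration of the reasoning, assuming a CMP command is cut off by whichever limit is reached first; the function name `effective_cmp_timeout` is hypothetical and not part of Argo CD.

```python
def effective_cmp_timeout(repo_server_timeout_s: int = 60,
                          exec_timeout_s: int = 90) -> int:
    """Return the seconds after which a CMP command would be cut off.

    Assumption: both limits apply independently, so the smaller one wins.
    Defaults mirror the values in the quoted docs (60s and 90s).
    """
    return min(repo_server_timeout_s, exec_timeout_s)


# Defaults: the repo server timeout is the limiting factor.
print(effective_cmp_timeout())
# Repo server timeout raised to 600s, but ARGOCD_EXEC_TIMEOUT left at 90s.
print(effective_cmp_timeout(600, 90))
# Both raised to 10 minutes.
print(effective_cmp_timeout(600, 600))
```

Under that assumption, raising only the repo server timeout moves the cutoff from 60s to 90s and no further, which is why the docs say to raise ARGOCD_EXEC_TIMEOUT as well.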
I have set these values in the argocd chart:
controller:
  args:
    statusProcessors: "40"
    operationProcessors: "20"
    selfHealTimeout: "10"
    repoServerTimeoutSeconds: "600"
and set the env var ARGOCD_EXEC_TIMEOUT to 10m in the repo server.
I will test it and report back on whether it helped.
How many argo apps do those 60 repos and 50 apps span?
What do you mean by span?
No, the timeout changes didn't help.
Maybe I will try to set up a helm chart proxy (e.g. harbor) and point every repository to the proxy. What do you think?
I mean how many instances of argocd app are in the cluster? Is it all 1 giant app in the same ns or are there 50 argocd apps?
There are 50 argocd apps across around 25 namespaces.
Each namespace has exactly 2 argocd apps: one for the helmfile and one for manifests (e.g. namespace, resource quotas, etc.). Each helmfile has around 1-5 helm charts (depending on what the namespace does).
In this cluster we have 60+ helm chart releases, 50 argocd apps, and 25 namespaces.
Ok that helps! Do you put all the helm repos in a base that all helmfiles use? Or does each helmfile only pull down the repos it actually uses?
Said differently, does each argo app end up refreshing/syncing all 50 repos?
No, they are separate. There's no shared base for the repos; each helmfile has its own repos.
Hmm, then the number shouldn’t impact you too much. I’m not sure what would be taking so long. Do they take a long time to render/template locally on your workstation?
You may try bumping the number of repo server pods to alleviate the pressure and scale out a little bit.
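One way to answer the local render question is to time the command on a workstation. The sketch below is a minimal helper using only the Python standard library; `time_command` is a hypothetical name, and the commented usage assumes `helmfile` is on PATH and that the placeholder directory contains a `helmfile.yaml`.

```python
import subprocess
import time


def time_command(cmd, cwd=None):
    """Run cmd and return (elapsed_seconds, return_code).

    Output is captured rather than printed so large rendered
    manifests do not flood the terminal.
    """
    start = time.monotonic()
    result = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
    elapsed = time.monotonic() - start
    return elapsed, result.returncode


# Example usage (assumes helmfile is installed; the path is a placeholder):
# elapsed, code = time_command(["helmfile", "template"], cwd="path/to/app")
# print(f"helmfile template took {elapsed:.1f}s (exit code {code})")
```

If the local time is already close to the configured timeouts, slow rendering is a likely suspect. If it is only a few seconds, contention or network access from the repo server is a more plausible cause.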
I have set the repo server replicas to 2 instead of 1. I will check how it works now and update.
@travisghansen The unknown sync issue still exists, unfortunately.
Somehow, only when I change values in some app (no matter which one), other apps get the unknown sync problem.
It seems like argocd is trying to sync the apps regardless of whether they have changed in git.
They also can't download the charts for some period of time:
/tmp/helmfile3211379699/dex-dex-values-6487f67968 (49 bytes) 346: --kube-version=1.23 (19 bytes) 347: --api-versions=acme.cert-manager.io/v1 (38 bytes) ERROR: exit status 1 EXIT STATUS 1 STDERR: Error: failed to download "dex/dex" at version "0.9.0" COMBINED OUTPUT: Error: failed to download "dex/dex" at version
Maybe caching the helm charts in the repo server could solve it?
Any luck figuring this out?
I haven't set up and tried proxy helm repositories yet.
I have set the monitoring threshold to an hour (not ideal), so the alert is ignored for now until I can test this more deeply.