argo-cd-helmfile

context deadline exceeded / unknown sync

Open kfirfer opened this issue 3 years ago • 14 comments

Hello

Sometimes in ArgoCD I can see this message in apps:

/usr/local/bin/helmfile --helm-binary /usr/local/bin/helm --no-color --allow-no-matching-release --namespace chaos-testing repos
Adding repo chaos-mesh https://charts.chaos-mesh.org
in ./helmfile.yaml: command "/usr/local/bin/helm" exited with non-zero status:
PATH:
  /usr/local/bin/helm
ARGS:
  0: /usr/local/bin/helm (19 bytes)
  1: repo (4 bytes)
  2: add (3 bytes)
  3: chaos-mesh (10 bytes)
  4: https://charts.chaos-mesh.org (29 bytes)
  5: --force-update (14 bytes)
ERROR:
  exit status 1
EXIT STATUS
  1
STDERR:
  Error: context deadline exceeded
COMBINED OUTPUT:
  Error: context deadline exceeded

The apps are in Unknown sync status, and after a few minutes (1-3) they sync and return to normal.

It doesn't matter which app; it could be any of them.

What causes this to happen?

kfirfer avatar Aug 27 '22 17:08 kfirfer

That's a good question :( I usually only see that when my git endpoint goes down, otherwise I don't really see that error.

How many helm repos are you loading up?

travisghansen avatar Aug 27 '22 21:08 travisghansen

I have around 50 helm repos and 60 apps

It usually happens when I'm changing values or adding new charts. When the system is static, it doesn't occur that much.

kfirfer avatar Aug 27 '22 21:08 kfirfer

Hello,

Could be that helm/helmfile build/template takes a long time, so I think you can fix this with:

https://github.com/argoproj/argo-cd/blob/master/docs/user-guide/config-management-plugins.md#using-a-cmp

"Important: If your sidecar CMP command runs too long, the command will be killed, and the UI will show an error. The CMP server respects the timeouts set by the server.repo.server.timeout.seconds and controller.repo.server.timeout.seconds items in argocd-cm. Increase their values from the default of 60s.

Each CMP command will also independently timeout on the ARGOCD_EXEC_TIMEOUT set for the CMP sidecar. The default is 90s. So if you increase the repo server timeout greater than 90s, be sure to set ARGOCD_EXEC_TIMEOUT on the sidecar."

tpatrascu-flowx avatar Aug 27 '22 21:08 tpatrascu-flowx
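
The two timeouts mentioned in the quoted docs could be raised together along the lines of the sketch below. It is only an illustration under several assumptions: that Argo CD runs in the `argocd` namespace, that the timeout keys are read from `argocd-cmd-params-cm` (where current Argo CD releases usually keep them, even though the quote names `argocd-cm`), that the repo server Deployment is called `argocd-repo-server`, and that the CMP sidecar container is named `helmfile-plugin` (a hypothetical name). The values are examples, not recommendations.

```yaml
# Assumption: these keys are read from argocd-cmd-params-cm in the "argocd" namespace.
# 300 is an illustrative value; the quoted default is 60s.
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cmd-params-cm
  namespace: argocd
data:
  server.repo.server.timeout.seconds: "300"
  controller.repo.server.timeout.seconds: "300"
---
# Patch-style fragment for the repo server Deployment (name is an assumption;
# Helm installs often prefix it with the release name).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: argocd-repo-server
  namespace: argocd
spec:
  template:
    spec:
      containers:
        - name: helmfile-plugin   # hypothetical CMP sidecar container name
          env:
            # Keep this at least as long as the repo server timeout above
            # (the quoted default is 90s).
            - name: ARGOCD_EXEC_TIMEOUT
              value: "5m"
```

The affected components most likely need a restart before new values in the ConfigMap take effect.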

I have set these values in the argocd chart:

controller:
  args:
    statusProcessors: "40"
    operationProcessors: "20"
    selfHealTimeout: "10"
    repoServerTimeoutSeconds: "600"

and set the env var ARGOCD_EXEC_TIMEOUT to 10m in the repo server.

I will test it and report whether it helped.

kfirfer avatar Aug 27 '22 22:08 kfirfer

How many argo apps do those 60 repos and 50 apps span?

travisghansen avatar Aug 27 '22 22:08 travisghansen

> How many argo apps do those 60 repos and 50 apps span?

What do you mean by span?

> I have set these values in the argocd chart:
>
> controller:
>   args:
>     statusProcessors: "40"
>     operationProcessors: "20"
>     selfHealTimeout: "10"
>     repoServerTimeoutSeconds: "600"
>
> and set the env var ARGOCD_EXEC_TIMEOUT to 10m in the repo server.
>
> I will test it and report whether it helped.

No, it didn't help.

kfirfer avatar Aug 27 '22 23:08 kfirfer

Maybe I will try to set up a Helm chart proxy (e.g. Harbor) and point every repository to the proxy. What do you think?

kfirfer avatar Aug 27 '22 23:08 kfirfer

I mean how many instances of argocd app are in the cluster? Is it all 1 giant app in the same ns or are there 50 argocd apps?

travisghansen avatar Aug 28 '22 00:08 travisghansen

There are 50 argocd apps across around 25 namespaces

Each namespace has exactly 2 argocd apps: one for the helmfile and one for manifests (e.g. namespace, resource quotas, etc.). Each helmfile has around 1-5 Helm charts (depending on what the namespace does).

In this cluster we have around 60+ Helm chart releases, 50 argocd apps, and 25 namespaces.

kfirfer avatar Aug 28 '22 10:08 kfirfer

Ok that helps! Do you put all the helm repos in a base that all helmfiles use? Or does each helmfile only pull down the repos it actually uses?

Said differently, does each argo app end up refreshing/syncing all 50 repos?

travisghansen avatar Aug 28 '22 13:08 travisghansen

No, they are separate. There's no base for the repos; each helmfile has its own repos.

kfirfer avatar Aug 28 '22 19:08 kfirfer
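
A helmfile that declares only the repositories its own releases need, as described in the reply above, might look roughly like this sketch. The repository name, URL, and namespace are taken from the error output earlier in the thread; the release name and chart reference are assumptions added for illustration.

```yaml
# helmfile.yaml for a single namespace (sketch)
repositories:
  # Only the repo this helmfile actually uses, so `helmfile repos`
  # runs a single `helm repo add` for this app.
  - name: chaos-mesh
    url: https://charts.chaos-mesh.org

releases:
  - name: chaos-mesh              # assumed release name
    namespace: chaos-testing
    chart: chaos-mesh/chaos-mesh  # assumed chart reference (<repo>/<chart>)
```

With this layout, each Argo CD app refresh contacts only the repositories listed in its own helmfile rather than all 50.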

Hmm, then the number shouldn’t impact you too much. I’m not sure what would be taking so long. Do they take a long time to render/template locally on your workstation?

You may try bumping the number of repo server pods to alleviate the pressure and scale out a little bit.

travisghansen avatar Aug 29 '22 13:08 travisghansen
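
Scaling out the repo server, as suggested above, could be expressed as a small patch like the one below. This is a sketch that assumes the Deployment is named `argocd-repo-server` in the `argocd` namespace, as in the upstream manifests; a Helm-based install may use a different name and would normally set the replica count through chart values instead.

```yaml
# Patch-style fragment: run two repo server pods instead of one.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: argocd-repo-server   # assumed name; check your install
  namespace: argocd          # assumed namespace
spec:
  replicas: 2
```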

I have set the repo server replicas to 2 instead of 1. I will check how it works now and update.

kfirfer avatar Sep 05 '22 11:09 kfirfer

@travisghansen The unknown sync issue still exists, unfortunately.

Somehow, only when I change values in some app (no matter which one), other apps get the unknown sync problem.

It seems like argocd is trying to sync the apps regardless of whether they have changed in git.

They also can't download the charts for some period of time:

/tmp/helmfile3211379699/dex-dex-values-6487f67968 (49 bytes) 346: --kube-version=1.23 (19 bytes) 347: --api-versions=acme.cert-manager.io/v1 (38 bytes) ERROR: exit status 1 EXIT STATUS 1 STDERR: Error: failed to download "dex/dex" at version "0.9.0" COMBINED OUTPUT: Error: failed to download "dex/dex" at version

Maybe caching the Helm charts in the repo server could solve it?

kfirfer avatar Sep 06 '22 22:09 kfirfer

Any luck figuring this out?

travisghansen avatar Oct 26 '22 04:10 travisghansen

I haven’t yet set up and tried proxy helm repositories

I have set the monitoring threshold to an hour (not ideal), so monitoring ignores the issue for now, until I test this more thoroughly.

kfirfer avatar Nov 12 '22 14:11 kfirfer