argocd-notifications

Failed to notify recipient

Open hroyg opened this issue 3 years ago • 13 comments

Summary

Ever since we upgraded Argo CD to v2.1.3, we have been getting errors from argocd-notifications, which in turn cause API calls and Slack messages to fail and never be executed or sent. With the new version, we changed the GitHub authentication to be defined in a secret instead of in the ConfigMap with a reference to a secret (the GitHub repository URL is now also defined in a secret rather than in the ConfigMap as before), so the authentication configuration has changed.

This does not happen all the time; it seems to happen randomly with some applications. What I don't understand, probably because I don't know exactly how argocd-notifications works, is why this started happening after the Argo CD upgrade. Isn't the function that fails (<call .repo.GetCommitMetadata .app.status.operationState.syncResult.revision>) executed by argocd-notifications itself?

Diagnostics

EKS

argocd: 2.1.3
argocd-notifications: v1.1.1


time="2021-10-14T11:26:40Z" level=error msg="Failed to notify recipient {jenkins } defined in app argocd/monitoring: template: jenkins-api-calljenkins:1:27: executing \"jenkins-api-calljenkins\" at <call .repo.GetCommitMetadata .app.status.operationState.syncResult.revision>: error calling call: rpc error: code = Internal desc = Failed to fetch 8179a397e623f56c7f36b4a5781ad233af2bbe5b: `git fetch origin --tags --force` failed exit status 128: fatal: could not read Username for 'https://github.com': No such device or address" app=argocd/monitoring


time="2021-10-14T11:26:40Z" level=error msg="Failed to notify recipient {slack cloud-cd-stage} defined in app argocd/monitoring: template: custom-synced-and-healty:5:26: executing \"custom-synced-and-healty\" at <call .repo.GetCommitMetadata .app.status.operationState.syncResult.revision>: error calling call: rpc error: code = Internal desc = Failed to fetch 8179a397e623f56c7f36b4a5781ad233af2bbe5b: `git fetch origin --tags --force` failed exit status 128: fatal: could not read Username for 'https://github.com': No such device or address" app=argocd/monitoring

Any input to help resolve this issue would be much appreciated.

Thanks

Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.

hroyg, Oct 14 '21 11:10

@hroyg I think we need to break down the issue.

  1. If you downgrade Argo CD to the previous version, is the problem fixed?
  2. Is this a problem with the Argo CD version, with argocd-notifications, or with something else?

ryota-sakamoto, Oct 25 '21 15:10

@ryota-sakamoto

  1. Downgrading Argo CD back to v2.0.3 resolved the issue. When downgrading, I also changed the way I pass the GitHub authentication back to the ConfigMap, as follows:

repositories: |
  - passwordSecret:
      key: password
      name: repo-248020157
    type: git
    url: https://github.com/Firm/k8s-cloud-resources.git
    usernameSecret:
      key: username
      name: repo-248020157

** Staying on Argo CD v2.1.3 (not downgrading) and only changing the config to the form above (the old way, as we used it before the upgrade to v2.1.3) also resolved the issue. So I guess the problem relates to the new way Argo CD passes the GitHub PAT, or maybe to how argocd-notifications uses it (I'm not too familiar with how argocd-notifications connects to GitHub through the Argo CD server/repo-server).

  2. The problem started after upgrading Argo CD, but the error appears in the argocd-notifications controller.

** We haven't encountered any functionality issues in Argo CD itself with the new way of authenticating to GitHub (the new way is to pass the repo name and authentication, e.g. a PAT, as secrets).

hroyg, Oct 25 '21 20:10

Same issue, not working with ArgoCD v2.1.1, downgrading to v2.0.5 helped.

(call .repo.GetAppDetails).Helm.GetParameterValueByName is randomly failing with Failed to fetch 023d82c5c49bdc9aa05ac32801d2800e900ff7c0: 'git fetch origin --tags --force'

Also, when two notification services are defined and the same function ((call .repo.GetAppDetails).Helm.GetParameterValueByName) is used in both service templates, it fails only for the first one. The second is sent normally without any errors.

slack notification service template:

...
          {
            "title": "Upstream Repository",
            "value": "{{ (call .repo.GetAppDetails).Helm.GetParameterValueByName "app-vue.labels.upstreamRepository" }}",
            "short": true
          },
...

Webhook notification service template:

...
path: /api/v4/projects/{{(call .repo.GetAppDetails).Helm.GetParameterValueByName "app-vue.labels.upstreamRepository"}}/statuses/{{(call .repo.GetAppDetails).Helm.GetParameterValueByName "app-vue.labels.commitSHA"}}?state=success
...

Thumbiceq, Nov 02 '21 13:11

I reproduced this issue and am now investigating it.

ryota-sakamoto, Nov 08 '21 16:11

I think this is the same issue: https://github.com/argoproj-labs/argocd-notifications/issues/356. I'm also affected by this, but not always. For whatever reason, on some occasions I get the "Failed to fetch" error, and on others the notifications work and get data from the commit.

mbolek, Nov 19 '21 11:11

The reason this broke without updating argocd-notifications is that it calls the argocd-repo-server service to get the information, and it is actually that service performing the call and failing. https://github.com/argoproj-labs/argocd-notifications/blob/c461d624b4c02452e85821361bb1c4c2d2e487b7/shared/argocd/service.go#L74

There is also a cache mechanism in argocd-repo-server, which might explain why the notification sometimes goes through.

agaudreault, Nov 30 '21 20:11

For reference, we have the same problem. Our Argo CD instance (v2.1.3) is configured for our private repositories with a credential template and a GitHub App, according to https://argo-cd.readthedocs.io/en/stable/user-guide/private-repositories/#github-app-credential.

argocd repocreds list
URL PATTERN              USERNAME  SSH_CREDS  TLS_CREDS
https://github.com/org   -         false      false

agaudreault, Nov 30 '21 20:11

I was hoping that the new release v1.2.1 (#370), which fixed my similar issue #356, would also fix this one, which I now see clearly after upgrading.

My specific errors are around dereferencing Application object attributes in the trigger (the when and oncePer clauses). I have no way to handle these gracefully there, as I do in the templates (by setting default values in case the attributes don't exist).

time="2021-12-15T22:01:28Z" level=error msg="failed to execute oncePer condition: cannot fetch images from <nil> (1:20)\n | app.status.summary.images\n | ...................^"
time="2021-12-15T22:01:28Z" level=error msg="failed to execute when condition: cannot fetch phase from <nil> (1:27)\n | app.status.operationState.phase in ['Error', 'Failed']\n | ..........................^"
time="2021-12-15T22:01:28Z" level=error msg="failed to execute oncePer condition: cannot fetch syncResult from <nil> (1:27)\n | app.status.operationState.syncResult.revision\n | ..........................^"

Any help appreciated.

tsunamishaun, Dec 15 '21 22:12

After upgrading to v1.2.1 with argoCD v2.2.3, I am able to use .repo.GetCommitMetadata.

agaudreault, Jan 27 '22 17:01

Hi, I am using this in the Slack templates:

...
          {
            "title": "Author",
            "value": "{{(call .repo.GetCommitMetadata .app.status.sync.revision).Author}}",
            "short": true
          }
...

It works, but it gives me the following error:

argocd-notifications-controller-769fb8f4fd-ptgql argocd-notifications-controller time="2022-03-08T09:30:24Z" level=error msg="Failed to notify recipient {slack releases-dev} defined in resource argocd/service1: template: app-deployed:23:17: executing \"app-deployed\" at <call .repo.GetCommitMetadata .app.status.sync.revision>: error calling call: rpc error: code = Unavailable desc = connection error: desc = \"transport: authentication handshake failed: tls: first record does not look like a TLS handshake\"" resource=argocd/service1

I think this is related to the Argo CD repo server. But I don't want to disable TLS there, because I would then also have to disable it in the server and the application-controller. Is there any other way to show the author of the commit?

I am using these versions:

argocd: 2.2.5
argocd-notifications: 1.2.1

Thanks

ichasco-heytrade, Mar 08 '22 09:03

Is there any update on this? It seems the issue is still open and nowhere near a solution.

muhammad-asn, Jun 16 '22 06:06

If this issue hasn't been resolved in the latest argo-cd v2.4.0, then I think it should be resubmitted upstream, because the notifications code was merged into the argo-cd repo. I doubt issues here are monitored or being worked on anymore.

mubarak-j, Jun 16 '22 18:06

I'm unsure how to follow @mubarak-j 's advice above, so I'll throw on here that this is still occurring in v2.4.11+3d9e9f2.

sinkr, Aug 26 '22 12:08