argo-cd icon indicating copy to clipboard operation
argo-cd copied to clipboard

Failed to verify token: oidc: expected audience "argo-cd-cli" got ["argo-cd"]

Open paoll opened this issue 1 year ago • 26 comments

Checklist:

  • [x] I've searched in the docs and FAQ for my answer: https://bit.ly/argocd-faq.
  • [x] I've included steps to reproduce the bug.
  • [x] I've pasted the output of argocd version.

Describe the bug

We did upgrade both our deployment of ArgoCD using in-house chart, and the image to latest 2.7.0 (then 2.7.1.) , and since argocd-server is spamming with logs:

failed to verify token: oidc: expected audience \"argo-cd-cli\" got [\"argo-cd\"]

We did rollbacked both chart and argocd version to previous, and warning is still present. Both logging through UI or CLI is working as well.

We are using Google authentication using Google groups through the dex, without any allowedAudiences configured, so we do expect argo-cd being part of allowed audiences, according to https://github.com/argoproj/argo-cd/blob/064c8da942c56bf45745d7bc1b9af8ded979aa61/util/settings/settings.go#L1754

To Reproduce

Dex configuration:

dex.config: |
  connectors:
    - type: google
      id: google
      name: Google
      config:
        redirectUri: https://<REDACTED>/api/dex/callback
        clientID: <REDACTED>.apps.googleusercontent.com
        clientSecret: $oidc.google.clientSecret
        serviceAccountFilePath: /tmp/oidc/googleAuth.json
        adminEmail: <REDACTED>

Expected behavior

Without any allowed audience explicitely configured, we're expecting that argo-cd is allowed as one.

Version

argocd: v2.7.1+5e54351.dirty
  BuildDate: 2023-05-02T19:24:14Z
  GitCommit: 5e543518dbdb5384fa61c938ce3e045b4c5be325
  GitTreeState: dirty
  GoVersion: go1.20.3
  Compiler: gc
  Platform: darwin/arm64
argocd-server: v2.7.1+5e54351.dirty
  BuildDate: 2023-05-02T19:24:14Z
  GitCommit: 5e543518dbdb5384fa61c938ce3e045b4c5be325
  GitTreeState: dirty
  GoVersion: go1.20.3
  Compiler: gc
  Platform: darwin/arm64
  Kustomize Version: could not get kustomize version: exec: "kustomize": executable file not found in $PATH
  Helm Version: v3.11.3+g66a969e
  Kubectl Version: v0.24.2
  Jsonnet Version: v0.19.1

paoll avatar May 03 '23 09:05 paoll

Hey having this same isue. upgrading from 2.6.7 to 2.7.3 Please advise on a solution this is blocking.

Epiclemonaid avatar May 26 '23 03:05 Epiclemonaid

We upgraded to 2.7.3 and had the same issue. Rolling back to the previous version didn't fix it. Completely uninstalling ArgoCD and redeploying 2.7.3 it fixed the issue for us.

batkins-carecloud avatar May 31 '23 21:05 batkins-carecloud

Same issue in 2.7.1

JhoLee avatar Jun 22 '23 08:06 JhoLee

~Same issue on 2.7.8~ While I was seeing the issue, I accidentally has multiple pods of Dex deployed. Reducing that to 1 pod fixed the issue

Moulick avatar Jul 25 '23 09:07 Moulick

@paoll Could you please confirm if that issue actually breaks the authorization flow in your case? If it does - which error do you see in the browser?

P.s. I'm seeing the same warning on v2.7.10

legal90 avatar Aug 02 '23 09:08 legal90

@legal90, afaik it does not break the authorization flow, as the the authentication one is working (users can login and use dashboard mainly as read-only). We don't operate actions through the dashboard, and everyone can see everything, so if we had authorization issue, we would'nt notice.

The only dysfunctional behavior we're experiencing is the error message from argocd-server.

paoll avatar Aug 13 '23 08:08 paoll

We are also experiencing this issue on v2.8.2+dbdfc71

sachk avatar Sep 10 '23 14:09 sachk

~Same issue on 2.7.8~ While I was seeing the issue, I accidentally has multiple pods of Dex deployed. Reducing that to 1 pod fixed the issue

Reducing the Dex replications 3 -> 1 seemingly solved this for me

donovanrost avatar Nov 03 '23 20:11 donovanrost

The error message is likely misleading. The token verification code loops over allowed audiences, runs token verification for each, and logs the last error message encountered.

In this case I believe the first error message (the relevant one) was discarded, and the second error message was irrelevant (because it was attempting to verify the wrong audience).

This change will cause all token verification errors to be logged:

https://github.com/argoproj/argo-cd/pull/16625

crenshaw-dev avatar Dec 15 '23 21:12 crenshaw-dev

Having same issue with gitlab <-> dex

  repository.credentials: ''
  dex.config: |
    connectors:
    - type: gitlab
      id: gitlab
      name: Gitlab
      config:
        baseURL: https://SELFHOST URL
        clientID: xxxxxxxxxxxx
        clientSecret: $oidc.dex.clientSecret
        groups: 
          - admin
        issuer: https://URL/api/dex
        redirectURI: https://URL/api/dex/callback
        useLoginAsID: false
        users.anonymous.enabled: "false"``` 

parthyp avatar Dec 16 '23 00:12 parthyp

To be clear, I think it's possible that every person on this thread is facing a completely different issue, hidden by an irrelevant error message.

crenshaw-dev avatar Dec 16 '23 00:12 crenshaw-dev

After upgrading an ArgoCD 2.3.10 instance to 2.6.15, I'm seeing the following in the argocd-server logs:

time="2024-02-07T14:59:17Z" level=warning msg="Failed to verify token: failed to verify token: oidc: expected audience \"argo-cd-cli\" got [\"argo-cd\"]"
time="2024-02-07T14:59:17Z" level=info msg="finished streaming call with code Unauthenticated" error="rpc error: code = Unauthenticated desc = invalid session: failed to verify the token" grpc.code=Unauthenticated grpc.method=Watch grpc.service=application.ApplicationService grpc.start_time="2024-02-07T14:59:17Z" grpc.time_ms=2.521 span.kind=server system=grpc

These are happening once per second. That doesn't match the liveness (30s) or readiness (10s) probe periods, so I'm trying to find out why these are happening.

To be clear, nothing seems broken. Dex is working just fine and SSO (Google) works. I just want to figure out why this is happening in order to improve the logs.

EDIT: Could this be users with old ArgoCD JS cached and invalidated sessions that are trying to access the upgraded ArgoCD?

ghost avatar Feb 07 '24 15:02 ghost

Could this be users with old ArgoCD JS cached and invalidated sessions that are trying to access the upgraded ArgoCD?

100%

I think we should follow this suggestion: https://github.com/coreos/go-oidc/pull/406#pullrequestreview-1785515608

We should skip the client ID check in the go-oidc library and instead check it ourselves. That'll allow us to ignore irrelevant "failed to verify token" errors and instead log the actual error.

crenshaw-dev avatar Feb 07 '24 15:02 crenshaw-dev

Having same issue with gitlab <-> dex

Me as well. SSO login on UI working fine.

$argocd app list
FATA[0012] rpc error: code = Unauthenticated desc = invalid session: failed to verify the token

with logs:

time="2024-02-08T11:01:37Z" level=warning msg="Failed to verify token: failed to verify token: token verification failed for all audiences: error for aud \"argo-cd\": oidc: expected audience \"argo-cd\" got [\"argo-cd-cli\"]"
time="2024-02-08T11:01:37Z" level=info msg="received unary call /version.VersionService/Version" grpc.method=Version grpc.request.content= grpc.service=version.VersionService grpc.start_time="2024-02-08T11:01:37Z" span.kind=server system=grpc
time="2024-02-08T11:01:37Z" level=info msg="finished unary call with code OK" grpc.code=OK grpc.method=Version grpc.service=version.VersionService grpc.start_time="2024-02-08T11:01:37Z" grpc.time_ms=2.739 span.kind=server system=grpc
time="2024-02-08T11:01:38Z" level=warning msg="Failed to verify token: failed to verify token: token verification failed for all audiences: error for aud \"argo-cd\": oidc: expected audience \"argo-cd\" got [\"argo-cd-cli\"]"
time="2024-02-08T11:01:38Z" level=info msg="finished unary call with code Unauthenticated" error="rpc error: code = Unauthenticated desc = invalid session: failed to verify the token" grpc.code=Unauthenticated grpc.method=List grpc.service=application.ApplicationService grpc.start_time="2024-02-08T11:01:38Z" grpc.time_ms=1.292 span.kind=server system=grpc

I only have a single dex replica, never had more than one.

Note that ripping the token out of my cookie from the UI and using it with the CLI works, so that's a possible workaround for those in need.

ForbiddenEra avatar Feb 08 '24 11:02 ForbiddenEra

This exact issue happened to us seemingly out of nowhere. These log entries correlated perfectly with a sudden wave of requests in our network logs with a HTTP 401 result. They all ended up being requests to:

https://<MY_ARGO_HOST>/api/v1/stream/applications?resourceVersion=137215386&fields=result.type%2Cresult.application.metadata.name%2Cresult.application.metadata.namespace%2Cresult.application.metadata.annotations%2Cresult.application.metadata.labels%2Cresult.application.metadata.creationTimestamp%2Cresult.application.metadata.deletionTimestamp%2Cresult.application.spec%2Cresult.application.operation.sync%2Cresult.application.status.sync.status%2Cresult.application.status.sync.revision%2Cresult.application.status.health%2Cresult.application.status.operationState.phase%2Cresult.application.status.operationState.finishedAt%2Cresult.application.status.operationState.operation.sync%2Cresult.application.status.summary%2Cresult.application.status.resources&selector=&appNamespace=

with the Referer field set to: https://<MY_ARGO_HOST>/applications?showFavorites=false&proj=&sync=Unknown%2CSynced&autoSync=&health=&namespace=&cluster=&labels=. In our case, the source IP of the requests ended up being one of our employees who left the ArgoCD UI open in their browser.

So this definitely seems like a bug in the UI, where it endlessly tries to refresh the Applications listing even though all its getting in response is 401s. Let me know if you want me to provide any more details on this 🙏

aleks-andr avatar Feb 13 '24 12:02 aleks-andr

Could this be users with old ArgoCD JS cached and invalidated sessions that are trying to access the upgraded ArgoCD?

100%

I think we should follow this suggestion: coreos/go-oidc#406 (review)

We should skip the client ID check in the go-oidc library and instead check it ourselves. That'll allow us to ignore irrelevant "failed to verify token" errors and instead log the actual error.

Just commenting that I don't think that's at all the issue I'm having; whatever might be happening in a browser somewhere shouldn't affect me trying to login with the CLI.

ForbiddenEra avatar Feb 14 '24 16:02 ForbiddenEra

We're seeing this is one of our environments (the least critical thankfully): Running v2.10.2+fcf5d8c from this YAML. Our argocd-cm ConfigMap contains the following for dex.config:

logger:
  level: debug
  format: json
connectors:
- type: saml
  id: saml
  name: saml
  config:
    entityIssuer: $sso:externalUrl
    ssoURL: $sso:ssoUrl
    caData: $sso:certificate
    redirectURI: $sso:externalUrl
    usernameAttr: email
    emailAttr: email
    groupsAttr: Group

We're observing ArgoCD being slow as molasses and the error message Failed to verify token: failed to verify token: oidc: expected audience \"argo-cd-cli\" got [\"argo-cd\"]. Authentication and normal functionality works as intended albeit it being very slow.

evenh avatar Mar 13 '24 13:03 evenh

It's possible that every person on this thread is experiencing an entirely different root cause. We won't know until we make changes to avoid obscuring the actual token validation error.

I think we should follow this suggestion: https://github.com/coreos/go-oidc/pull/406#pullrequestreview-1785515608

We should skip the client ID check in the go-oidc library and instead check it ourselves. That'll allow us to ignore irrelevant "failed to verify token" errors and instead log the actual error.

This is where the change should be make. https://github.com/argoproj/argo-cd/blob/565aa8e1f5b4ea2ad8380a3347de774f6e0a9920/util/oidc/provider.go#L122

crenshaw-dev avatar Mar 13 '24 14:03 crenshaw-dev

Actually, I forgot I wrote code to log all audiences' error messages. That'll be in 2.11.

crenshaw-dev avatar Mar 13 '24 14:03 crenshaw-dev

It's possible that every person on this thread is experiencing an entirely different root cause. We won't know until we make changes to avoid obscuring the actual token validation error.

Entirely possible; hopefully people come back with more details.

Actually, I forgot I wrote code to log all audiences' error messages. That'll be in 2.11.

I'll check it out when I get to 2.11 then and report back; fortunately only I use the CLI and even then very occasionally, even the UI is only for monitoring as everything else is declarative in my case.

Though, there's still a few CLI things I need to do occasionally and it'd be nice to not have to kick things and copy/paste tokens around. :-)

ForbiddenEra avatar Mar 20 '24 08:03 ForbiddenEra

@crenshaw-dev I have upgraded 2.10 to 2.11 RC1 and we're seeing a single oidc connector fail with a similar but reversed error message as everyone else in this issue.

time="2024-04-14T17:34:05Z" level=warning 
msg="Failed to verify token: failed to verify token: token verification failed for all audiences: error for aud \"argo-cd\": oidc: expected audience \"argo-cd\" got [\"argo-cd-cli\"]"

Unfortunately even in the 2.11 RC image this seems to be the only information logged. The connector that's failing is related to the single place where we use the argo cli, in a CI pipeline. Users are having no issue with our integrated SSO in the UI. We fetch the appropriate CLI version from the server during the pipeline and generate a token in each run from a secret.

  # Using Buildkite JWT to generate a token for Dex
  BUILDKITE_JWT=$(buildkite-agent oidc request-token \
    --lifetime 300 \
    --audience argo-cd-cli )
  export ARGOCD_AUTH_TOKEN=$(curl  http://${ARGOCD_SERVER}/api/dex/token \
  --user argo-cd-cli:bk-oidc-secret \
  --data-urlencode connector_id=bk-oidc \
  --data-urlencode grant_type=urn:ietf:params:oauth:grant-type:token-exchange \
  --data-urlencode scope="openid groups federated:id" \
  --data-urlencode requested_token_type=urn:ietf:params:oauth:token-type:access_token \
  --data-urlencode subject_token="$BUILDKITE_JWT" \
  --data-urlencode subject_token_type=urn:ietf:params:oauth:token-type:id_token | jq -r .access_token)

We expect the value of 'argo-cd-cli' so the error message is particularly odd.

      staticClients:
        - name: BK agents OIDC
          id: argo-cd-cli
          secret: bk-oidc-secret
          public: true

You mentioned above that something might be caching a token. I have restarted all our BK agent pods (they run in-cluster) and all of the argocd pods were new (including the dex server) from the 2.11 upgrade. However in the interests of science, I also restarted every argocd pod. Same result regardless; the CI pipeline is blocked and cannot authenticate.

zswanson avatar Apr 14 '24 23:04 zswanson

@zswanson your error is because my change to improve the error logs was bad. Put up a fix: https://github.com/argoproj/argo-cd/pull/17876

crenshaw-dev avatar Apr 17 '24 21:04 crenshaw-dev

I tried 2.11 RC3 and i still see errors:

time="2024-05-06T10:25:03Z" level=info msg="Initializing OIDC provider (issuer: https://argocd.my-domain.com/api/dex)"
time="2024-05-06T10:25:03Z" level=info msg="OIDC supported scopes: [openid email groups profile offline_access]"
time="2024-05-06T10:25:03Z" level=warning msg="Failed to verify token: failed to verify token: token verification failed for all audiences: error for aud \"argo-cd\": failed to verify signature: failed to verify id token signature, error for aud \"argo-cd-cli\": oidc: expected audience \"argo-cd-cli\" got [\"argo-cd\"]"
time="2024-05-06T10:25:03Z" level=info msg="finished streaming call with code Unauthenticated" error="rpc error: code = Unauthenticated desc = invalid session: failed to verify the token" grpc.code=Unauthenticated grpc.method=Watch grpc.service=application.ApplicationService grpc.start_time="2024-05-06T10:25:03Z" grpc.time_ms=7.658 span.kind=server system=grpc

Dex pod's dex.yaml looks like the following:

connectors:
- config:
    clientID: xyz.apps.googleusercontent.com
    clientSecret: STR-xyz
    issuer: accounts.google.com
    redirectURI: https://argocd.my-domain.com/api/dex/callback
  id: google
  name: Google
  type: google
grpc:
  addr: 0.0.0.0:5557
issuer: https://argocd.my-domain.com/api/dex
oauth2:
  skipApprovalScreen: true
staticClients:
- id: argo-cd
  name: Argo CD
  redirectURIs:
  - https://argocd.my-domain.com/auth/callback
  secret: xyz
- id: argo-cd-cli
  name: Argo CD CLI
  public: true
  redirectURIs:
  - http://localhost
  - http://localhost:8085/auth/callback
- id: argo-cd-pkce
  name: Argo CD PKCE
  public: true
  redirectURIs:
  - http://localhost:4000/pkce/verify
storage:
  type: memory
telemetry:
  http: 0.0.0.0:5558
web:
  https: 0.0.0.0:5556
  tlsCert: /tmp/tls.crt
  tlsKey: /tmp/tls.key

hc2p avatar May 06 '24 10:05 hc2p

@hc2p in your case the actual error is this: "failed to verify signature: failed to verify id token signature." I'd start searching for answers specific to that error message, I see a few examples with a quick Google.

You can safely ignore this error, it's just a result of the audience check loop and is expected to fail:

error for aud "argo-cd-cli": oidc: expected audience "argo-cd-cli" got ["argo-cd"]"

crenshaw-dev avatar May 06 '24 14:05 crenshaw-dev

Hi @crenshaw-dev,

I hope you're doing well!

I was wondering if the fix in this pull request might be backported to the v2.10.x release? I’m currently using dex.config.

If I’m reading the code correctly, it looks like it adds two allowed audiences by default in this section of the code. https://github.com/argoproj/argo-cd/blob/ebb92d8c3bcda64348d98c14646cbdf5adcde0c6/util/settings/settings.go#L1923-L1925

Without the fix from the pull request, it seems like the warning log message will keep popping up.

Similar to this issue, I'm seeing the warning message logged thousands of times per minute.

Thanks a lot for your help!

d1egoaz avatar May 14 '24 18:05 d1egoaz

In one of our test clusters, I tried this ArgoCDClientAppID == ArgoCDCLIClientAppID, and it seems to have stopped the warning logs. The auth* flow is working as intended.

    dex.config: |
      connectors:
      - type: saml
        id: okta
        name: Okta
        config:
          clientID: argo-cd
          cliClientID: argo-cd
          ssoURL: https://<redacted>
          redirectURI: <redacted>
          ...

d1egoaz avatar May 14 '24 20:05 d1egoaz

You can safely ignore this error, it's just a result of the audience check loop and is expected to fail:

error for aud "argo-cd-cli": oidc: expected audience "argo-cd-cli" got ["argo-cd"]"

@crenshaw-dev If it was once every few minutes sure, but it's multiple times per minute, per pod, and has an impact on logging costs, plus the extra noise in the logs makes troubleshooting more cumbersome. Is there no way to resolve this properly? An "expected" warning shouldn't exist if it's sole purpose is to cry wolf.

Invincibear avatar Aug 21 '24 00:08 Invincibear

@Invincibear the spurious warning is fixed in recent versions.

crenshaw-dev avatar Aug 21 '24 00:08 crenshaw-dev

@Invincibear the spurious warning is fixed in recent versions.

Glad to hear, thanks. I realize now that OLM-deployed Argo CD isn't using the newest version. I'll have to investigate this myself.

argocd: v2.10.12+cb6f5ac
  BuildDate: 2024-06-06T09:27:57Z
  GitCommit: cb6f5ac8b8c08ce26cce08b8a1bd2144034bd79b
  GitTreeState: clean
  GoVersion: go1.21.10
  Compiler: gc
  Platform: linux/amd64
argocd-server: v2.11.5+c4b283c
  BuildDate: 2024-07-15T17:39:54Z
  GitCommit: c4b283ce0c092aeda00c78ae7b3b2d3b28e7feec
  GitTreeState: clean
  GoVersion: go1.21.10
  Compiler: gc
  Platform: linux/amd64
  Kustomize Version: v5.2.1 2023-10-19T20:13:51Z
  Helm Version: v3.14.4+g81c902a
  Kubectl Version: v0.26.11
  Jsonnet Version: v0.20.0

Invincibear avatar Aug 21 '24 01:08 Invincibear

I'm gonna close this, because the causes are various, and newer versions of Argo CD will produce much more precise/useful error messages.

crenshaw-dev avatar Aug 23 '24 05:08 crenshaw-dev