
Unable to load data: error getting cached app managed resources: NOAUTH Authentication required.

Open garyd2 opened this issue 1 year ago • 12 comments

The GitOps operator in one environment updated to 1.13.0 yesterday; since then I cannot get to the resources of any apps without hitting

Unable to load data: error getting cached app managed resources: NOAUTH Authentication required.

I have restarted all pods in the openshift-gitops namespace.

Anyone seen this before? Is it something with Redis?

garyd2 avatar Jul 19 '24 14:07 garyd2

From which version did the upgrade happen? Are you unable to view the resources at all, or can you view them but the NOAUTH error message keeps popping up? Also, can you share the application controller logs?
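For reference, something like this should pull them (a sketch; the workload and namespace names assume a default openshift-gitops install, so adjust to your instance):

```sh
# Tail the application controller logs.
# Names are assumptions for the default "openshift-gitops" Argo CD instance.
oc -n openshift-gitops logs statefulset/openshift-gitops-application-controller --tail=200
```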

svghadi avatar Jul 22 '24 07:07 svghadi

It was previously on 1.12.4, I believe. I can see the application tiles fine and they show synced and healthy, but when I click into one and try to look at the pods it throws the error and I can't see any further pod details.

Logs of the application controller look like this (some details masked):

time="2024-07-22T07:49:00Z" level=info msg="Loading TLS configuration from secret xxxxx/argocd-server-tls"
time="2024-07-22T07:49:00Z" level=warning msg="Failed to save cluster info: NOAUTH Authentication required."
time="2024-07-22T07:49:04Z" level=info msg="Refreshing app status (controller refresh requested), level (0)" application=xxx
time="2024-07-22T07:49:04Z" level=warning msg="Failed to get cached managed resources for tree reconciliation, fall back to full reconciliation" application=xxxxxxx dest-name= dest-namespace=development dest-server="https://kubernetes.default.svc" fields.level=0
time="2024-07-22T07:49:04Z" level=info msg="Comparing app state (cluster: https://kubernetes.default.svc, namespace: development)" application=xxxxxxx
time="2024-07-22T07:49:04Z" level=info msg="GetRepoObjs stats" application=xxxxxxx build_options_ms=0 helm_ms=0 plugins_ms=0 repo_ms=0 time_ms=54 unmarshal_ms=54 version_ms=0
time="2024-07-22T07:49:04Z" level=error msg="DiffFromCache error: error getting managed resources for app xxxx: NOAUTH Authentication required."
time="2024-07-22T07:49:04Z" level=error msg="Failed to cache app resources: error setting app resource tree: NOAUTH Authentication required." application=xxxxxxx dedup_ms=0 dest-name= dest-namespace=development dest-server="https://kubernetes.default.svc" diff_ms=13 fields.level=0 git_ms=54 health_ms=0 live_ms=0 settings_ms=0 sync_ms=0
time="2024-07-22T07:49:04Z" level=info msg="Skipping auto-sync: application status is Synced" application=xxxxxxx
time="2024-07-22T07:49:04Z" level=info msg="Update successful" application=xxxxxxx
time="2024-07-22T07:49:04Z" level=info msg="Reconciliation completed" application=xxxxxxx dedup_ms=0 dest-name= dest-namespace=development dest-server="https://kubernetes.default.svc" diff_ms=13 fields.level=0 git_ms=54 health_ms=0 live_ms=0 patch_ms=33 setop_ms=0 settings_ms=0 sync_ms=0 time_ms=132
time="2024-07-22T07:49:12Z" level=info msg="Refreshing app status (controller refresh requested), level (0)" application=xxxxxxx
time="2024-07-22T07:49:12Z" level=warning msg="Failed to get cached managed resources for tree reconciliation, fall back to full reconciliation" application=xxxxxxx dest-name= dest-namespace=development dest-server="https://kubernetes.default.svc" fields.level=0
time="2024-07-22T07:49:12Z" level=info msg="Comparing app state (cluster: https://kubernetes.default.svc, namespace: development)" application=xxxxxxx
time="2024-07-22T07:49:12Z" level=info msg="GetRepoObjs stats" application=xxxxxxx build_options_ms=0 helm_ms=0 plugins_ms=0 repo_ms=0 time_ms=58 unmarshal_ms=58 version_ms=0
time="2024-07-22T07:49:12Z" level=error msg="DiffFromCache error: error getting managed resources for app xxx: NOAUTH Authentication required."
time="2024-07-22T07:49:12Z" level=error msg="Failed to cache app resources: error setting app resource tree: NOAUTH Authentication required." application=xxxxxxx dedup_ms=0 dest-name= dest-namespace=development dest-server="https://kubernetes.default.svc" diff_ms=15 fields.level=0 git_ms=59 health_ms=1 live_ms=0 settings_ms=0 sync_ms=0
time="2024-07-22T07:49:12Z" level=info msg="Skipping auto-sync: application status is Synced" application=xxxxxxx
time="2024-07-22T07:49:13Z" level=info msg="Update successful" application=xxxxxxx

garyd2 avatar Jul 22 '24 07:07 garyd2

Strange. 1.12.4 already has the Redis authentication change, so the upgrade shouldn't have caused this. Is it happening only on one cluster, or are you seeing similar behavior on others as well? Can you also check that the redis-initial-password secret is present and not empty in the Argo CD instance namespace, and that the REDIS_PASSWORD env in the repo-server, application-controller and argocd-server deployments references this secret correctly?
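For example (a sketch; the secret and namespace names below are assumptions for an instance named argocd running in openshift-gitops, so substitute your own):

```sh
# Confirm the initial Redis password secret exists and its data is non-empty.
# The names are assumptions; the secret is normally <instance>-redis-initial-password.
oc -n openshift-gitops get secret argocd-redis-initial-password -o yaml
```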

svghadi avatar Jul 22 '24 13:07 svghadi

I see an argocd-redis-initial-password secret; its two data values are admin.password and immutable and both are set, but I don't have a REDIS_PASSWORD data value in it.

garyd2 avatar Jul 22 '24 13:07 garyd2

Oh sorry, I should have framed it better 😅. Check for the REDIS_PASSWORD env var in the repo-server, application-controller and argocd-server deployments, not in the Redis secret. The secret contains only those 2 values.
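Something like this lists what each workload has (a sketch; the workload names are assumptions for an instance named argocd, and the application controller is typically a statefulset rather than a deployment):

```sh
# Print the resolved env vars on each workload and look for REDIS_PASSWORD.
# Workload names are assumptions for an Argo CD instance named "argocd".
for w in deployment/argocd-repo-server deployment/argocd-server \
         statefulset/argocd-application-controller; do
  echo "== $w"
  oc -n openshift-gitops set env "$w" --list | grep -i redis || echo "   no REDIS_* env vars found"
done
```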

svghadi avatar Jul 22 '24 13:07 svghadi

Thanks

  • repo-server deployment looks good: it has a REDIS_PASSWORD and it points to the secret
  • application-controller statefulset has NO REDIS_PASSWORD environment variable
  • argocd-server deployment has NO REDIS_PASSWORD environment variable either

Would it be OK to just edit the statefulset and deployment to add this where it's missing?

garyd2 avatar Jul 22 '24 13:07 garyd2

Yes, we can try that. But the operator should have handled this automatically. Could be a bug...

svghadi avatar Jul 22 '24 14:07 svghadi

Sorry for the delay in getting back. I updated the argocd-server deployment and the application-controller statefulset with the REDIS_PASSWORD environment variable and everything is working again; I can see the resources of apps. Thanks a lot for your help.
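For anyone hitting the same thing, a sketch of the manual workaround (the secret name and admin.password key mirror what the repo-server already references here; the other names are assumptions for an instance named argocd):

```sh
# Add REDIS_PASSWORD, referencing the operator-managed initial-password secret.
# Repeat with "deployment argocd-server" for the server; names are assumptions.
oc -n openshift-gitops patch statefulset argocd-application-controller --type=json -p='[
  {"op": "add",
   "path": "/spec/template/spec/containers/0/env/-",
   "value": {"name": "REDIS_PASSWORD",
             "valueFrom": {"secretKeyRef": {"name": "argocd-redis-initial-password",
                                            "key": "admin.password"}}}}
]'
```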

Feel free to close this, or if you want to keep it open to investigate the bug, work away.

garyd2 avatar Jul 23 '24 12:07 garyd2

Great. I will keep this issue open until the bug is triaged. Thanks for reporting it.

svghadi avatar Jul 23 '24 12:07 svghadi

@garyd2 - Just to rule out the possibility of broken operator reconciliation, can you confirm if there are any unusual error messages in the operator manager pod logs?

svghadi avatar Jul 23 '24 14:07 svghadi

I have reviewed the openshift-gitops-operator-controller-manager logs and no errors are thrown.

garyd2 avatar Jul 24 '24 07:07 garyd2

This also seems to happen between 1.13 and 1.14. I did an upgrade path all the way from 1.8 to 1.14, and with 1.14 it failed.

Also, for others who come here to read about the same issue: whenever you start a fresh redis-ha StatefulSet, it takes some time to initialize a new Redis cluster. I have been bitten by this a couple of times now.
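One way to wait it out (a sketch; the statefulset name assumes the default redis-ha naming for an instance called argocd):

```sh
# Block until every redis-ha replica reports ready before expecting cached data.
oc -n openshift-gitops rollout status statefulset/argocd-redis-ha-server --timeout=5m
```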

MindTooth avatar May 06 '25 14:05 MindTooth