gitops-operator
Unable to load data: error getting cached app managed resources: NOAUTH Authentication required.
The GitOps operator in one env was updated to 1.13.0 yesterday; since then I cannot get to the resources of any apps without hitting:
Unable to load data: error getting cached app managed resources: NOAUTH Authentication required.
I have restarted all pods in the openshift-gitops namespace.
Anyone seen this before? Is it something with Redis?
From which version did the upgrade happen? Are you not able to view the resources at all, or can you view them but the NOAUTH error message keeps popping up? Also, can you share the application controller logs?
It was previously on 1.12.4 I believe.
I am able to see the application tiles fine and they show synced and healthy, but when I click into one and try to look at the pods it throws the error and I can't see any further pod details.
Logs of the application controller look like this (some details masked):
time="2024-07-22T07:49:00Z" level=info msg="Loading TLS configuration from secret xxxxx/argocd-server-tls"
time="2024-07-22T07:49:00Z" level=warning msg="Failed to save cluster info: NOAUTH Authentication required."
time="2024-07-22T07:49:04Z" level=info msg="Refreshing app status (controller refresh requested), level (0)" application=xxx
time="2024-07-22T07:49:04Z" level=warning msg="Failed to get cached managed resources for tree reconciliation, fall back to full reconciliation" application=xxxxxxx dest-name= dest-namespace=development dest-server="https://kubernetes.default.svc" fields.level=0
time="2024-07-22T07:49:04Z" level=info msg="Comparing app state (cluster: https://kubernetes.default.svc, namespace: development)" application=xxxxxxx
time="2024-07-22T07:49:04Z" level=info msg="GetRepoObjs stats" application=xxxxxxx build_options_ms=0 helm_ms=0 plugins_ms=0 repo_ms=0 time_ms=54 unmarshal_ms=54 version_ms=0
time="2024-07-22T07:49:04Z" level=error msg="DiffFromCache error: error getting managed resources for app xxxx: NOAUTH Authentication required."
time="2024-07-22T07:49:04Z" level=error msg="Failed to cache app resources: error setting app resource tree: NOAUTH Authentication required." application=xxxxxxx dedup_ms=0 dest-name= dest-namespace=development dest-server="https://kubernetes.default.svc" diff_ms=13 fields.level=0 git_ms=54 health_ms=0 live_ms=0 settings_ms=0 sync_ms=0
time="2024-07-22T07:49:04Z" level=info msg="Skipping auto-sync: application status is Synced" application=xxxxxxx
time="2024-07-22T07:49:04Z" level=info msg="Update successful" application=xxxxxxx
time="2024-07-22T07:49:04Z" level=info msg="Reconciliation completed" application=xxxxxxx dedup_ms=0 dest-name= dest-namespace=development dest-server="https://kubernetes.default.svc" diff_ms=13 fields.level=0 git_ms=54 health_ms=0 live_ms=0 patch_ms=33 setop_ms=0 settings_ms=0 sync_ms=0 time_ms=132
time="2024-07-22T07:49:12Z" level=info msg="Refreshing app status (controller refresh requested), level (0)" application=xxxxxxx
time="2024-07-22T07:49:12Z" level=warning msg="Failed to get cached managed resources for tree reconciliation, fall back to full reconciliation" application=xxxxxxx dest-name= dest-namespace=development dest-server="https://kubernetes.default.svc" fields.level=0
time="2024-07-22T07:49:12Z" level=info msg="Comparing app state (cluster: https://kubernetes.default.svc, namespace: development)" application=xxxxxxx
time="2024-07-22T07:49:12Z" level=info msg="GetRepoObjs stats" application=xxxxxxx build_options_ms=0 helm_ms=0 plugins_ms=0 repo_ms=0 time_ms=58 unmarshal_ms=58 version_ms=0
time="2024-07-22T07:49:12Z" level=error msg="DiffFromCache error: error getting managed resources for app xxx: NOAUTH Authentication required."
time="2024-07-22T07:49:12Z" level=error msg="Failed to cache app resources: error setting app resource tree: NOAUTH Authentication required." application=xxxxxxx dedup_ms=0 dest-name= dest-namespace=development dest-server="https://kubernetes.default.svc" diff_ms=15 fields.level=0 git_ms=59 health_ms=1 live_ms=0 settings_ms=0 sync_ms=0
time="2024-07-22T07:49:12Z" level=info msg="Skipping auto-sync: application status is Synced" application=xxxxxxx
time="2024-07-22T07:49:13Z" level=info msg="Update successful" application=xxxxxxx
Strange. 1.12.4 already has the Redis authentication change, so the upgrade shouldn't cause issues. Is it happening only on one cluster, or are you seeing similar behavior on others as well? Can you also check that the redis-initial-password secret is present and not empty in the ArgoCD instance namespace, and that the REDIS_PASSWORD env var references this secret correctly in the repo-server, application-controller and argocd-server deployments?
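Something along these lines should show the secret (an untested sketch; I'm assuming the instance is named argocd, so adjust the namespace and secret name to your installation):

# show the keys stored in the redis initial password secret (values are base64-encoded)
oc -n <argocd-namespace> get secret argocd-redis-initial-password -o jsonpath='{.data}'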
I see an argocd-redis-initial-password secret; its 2 data values are admin.password and immutable and both are set. I don't have a REDIS_PASSWORD data value in it.
Oh sorry, I should have framed it better 😅. Check the REDIS_PASSWORD env var in the repo-server, application-controller and argocd-server deployments, not in the Redis secret. The secret contains only those 2 values.
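For example, roughly like this (again assuming an instance named argocd; substitute your own namespace and workload names):

# list env vars on each workload and check for REDIS_PASSWORD
oc -n <argocd-namespace> set env deployment/argocd-repo-server --list | grep REDIS_PASSWORD
oc -n <argocd-namespace> set env statefulset/argocd-application-controller --list | grep REDIS_PASSWORD
oc -n <argocd-namespace> set env deployment/argocd-server --list | grep REDIS_PASSWORD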
Thanks
The repo-server deployment looks good: it has a REDIS_PASSWORD and it points to the secret. The application-controller statefulset has NO REDIS_PASSWORD environment variable. The argocd-server deployment has NO REDIS_PASSWORD environment variable either.
Would it be OK to just edit the statefulset and deployment to add this in where it is missing?
Yes, we can try that. But the operator should have handled this automatically. Could be a bug...
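If you'd rather not edit the YAML by hand, a rough patch along these lines should add it (untested sketch; it assumes the first container already has an env list and that your secret and key names match what you described, so adjust accordingly and repeat for the argocd-server deployment):

oc -n <argocd-namespace> patch statefulset argocd-application-controller --type=json -p '[{"op": "add", "path": "/spec/template/spec/containers/0/env/-", "value": {"name": "REDIS_PASSWORD", "valueFrom": {"secretKeyRef": {"name": "argocd-redis-initial-password", "key": "admin.password"}}}}]'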
Sorry for the delay in getting back. I updated the deployment and statefulset with the REDIS_PASSWORD environment variable and everything is working again; I can get to the resources of apps again. Thanks a lot for your help.
Feel free to close this, or if you want to keep it open to investigate a bug, work away.
Great. I will keep this issue open until the bug is triaged. Thanks for reporting it.
@garyd2 - Just to rule out the possibility of broken operator reconciliation, can you confirm if there are any unusual error messages in the operator manager pod logs?
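Something like this should pull them (assuming the operator runs in the openshift-gitops-operator namespace; adjust if yours differs):

# fetch recent operator manager logs and scan for errors
oc -n openshift-gitops-operator logs deployment/openshift-gitops-operator-controller-manager --since=1h | grep -i error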
I have reviewed the openshift-gitops-operator-controller-manager logs and no errors are thrown.
This seems to be happening between 1.13 and 1.14 as well. I did an upgrade path all the way from 1.8 to 1.14, and with 1.14 it failed.
Also, for others who come here to read about the same issue: whenever you start a fresh redis-ha StatefulSet, it will take some time to initialize a new Redis cluster. I've been bitten by this a couple of times now.
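Before assuming something is broken, I now just wait for the redis-ha rollout to finish, roughly like this (names assumed for an instance called argocd; substitute your own):

# wait for the redis-ha statefulset to finish rolling out before expecting the NOAUTH errors to clear
oc -n <argocd-namespace> rollout status statefulset/argocd-redis-ha-server --timeout=5m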