AtlasDatabaseUser - message - unable to list: test because of unknown namespace for the cache
I had operator v1.7.1 and decided to upgrade to the latest version in the cluster. I created a local environment:
- Kubernetes v1.25.4 via Docker Desktop
- operator v1.7.1
- Add an AtlasDeployment and an AtlasDatabaseUser
- Upgrade to v2.2.0 (helm upgrade of the CRDs, then of the operator)
- Fix the AtlasDeployment
- Check the operator logs and get this error:
{"level":"INFO","time":"2024-04-16T12:12:14.543Z","msg":"Status update","atlasdatabaseuser":"test/operator-upgrade-test","lastCondition":{"type":"DatabaseUserReady","status":"False","lastTransitionTime":null,"reason":"DatabaseUserStaleConnectionSecrets","message":"unable to list: test because of unknown namespace for the cache"}}
What did you expect? After all the steps the operator should just work as expected.
What happened instead? The AtlasDatabaseUser status is always in the False state.
Operator Information
- 1.7.1 -> 2.2.0
Kubernetes Cluster Information
- Docker Desktop
- 1.25.4
Additional context: I am trying to figure out why the AtlasDatabaseUser CR fails. It creates the proper secrets and creates the user in the Atlas UI, but the CR itself is always in the Ready: False state:
status:
  conditions:
  - lastTransitionTime: "2024-04-16T12:03:17Z"
    status: "False"
    type: Ready
  - lastTransitionTime: "2024-04-16T11:44:08Z"
    status: "True"
    type: ResourceVersionIsValid
  - lastTransitionTime: "2024-04-16T11:44:08Z"
    status: "True"
    type: ValidationSucceeded
  - lastTransitionTime: "2024-04-16T12:03:18Z"
    message: 'unable to list: test because of unknown namespace for the cache'
    reason: DatabaseUserStaleConnectionSecrets
    status: "False"
    type: DatabaseUserReady
Operator logs:
{"level":"DEBUG","time":"2024-04-16T12:17:12.709Z","msg":"Ensured connection Secret up-to-date","atlasdatabaseuser":"test/operator-upgrade-test","secretname":"HIDDEN"} {"level":"INFO","time":"2024-04-16T12:17:12.709Z","msg":"Status update","atlasdatabaseuser":"test/operator-upgrade-test-","lastCondition":{"type":"DatabaseUserReady","status":"False","lastTransitionTime":null,"reason":"DatabaseUserStaleConnectionSecrets","message":"unable to list: test because of unknown namespace for the cache"}}
Thanks for reporting this issue @qtranton !
Could you give us a minimal YAML sample we could use to reproduce the issue? It does not need to be your complete original setup, just the definitions that reproduce the same failure.
Sure, I cleaned up my YAML a bit; here it is:
apiVersion: v1
kind: Secret
metadata:
  labels:
    app: operator-upgrade
    atlas.mongodb.com/type: credentials
    env: dev
  name: operator-upgrade-test
  namespace: test
stringData:
  password: testpassword
---
# Source: app-resources/templates/mongodb_atlas.yaml
apiVersion: atlas.mongodb.com/v1
kind: AtlasBackupPolicy
metadata:
  name: operator-upgrade-test
  namespace: test
  annotations:
    mongodb.com/atlas-resource-policy: "keep"
spec:
  items:
    - frequencyInterval: 12
      frequencyType: hourly
      retentionUnit: days
      retentionValue: 1
    - frequencyInterval: 1
      frequencyType: daily
      retentionUnit: days
      retentionValue: 7
    - frequencyInterval: 6
      frequencyType: weekly
      retentionUnit: weeks
      retentionValue: 1
    - frequencyInterval: 40
      frequencyType: monthly
      retentionUnit: months
      retentionValue: 1
---
# Source: app-resources/templates/mongodb_atlas.yaml
apiVersion: atlas.mongodb.com/v1
kind: AtlasBackupSchedule
metadata:
  name: operator-upgrade-test
  namespace: test
  annotations:
    mongodb.com/atlas-resource-policy: "keep"
spec:
  autoExportEnabled: false
  referenceHourOfDay: 21
  referenceMinuteOfHour: 2
  policy:
    name: operator-upgrade-test
    namespace: test
---
# Source: app-resources/templates/mongodb_atlas.yaml
apiVersion: atlas.mongodb.com/v1
kind: AtlasDatabaseUser
metadata:
  name: operator-upgrade-test
  labels:
    app: "operator-upgrade"
    env: dev
  # mongodb.com/atlas-resource-policy: "keep"
spec:
  roles:
    - roleName: readWrite
      databaseName: Application
  scopes:
    - type: CLUSTER
      name: operator-upgrade-test
  projectRef:
    name: project-name
    namespace: mongodb-operator
  username: operator-upgrade-test
  databaseName: admin
  passwordSecretRef:
    name: "operator-upgrade-test"
---
# Source: app-resources/templates/mongodb_atlas.yaml
apiVersion: atlas.mongodb.com/v1
kind: AtlasDeployment
metadata:
  name: operator-upgrade-test
  namespace: test
  labels:
    app: "operator-upgrade"
    env: dev
  # annotations:
  #   mongodb.com/atlas-resource-policy: "keep"
spec:
  backupRef:
    name: operator-upgrade-test
    namespace: test
  projectRef:
    name: project-name
    namespace: mongodb-operator
  advancedDeploymentSpec:
    mongoDBMajorVersion: "6.0"
    clusterType: REPLICASET
    backupEnabled: true
    pitEnabled: false
    name: operator-upgrade-test
    replicationSpecs:
      - regionConfigs:
          - electableSpecs:
              instanceSize: M10
              nodeCount: 3
            providerName: GCP
            backingProviderName: GCP
            regionName: "EASTERN_US"
            # Priority description: https://www.mongodb.com/docs/atlas/reference/atlas-operator/atlasdeployment-custom-resource/#mongodb-setting-spec.advancedDeploymentSpec.replicationSpecs.regionConfigs.priority
            priority: 7
            autoScaling:
              compute:
                enabled: false
cc @roothorp
@qtranton can you check whether you happen to have the WATCH_NAMESPACE environment variable set for your operator deployment? I.e., could you paste the output of kubectl -n <operator_namespace> describe pod <operator_name> here?
That is, it looks like the operator is not watching the test namespace, the watched set being overridden by the WATCH_NAMESPACE env variable.
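For illustration, here is a minimal controller-runtime sketch of that failure mode (assuming the v0.15+ cache options API; the namespaces are illustrative, not the operator's actual wiring). When the manager's cache is restricted to specific namespaces, a List scoped to any other namespace is rejected by the cache itself, without ever reaching the API server:

package main

import (
    "context"
    "fmt"

    corev1 "k8s.io/api/core/v1"
    "k8s.io/apimachinery/pkg/runtime"
    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/cache"
    "sigs.k8s.io/controller-runtime/pkg/client"
)

func main() {
    scheme := runtime.NewScheme()
    _ = corev1.AddToScheme(scheme)

    mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
        Scheme: scheme,
        Cache: cache.Options{
            // Only this namespace is cached; "test" is not in the map.
            DefaultNamespaces: map[string]cache.Config{"mongodb-operator": {}},
        },
    })
    if err != nil {
        panic(err)
    }
    go func() { _ = mgr.Start(ctrl.SetupSignalHandler()) }()

    var secrets corev1.SecretList
    // The cache answers this List itself, without calling the API server:
    // "unable to list: test because of unknown namespace for the cache"
    err = mgr.GetClient().List(context.Background(), &secrets, client.InNamespace("test"))
    fmt.Println(err)
}

The error string produced by the multi-namespace cache is exactly the "unknown namespace for the cache" message in the logs above.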
In the Helm chart I see this:
{{- if .Values.watchNamespaces }}
- name: WATCH_NAMESPACE
value: "{{ join "," .Values.watchNamespaces }}"
{{- end }}
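For context, here is a minimal sketch of the usual convention for feeding such a comma-separated WATCH_NAMESPACE value into the cache (assuming controller-runtime v0.15+; this mirrors the common operator-sdk pattern and is not necessarily the Atlas operator's exact code, and cacheOptionsFromEnv is my name for it):

package main

import (
    "fmt"
    "os"
    "strings"

    "sigs.k8s.io/controller-runtime/pkg/cache"
)

// cacheOptionsFromEnv turns a comma-separated WATCH_NAMESPACE value into
// cache restrictions; an unset or empty variable means a cluster-wide cache.
func cacheOptionsFromEnv() cache.Options {
    raw, found := os.LookupEnv("WATCH_NAMESPACE")
    if !found || strings.TrimSpace(raw) == "" {
        return cache.Options{} // no restriction: every namespace is cached
    }
    namespaces := map[string]cache.Config{}
    for _, ns := range strings.Split(raw, ",") {
        namespaces[strings.TrimSpace(ns)] = cache.Config{}
    }
    return cache.Options{DefaultNamespaces: namespaces}
}

func main() {
    opts := cacheOptionsFromEnv()
    fmt.Printf("restricted namespaces: %d (0 means cluster-wide)\n", len(opts.DefaultNamespaces))
}

So if the variable is absent, the cache should be cluster-wide.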
So I checked the pod:
Readiness: http-get http://:8081/readyz delay=5s timeout=1s period=10s #success=1 #failure=3
Environment:
OPERATOR_POD_NAME: mongodb-atlas-operator-5df9ff6978-tqznx (v1:metadata.name)
OPERATOR_NAMESPACE: mongodb-operator (v1:metadata.namespace)
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-4kgq7 (ro)
In the roles I also see some mention of this variable, but since it is empty, no additional roles were created:
mongodb-operator mongodb-atlas-operator
mongodb-operator mongodb-atlas-operator-leader-election-role
Plus, it works on the older version, so the older version could read the secrets, I guess.
I validated the secrets as well. When I remove the label
atlas.mongodb.com/type: credentials
I get an error like:
"msg":"Status update","atlasdatabaseuser":"tester/operator-upgrade-test","lastCondition":{"type":"DatabaseUserReady","status":"False","lastTransitionTime":null,"reason":"InternalError","message":"Secret \"operator-upgrade-test\" not found"}}
With the label back in place, I get the error:
"msg":"Status update","atlasdatabaseuser":"tester/operator-upgrade-test","lastCondition":{"type":"DatabaseUserReady","status":"False","lastTransitionTime":null,"reason":"DatabaseUserStaleConnectionSecrets","message":"unable to list: tester because of unknown namespace for the cache"}}
@josvazg @s-urbaniak hey, I had some time to debug the issue. On my local cluster, for some reason, on version 2.2.2 I do not see the status.name parameter. I just put a lot of println calls in a local branch :D
#############################
operator-upgrade-test
cleanupStaleSecrets: Failed to list connection Secrets
#############################
after I changed this code:
if user.Status.UserName != user.Spec.Username {
    // Note that we pass the username from the status, not from the spec
    fmt.Println("#############################")
    fmt.Println(user.Status.UserName, user.Spec.Username)
    fmt.Println("cleanupStaleSecrets: Failed to list connection Secrets")
    fmt.Println("#############################")
    return RemoveStaleSecretsByUserName(ctx.Context, k8sClient, projectID, user.Status.UserName, user, ctx.Log)
}
here: https://github.com/mongodb/mongodb-atlas-kubernetes/blob/main/pkg/controller/connectionsecret/connectionsecrets.go#L126. Now I am trying to figure out why I get a secret-related error if the username is not set. Meanwhile, the CR looks like this:
status:
  conditions:
  - lastTransitionTime: "2024-05-27T11:30:38Z"
    status: "False"
    type: Ready
  - lastTransitionTime: "2024-05-27T11:30:38Z"
    status: "True"
    type: ResourceVersionIsValid
  - lastTransitionTime: "2024-05-27T11:30:38Z"
    status: "True"
    type: ValidationSucceeded
  - lastTransitionTime: "2024-05-27T11:30:39Z"
    message: 'unable to list: tester because of unknown namespace for the cache'
    reason: DatabaseUserStaleConnectionSecrets
    status: "False"
    type: DatabaseUserReady
  observedGeneration: 1
  passwordVersion: "3017702"
Update: I rechecked on v1.7 and the name does appear in the status there.
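For illustration only, and not the actual fix: since the status username comes back empty after the upgrade, a hypothetical guard of this shape at the linked spot in connectionsecrets.go would stop the stale-secret cleanup from being driven by an unset status field (same variables as the snippet above):

if user.Status.UserName == "" {
    // Hypothetical guard: the status is not populated yet (e.g. right after
    // an operator upgrade), so there is no previous username to clean up after.
    return nil
}
if user.Status.UserName != user.Spec.Username {
    // Note that we pass the username from the status, not from the spec
    return RemoveStaleSecretsByUserName(ctx.Context, k8sClient, projectID, user.Status.UserName, user, ctx.Log)
}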
Thanks for your reports. I managed to reproduce the same issue. I am debugging it now.
It seems we found the issue; we are working on a fix.
In the meantime, you could pass the list of namespaces you want watched, e.g.:
helm install ... --set watchNamespaces=test,...
@josvazg on a local machine, yes, but on the main cluster we have too many namespaces :) I will wait, it is not that critical.
BTW, PR #1619 already fixes the issue, but it includes unrelated refactors. I am working on a specific test to cover this bug, which was not previously detected by our test suite.
I will check the build locally then :)
@josvazg jfyi, I now get this error:
{"level":"ERROR","time":"2024-05-30T14:00:05.322Z","msg":"LeaderElectionID must be configuredunable to start operator"}
I do not think this is related. BTW this PR #1621 should fix the original issue.
As for this new error, do you have a sample to reproduce it?
@josvazg I just built the image and put the container into Helm chart 2.2.2; nothing else changed in the deployment.
@josvazg After a few additional CRDs (not in upstream yet :D), the user status becomes true. We will do some additional tests against our infra. Maybe you know when it will be released?
We are aiming for a release soon, maybe this week. I should be merging PR #1621 tomorrow.