Defined secrets should be treated as existing secrets
Report
I am working on deploying a MongoDB cluster using Helm, with the intention of managing it via ArgoCD. For handling user secrets, I use the External Secrets Operator (ESO). Since ESO fetches secrets from an external provider and then creates the corresponding Kubernetes Secret, there's an inherent delay in secret availability.
More about the problem
I'm encountering a race condition: the PSMDB operator checks whether the users secret exists, and if ESO has not created it yet, the operator creates a new one itself. I attempted to mitigate this with Helm hooks, but in most cases PSMDB still acts faster than ESO.
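The race can be reduced to a few lines. This is a hypothetical in-memory simulation, not operator or ESO code. The map stands in for the Kubernetes API, and the secret name and values are made up. It also assumes that ESO overwrites an existing secret with the provider's value.

```go
package main

import "fmt"

// Hypothetical secret name, for illustration only.
const usersSecret = "my-cluster-name-secrets"

// operatorReconcile mimics a check-then-create step: if the secret is
// missing, the operator generates its own value.
func operatorReconcile(secrets map[string]string) {
	if _, ok := secrets[usersSecret]; !ok {
		secrets[usersSecret] = "operator-generated"
	}
}

// esoSync mimics ESO writing the value fetched from the external provider.
// Assumption: it overwrites whatever is already stored under that name.
func esoSync(secrets map[string]string) {
	secrets[usersSecret] = "from-external-provider"
}

// simulate runs the actors in the given order and counts how many times an
// already-existing secret value was changed afterwards.
func simulate(order []string) (final string, overwrites int) {
	secrets := map[string]string{}
	for _, actor := range order {
		before, existed := secrets[usersSecret]
		switch actor {
		case "operator":
			operatorReconcile(secrets)
		case "eso":
			esoSync(secrets)
		}
		if existed && secrets[usersSecret] != before {
			overwrites++
		}
	}
	return secrets[usersSecret], overwrites
}

func main() {
	for _, order := range [][]string{{"eso", "operator"}, {"operator", "eso"}} {
		final, n := simulate(order)
		fmt.Printf("%v -> final=%q overwrites=%d\n", order, final, n)
	}
}
```

Both orders end with the external value. When the operator wins, however, the cluster is first set up with a generated value that later changes. That change then has to be propagated to the MongoDB nodes and the backup agent.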
Initially, I considered modifying the operator code that handles the users secret. However, CheckNSetDefaults is invoked before reconcileUsersSecret and populates empty fields with default values. By the time the users secret is reconciled, my check for an explicitly defined name was therefore ineffective.
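Defaulting hides whether a field was set by the user. The sketch below uses a hypothetical, simplified SecretsSpec and checkNSetDefaults, not the operator's real types. The default naming scheme is also an assumption. It shows that the "was it user-defined?" answer has to be captured before defaulting runs.

```go
package main

import "fmt"

// SecretsSpec is a simplified, hypothetical stand-in for spec.secrets.
type SecretsSpec struct {
	Users string
}

// checkNSetDefaults mimics defaulting: an empty name is replaced with a
// derived default. The "<cluster>-secrets" scheme is an assumption.
func checkNSetDefaults(clusterName string, s *SecretsSpec) {
	if s.Users == "" {
		s.Users = clusterName + "-secrets"
	}
}

// userDefined reports whether a name is present. It is only meaningful
// before defaulting; afterwards every field looks user-defined.
func userDefined(s SecretsSpec) bool {
	return s.Users != ""
}

func main() {
	spec := SecretsSpec{} // the user left spec.secrets.users empty
	before := userDefined(spec)
	checkNSetDefaults("my-cluster-name", &spec)
	after := userDefined(spec)
	fmt.Println(before, after, spec.Users)
}
```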
Notes related to the users secret:
Even though the operator eventually restarts the backup-agent container upon detecting a change in the secret, we're still facing timing issues related to Kubernetes secret synchronization. In some cases, despite the container restart, the updated secret value is not properly propagated, leading to continuous PBM authentication errors.
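One common way for a controller to notice that a Secret changed is to compare a hash of its data. The sketch below illustrates that pattern and is not necessarily what the operator does. The key name is only an example.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// secretHash returns a deterministic SHA-256 digest of a secret's data.
// Keys are sorted first because Go map iteration order is randomized.
func secretHash(data map[string][]byte) string {
	keys := make([]string, 0, len(data))
	for k := range data {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	h := sha256.New()
	for _, k := range keys {
		h.Write([]byte(k))
		h.Write([]byte{0}) // separator so key "ab"+value "c" differs from "a"+"bc"
		h.Write(data[k])
		h.Write([]byte{0})
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	// Example key name and values, for illustration only.
	old := map[string][]byte{"MONGODB_BACKUP_PASSWORD": []byte("operator-generated")}
	cur := map[string][]byte{"MONGODB_BACKUP_PASSWORD": []byte("from-external-provider")}

	if secretHash(old) != secretHash(cur) {
		// A controller might, for example, update a pod-template annotation
		// here so the pods are restarted with the new value.
		fmt.Println("secret changed: restart needed")
	}
}
```

Detecting the change is only half of the problem. The restarted container still has to read the new value, which is where the synchronization issues above show up.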
Notes related to SSL certificates: I realized that a similar race condition occurs when the ssl and sslInternal fields are specified. If cert-manager is slower than PSMDB, the operator may create its own issuer even though the referenced certificates are already being issued by cert-manager.
Notes related to the internal key: In most cases, the operator still creates the MongoDB internal key itself, even when I've explicitly referenced an existing one (again a race condition).
Summary:
I ended up changing the behavior of the spec.secrets fields. If any secret is explicitly defined, it is now treated as an externally managed, pre-existing secret, and the operator no longer attempts to create a Kubernetes Secret with that name. With this change, cluster creation is faster, because there is no need to update the internal secret or propagate changes to the MongoDB nodes. With the original logic, that propagation sometimes failed and caused authentication errors.
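The proposed rule can be expressed as a small decision function. This is a hypothetical sketch, not the code from the pull request. It assumes that a referenced but missing secret is handled by retrying the reconcile later (Requeue) instead of creating it. The userDefined flag must be captured before any defaulting such as CheckNSetDefaults.

```go
package main

import "fmt"

// Action is what a reconcile step does for one spec.secrets entry.
type Action int

const (
	UseExisting   Action = iota // consume the secret as-is
	Requeue                     // assumption: referenced but not created yet, retry later and never create
	CreateDefault               // nothing referenced: the operator generates and owns it
)

func (a Action) String() string {
	return [...]string{"UseExisting", "Requeue", "CreateDefault"}[a]
}

// decide applies the proposed rule: an explicitly defined name means the
// secret is externally managed, so the operator never creates it.
func decide(userDefined, exists bool) Action {
	switch {
	case exists:
		return UseExisting
	case userDefined:
		return Requeue
	default:
		return CreateDefault
	}
}

func main() {
	// Hypothetical state shortly after the Helm release is applied.
	fields := []struct {
		name                string
		userDefined, exists bool
	}{
		{"users", true, false},        // ESO has not synced yet
		{"ssl", true, true},           // cert-manager has already issued it
		{"sslInternal", false, false}, // left to the operator
	}
	for _, f := range fields {
		fmt.Printf("%s: %v\n", f.name, decide(f.userDefined, f.exists))
	}
}
```

Under this rule the operator never races ESO or cert-manager for a name the user has claimed. It only waits for that secret to appear.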
Steps to reproduce
- Create all the secrets with the External Secrets Operator
- Create the cert-manager.io/Certificate resources
- Fill spec.secrets with the names of the created objects
- Read the operator logs, and later the MongoDB node logs
Versions
- Kubernetes: 1.30
- Operator: 1.20.1
- Database: v7
Anything else?
Related: https://forums.percona.com/t/mongodb-operator-creates-overwrites-external-secret-by-it-self/14794/5
I understand this represents a breaking change, which is why I'm reaching out to the community for guidance. What would be the recommended and reliable approach to handle this scenario?
Hello @alex1989hu
Did you try to configure this: https://github.com/percona/percona-server-mongodb-operator/blob/main/deploy/cr.yaml#L52-L60 with the secrets that are created by ESO?
Hey @gkech, yes, I did. Please see the issue description, where I described what happens and how I tried to mitigate it: https://github.com/percona/percona-server-mongodb-operator/issues/1960#issue-3122660847
This is a significant breaking change that I am not comfortable with. I think the request is valid, though, so we should consider it a feature request.
@gkech, could you please experiment with the External Secrets Operator and see how it behaves? Then we can create a Jira ticket.
Dear all,
As you can see, the github-actions bot automatically closed the pull request. I’m wondering what the possible next steps are.