K8SPSMDB-1014: update cert-manager certs and issuers

Open pooknull opened this issue 2 years ago • 1 comments

https://jira.percona.com/browse/K8SPSMDB-1014

DESCRIPTION

Problem: When .spec.updateStrategy is set to SmartUpdate and the cluster is updated from crVersion 1.14.0 to 1.15.0, the operator gets stuck on the smart update after the next certificate renewal.

Cause: In version 1.15.0 we switched to the new certificate schema. For more info, see the description of this PR: https://github.com/percona/percona-server-mongodb-operator/pull/1287. That PR did not implement updating existing certificates to the new schema.

As a result, existing certificates are not updated, and we still have the same problem we had in https://jira.percona.com/browse/K8SPSMDB-956.

Solution: First of all, the operator should update the certificates. To do that, we should check if the cert-manager is installed. If it is, we should try to apply our changes.

After these changes, the operator will still run into issues with SmartUpdate if the CA has changed, so it is recommended to create a migration mechanism that follows this guide: https://docs.percona.com/percona-operator-for-mongodb/TLS.html#update-certificates-without-downtime

So, the migration will consist of the following actions:

  1. Check if the cert-manager exists.
  2. If true, check if any changes will be applied to the certificates.
  3. If true, then we should create copies of cluster1-ssl and cluster1-ssl-internal secrets named cluster1-ssl-old and cluster1-ssl-internal-old.
  4. Apply the changes to the certificates and wait for new secrets.
  5. Take ca.crt from both old secrets and merge it into the new secrets. Set tls.key and tls.crt in the new secrets to the values from the old secrets.
  6. Wait until the next reconcile.
  7. On the next reconcile, we will check if any changes will be applied to the certificates.
  8. If certificates remain untouched, the operator will check if ca.crt was merged from old secrets.
  9. If true, it will delete old secrets.
  10. Wait until all statefulsets are ready.
  11. Compare the ca.crt of the current secrets with the ca.crt from cluster1-ca-cert.
  12. If it's different, recreate the secrets by deleting them. Cert-manager will recreate them.
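
Steps 5 and 8 above can be sketched as follows. This is a minimal illustration, not the operator's implementation: secrets are modelled as plain dicts of PEM strings, the helper names (split_bundle, merge_ca, migrate_secret, ca_is_merged) are hypothetical, and putting the new CA first in the merged bundle is an assumption that the description does not specify.

```python
import re

# Matches one PEM certificate block, including its BEGIN/END lines.
PEM_CERT = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----", re.DOTALL
)


def split_bundle(bundle):
    """Return the certificate blocks of a PEM bundle, in their original order."""
    return [m.group(0).strip() for m in PEM_CERT.finditer(bundle)]


def merge_ca(*bundles):
    """Concatenate CA bundles in the given order, dropping exact duplicates."""
    merged = []
    for bundle in bundles:
        for block in split_bundle(bundle):
            if block not in merged:
                merged.append(block)
    return "\n".join(merged) + "\n"


def migrate_secret(new_secret, old_secret, old_cas):
    """Step 5 (sketch): merged ca.crt plus tls.crt/tls.key from the old secret.

    old_cas holds the ca.crt values of both old secrets.
    ASSUMPTION: the new CA goes first in the merged bundle.
    """
    return {
        "ca.crt": merge_ca(new_secret["ca.crt"], *old_cas),
        "tls.crt": old_secret["tls.crt"],
        "tls.key": old_secret["tls.key"],
    }


def ca_is_merged(secret, old_cas):
    """Step 8 (sketch): True if every old CA certificate is in secret's ca.crt."""
    current = split_bundle(secret["ca.crt"])
    return all(
        block in current for old in old_cas for block in split_bundle(old)
    )
```

In this model, the operator would call migrate_secret once per new secret (cluster1-ssl and cluster1-ssl-internal), passing the ca.crt values of cluster1-ssl-old and cluster1-ssl-internal-old, and would use ca_is_merged on the next reconcile to decide whether the old secrets can be deleted.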

CHECKLIST

Jira

  • [x] Is the Jira ticket created and referenced properly?
  • [x] Does the Jira ticket have the proper statuses for documentation (Needs Doc) and QA (Needs QA)?
  • [x] Does the Jira ticket link to the proper milestone (Fix Version field)?

Tests

  • [x] Is an E2E test/test case added for the new feature/change?
  • [x] Are unit tests added where appropriate?
  • [x] Are OpenShift compare files changed for E2E tests (compare/*-oc.yml)?

Config/Logging/Testability

  • [x] Are all needed new/changed options added to default YAML files?
  • [x] Are the manifests (crd/bundle) regenerated if needed?
  • [x] Did we add proper logging messages for operator actions?
  • [x] Did we ensure compatibility with the previous version or cluster upgrade process?
  • [x] Does the change support oldest and newest supported MongoDB version?
  • [x] Does the change support oldest and newest supported Kubernetes version?

pooknull, Nov 27 '23 22:11

It seems that the approach described in https://docs.percona.com/percona-operator-for-mongodb/TLS.html#update-certificates-without-downtime doesn't work with mongos.

After the final recreation of secrets (step 12), the operator updates the cfg pods with new secrets. After all cfg pods have been updated, all mongos pods become unready with the following error in the logs:

{"t":{"$date":"2024-04-22T07:35:34.003+00:00"},"s":"W","c":"NETWORK","id":23235,"ctx":"conn2449","msg":"SSL peer certificate validation failed","attr":{"reason":"self-signed certificate"}}

This is why I removed lines in this discussion: https://github.com/percona/percona-server-mongodb-operator/pull/1383#discussion_r1567191104. We shouldn't remove them. But we also need to find a way to update mongos correctly.

My guess is that mongos only accepts the first part of the CA.
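
One way to test that guess is to list the certificates in a merged ca.crt in bundle order and see which CA comes first. The sketch below is only a diagnostic aid under that assumption. The helper name bundle_fingerprints is hypothetical, and it says nothing about how mongos actually loads the CA file.

```python
import base64
import hashlib
import re

# Captures the base64 body between the BEGIN/END lines of each certificate.
PEM_CERT = re.compile(
    r"-----BEGIN CERTIFICATE-----(.*?)-----END CERTIFICATE-----", re.DOTALL
)


def bundle_fingerprints(bundle):
    """Return the SHA-256 hex digest of each certificate's DER bytes, in bundle order."""
    digests = []
    for match in PEM_CERT.finditer(bundle):
        # Drop line breaks and spaces before decoding the base64 body.
        der = base64.b64decode("".join(match.group(1).split()))
        digests.append(hashlib.sha256(der).hexdigest())
    return digests
```

Comparing the first digest with the digest of the ca.crt in cluster1-ca-cert would show whether the new CA or the old one leads the bundle.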

pooknull, Apr 22 '24 07:04

The issue mentioned here: https://github.com/percona/percona-server-mongodb-operator/pull/1383#issuecomment-2068705855 has been fixed in https://github.com/percona/percona-server-mongodb-operator/pull/1383/commits/82643909bf55fb76717a939fed4a229fc6811aac

Description has been updated.

pooknull, Apr 22 '24 13:04

Test name Status
arbiter passed
balancer passed
custom-replset-name passed
cross-site-sharded passed
data-at-rest-encryption passed
data-sharded passed
demand-backup passed
demand-backup-eks-credentials passed
demand-backup-physical passed
demand-backup-physical-sharded passed
demand-backup-sharded passed
expose-sharded passed
ignore-labels-annotations passed
init-deploy passed
finalizer passed
ldap passed
ldap-tls passed
limits passed
liveness passed
mongod-major-upgrade passed
mongod-major-upgrade-sharded passed
monitoring-2-0 passed
multi-cluster-service passed
non-voting passed
one-pod passed
operator-self-healing-chaos passed
pitr passed
pitr-sharded passed
pitr-physical passed
pvc-resize passed
recover-no-primary passed
rs-shard-migration passed
scaling passed
scheduled-backup passed
security-context passed
self-healing-chaos passed
service-per-pod passed
serviceless-external-nodes passed
smart-update passed
split-horizon passed
storage passed
tls-issue-cert-manager passed
upgrade passed
upgrade-consistency passed
upgrade-consistency-sharded-tls passed
upgrade-sharded passed
users passed
version-service passed
We run 48 out of 48

commit: https://github.com/percona/percona-server-mongodb-operator/pull/1383/commits/f7f2d8d27a1e472eec79713e30a4aef45907395a
image: perconalab/percona-server-mongodb-operator:PR-1383-f7f2d8d2

JNKPercona, Apr 23 '24 08:04