[OCPBUGS-25341]: perform operator apiService certificate validity checks directly

Open ankitathomas opened this issue 1 year ago • 1 comments

Cert updates can occasionally fail silently, updating only the timestamps on the CSV without any changes to the underlying cert secret. This PR uses the cert expiry times directly to retry the refresh.

ankitathomas, May 03 '24 19:05

* re-reconcile if the cert secret changes

We already do this for all OLM-managed secrets.

* re-reconcile at a time in the future based on the current secrets' expiration times.

With each reconcile, we check all certs for anything that expires in a day or less and rotate all of those, including the ones that are already expired. The problem was that we were relying on the cert freshness timestamps on the CSV to make those checks, and those timestamps were sometimes being updated even when the cert rotation hadn't actually succeeded.

ankitathomas, Jun 13 '24 15:06

> With each reconcile, we check all certs for anything that expires in a day or less and rotate all of those, including the ones that are already expired.

But I'm wondering if there's a scenario where:

  1. I install an operator that needs a cert
  2. OLM creates the secrets, sets expiration to now + N
  3. Everything reconciles, steady state is achieved, no further changes are made to the CSV.
  4. N time passes, and the certs expire. Still no changes, so still no cert renewal?

Is there something that forces a re-reconcile inside the time window where:

  • The certs are not yet expired, but
  • They are close enough to expiration that we'll rotate them.

joelanford, Jun 13 '24 17:06