enmasse

Operator fails to renew certificates properly

Open ctron opened this issue 5 years ago • 2 comments

After running shared infrastructure for a while, the operator seems to fail to renew the certificates:

{"level":"info","ts":1595571780.565553,"logger":"amqpcommand","msg":"Command Client amqps://10.130.0.39:5671 - restarting - backoff 5s(s), x509: certificate has expired or is not yet valid: current time 2020-07-24T06:23:00Z is after 2020-07-23T10:07:55Z"}
{"level":"info","ts":1595571785.5656567,"logger":"amqpcommand","msg":"Command Client - connecting amqps://10.130.0.39:5671"}
{"level":"info","ts":1595571785.5828426,"logger":"amqpcommand","msg":"Command Client amqps://10.130.0.39:5671 - restarting - backoff 5s(s), x509: certificate has expired or is not yet valid: current time 2020-07-24T06:23:05Z is after 2020-07-23T10:07:55Z"}
{"level":"info","ts":1595571790.5829506,"logger":"amqpcommand","msg":"Command Client - connecting amqps://10.130.0.39:5671"}
{"level":"info","ts":1595571790.6026292,"logger":"amqpcommand","msg":"Command Client amqps://10.130.0.39:5671 - restarting - backoff 5s(s), x509: certificate has expired or is not yet valid: current time 2020-07-24T06:23:10Z is after 2020-07-23T10:07:55Z"}
{"level":"info","ts":1595571795.6027315,"logger":"amqpcommand","msg":"Command Client - connecting amqps://10.130.0.39:5671"}
{"level":"info","ts":1595571795.620408,"logger":"amqpcommand","msg":"Command Client amqps://10.130.0.39:5671 - restarting - backoff 5s(s), x509: certificate has expired or is not yet valid: current time 2020-07-24T06:23:15Z is after 2020-07-23T10:07:55Z"}
{"level":"info","ts":1595571800.62056,"logger":"amqpcommand","msg":"Command Client - connecting amqps://10.130.0.39:5671"}
{"level":"info","ts":1595571800.6436908,"logger":"amqpcommand","msg":"Command Client amqps://10.130.0.39:5671 - restarting - backoff 5s(s), x509: certificate has expired or is not yet valid: current time 2020-07-24T06:23:20Z is after 2020-07-23T10:07:55Z"}
{"level":"info","ts":1595571805.6438673,"logger":"amqpcommand","msg":"Command Client - connecting amqps://10.130.0.39:5671"}
{"level":"info","ts":1595571805.6631546,"logger":"amqpcommand","msg":"Command Client amqps://10.130.0.39:5671 - restarting - backoff 5s(s), x509: certificate has expired or is not yet valid: current time 2020-07-24T06:23:25Z is after 2020-07-23T10:07:55Z"}
{"level":"info","ts":1595571810.6634145,"logger":"amqpcommand","msg":"Command Client - connecting amqps://10.130.0.39:5671"}
{"level":"info","ts":1595571810.6888678,"logger":"amqpcommand","msg":"Command Client amqps://10.130.0.39:5671 - restarting - backoff 5s(s), x509: certificate has expired or is not yet valid: current time 2020-07-24T06:23:30Z is after 2020-07-23T10:07:55Z"}

It seems that this also prevents the operator from properly deleting messaging endpoints.

ctron, Jul 24 '20 06:07

It looks as if this is caused by the fact that the CA is only renewed when the reconcile loop runs. However, the reconcile loop is not timer-based by default, so nothing triggers a renewal when the objects themselves do not change.

I guess possible solutions to this could be:

  • Return requeueAfter for objects that have a CA. On startup they would be reconciled anyway, and then again when the CA is close to expiry, unless something else triggers a reconcile earlier
  • Create a separate timer event for objects that have a CA

ctron, Jul 24 '20 06:07

I guess we have the same problem with the IoT bits, as they rely on the same pattern now.

ctron, Jul 24 '20 06:07