enmasse
Operator fails to renew certificates properly
I have an issue, after running shared infrastructure for a while, that the operator seems to fail to renew the certificates:
{"level":"info","ts":1595571780.565553,"logger":"amqpcommand","msg":"Command Client amqps://10.130.0.39:5671 - restarting - backoff 5s(s), x509: certificate has expired or is not yet valid: current time 2020-07-24T06:23:00Z is after 2020-07-23T10:07:55Z"}
{"level":"info","ts":1595571785.5656567,"logger":"amqpcommand","msg":"Command Client - connecting amqps://10.130.0.39:5671"}
{"level":"info","ts":1595571785.5828426,"logger":"amqpcommand","msg":"Command Client amqps://10.130.0.39:5671 - restarting - backoff 5s(s), x509: certificate has expired or is not yet valid: current time 2020-07-24T06:23:05Z is after 2020-07-23T10:07:55Z"}
{"level":"info","ts":1595571790.5829506,"logger":"amqpcommand","msg":"Command Client - connecting amqps://10.130.0.39:5671"}
{"level":"info","ts":1595571790.6026292,"logger":"amqpcommand","msg":"Command Client amqps://10.130.0.39:5671 - restarting - backoff 5s(s), x509: certificate has expired or is not yet valid: current time 2020-07-24T06:23:10Z is after 2020-07-23T10:07:55Z"}
{"level":"info","ts":1595571795.6027315,"logger":"amqpcommand","msg":"Command Client - connecting amqps://10.130.0.39:5671"}
{"level":"info","ts":1595571795.620408,"logger":"amqpcommand","msg":"Command Client amqps://10.130.0.39:5671 - restarting - backoff 5s(s), x509: certificate has expired or is not yet valid: current time 2020-07-24T06:23:15Z is after 2020-07-23T10:07:55Z"}
{"level":"info","ts":1595571800.62056,"logger":"amqpcommand","msg":"Command Client - connecting amqps://10.130.0.39:5671"}
{"level":"info","ts":1595571800.6436908,"logger":"amqpcommand","msg":"Command Client amqps://10.130.0.39:5671 - restarting - backoff 5s(s), x509: certificate has expired or is not yet valid: current time 2020-07-24T06:23:20Z is after 2020-07-23T10:07:55Z"}
{"level":"info","ts":1595571805.6438673,"logger":"amqpcommand","msg":"Command Client - connecting amqps://10.130.0.39:5671"}
{"level":"info","ts":1595571805.6631546,"logger":"amqpcommand","msg":"Command Client amqps://10.130.0.39:5671 - restarting - backoff 5s(s), x509: certificate has expired or is not yet valid: current time 2020-07-24T06:23:25Z is after 2020-07-23T10:07:55Z"}
{"level":"info","ts":1595571810.6634145,"logger":"amqpcommand","msg":"Command Client - connecting amqps://10.130.0.39:5671"}
{"level":"info","ts":1595571810.6888678,"logger":"amqpcommand","msg":"Command Client amqps://10.130.0.39:5671 - restarting - backoff 5s(s), x509: certificate has expired or is not yet valid: current time 2020-07-24T06:23:30Z is after 2020-07-23T10:07:55Z"}
It seems that this also prevents the operator from properly deleting messaging endpoints.
It looks as if this is caused by the fact that the CA is only renewed when the reconcile loop runs. However, the reconcile loop is not timer-based by default, so nothing triggers a renewal while the object is otherwise unchanged.
I guess possible solutions to this could be:
- Return requeueAfter for objects that have a CA. So on startup they would be reconciled anyway, and then again when the CA is close to expiry, unless something else happens
- Create a separate timer event for objects that have a CA
I guess we have the same problem with the IoT bits, as they rely on the same pattern now.