
certs not updating. leader-election blocked by lock?

Open jkassis opened this issue 2 years ago • 5 comments

What happened:

  1. Certificates that were previously updating correctly are no longer updating.
  2. Two instances of Openshift ACME are running.
  3. One instance reporting this...
I0103 00:24:46.571147 1 leaderelection.go:352] lock is held by openshift-acme-7f65979ff9-hgsz4_8f58d3f6-9cf7-4745-af7b-476b0505caa9 and has not yet expired
I0103 00:24:46.571381 1 leaderelection.go:247] failed to acquire lease fg/acme-controller-locks
  4. Other instance reporting this...
I0103 00:24:16.217493       1 reflector.go:432] k8s.io/[email protected]/tools/cache/reflector.go:108: Watch close - *v1.Route total 0 items received
I0103 00:25:04.539294       1 reflector.go:432] k8s.io/[email protected]/tools/cache/reflector.go:108: Watch close - *v1.LimitRange total 0 items received
I0103 00:25:15.362207       1 reflector.go:432] k8s.io/[email protected]/tools/cache/reflector.go:108: Watch close - *v1.ReplicaSet total 0 items received
I0103 00:26:15.924614       1 reflector.go:432] k8s.io/[email protected]/tools/cache/reflector.go:108: Watch close - *v1.Service total 0 items received
I0103 00:27:04.606876       1 reflector.go:432] k8s.io/[email protected]/tools/cache/reflector.go:108: Watch close - *v1.ConfigMap total 2054 items received
I0103 00:27:30.959775       1 reflector.go:432] k8s.io/[email protected]/tools/cache/reflector.go:108: Watch close - *v1.LimitRange total 0 items received
I0103 00:27:55.497750       1 reflector.go:432] k8s.io/[email protected]/tools/cache/reflector.go:108: Watch close - *v1.Secret total 9 items received
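
For anyone trying to judge whether the lock in step 3 is genuinely stale: the "has not yet expired" message comes from the Kubernetes client-go leader election code, which treats a lock as held until its lease duration has run out. Below is a minimal Python sketch of that check, assuming the lock stores the usual leader-election record as JSON (`holderIdentity`, `leaseDurationSeconds`, `acquireTime`, `renewTime`, `leaderTransitions`). The timestamps and lease duration in the example record are invented for illustration and were not read from this cluster. As far as I understand, client-go measures from the moment it last observed the record change rather than from `renewTime` itself, so this is only an approximation.

```python
import json
from datetime import datetime, timedelta, timezone


def lease_expired(record_json: str, now: datetime) -> bool:
    """Return True if renewTime + leaseDurationSeconds lies before `now`."""
    record = json.loads(record_json)
    # fromisoformat() on older Pythons does not accept a trailing "Z".
    renew_time = datetime.fromisoformat(record["renewTime"].replace("Z", "+00:00"))
    lease = timedelta(seconds=record["leaseDurationSeconds"])
    return renew_time + lease < now


# Hypothetical record: the holder name is copied from the log above,
# every other value is made up for the example.
EXAMPLE_RECORD = json.dumps({
    "holderIdentity": "openshift-acme-7f65979ff9-hgsz4_8f58d3f6-9cf7-4745-af7b-476b0505caa9",
    "leaseDurationSeconds": 60,
    "acquireTime": "2022-01-03T00:00:00Z",
    "renewTime": "2022-01-03T00:24:40Z",
    "leaderTransitions": 3,
})

if __name__ == "__main__":
    now = datetime(2022, 1, 3, 0, 24, 46, tzinfo=timezone.utc)
    print("lease expired:", lease_expired(EXAMPLE_RECORD, now))
```

If the holder keeps refreshing `renewTime`, the second instance reporting "failed to acquire lease" is expected standby behaviour. A `renewTime` that stays frozen while the lock is still reported as held would point at a stale record.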

What you expected to happen: Clean logs and certificates up to date.

How to reproduce it (as minimally and precisely as possible): Not sure.

Anything else we need to know?:

Environment:

  • OpenShift/Kubernetes version (use oc/kubectl version): OKD 4.7.0


  • Others:

@tnozicka

jkassis avatar Jan 03 '22 00:01 jkassis

seeing this when loading the cert...

[I] jkassis@Jeremys-MBP ~ [124]> ws "wss://pubsub.shinetribe.media/connPut?ConnUUID=b3f0b2d8-f5f8-452c-83fc-c476ecb7a3df"
x509: certificate has expired or is not yet valid: current time 2022-01-02T16:36:11-08:00 is after 2022-01-02T01:42:28Z
[I] jkassis@Jeremys-MBP ~ [1]>
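
The two timestamps in the x509 error use different UTC offsets, which makes the gap hard to read at a glance. A small Python sketch that normalises them and reports how long the served certificate had been expired when the command ran (the helper name `expired_for` is just for this example):

```python
from datetime import datetime, timedelta


def expired_for(current: str, not_after: str) -> timedelta:
    """Return how long the certificate has been expired (negative if still valid)."""
    def to_dt(ts: str) -> datetime:
        # fromisoformat() on older Pythons does not accept a trailing "Z".
        return datetime.fromisoformat(ts.replace("Z", "+00:00"))

    return to_dt(current) - to_dt(not_after)


if __name__ == "__main__":
    # Both values are copied from the error message above.
    print(expired_for("2022-01-02T16:36:11-08:00", "2022-01-02T01:42:28Z"))
    # prints 22:53:43
```

So the certificate on the endpoint had been expired for a little under 23 hours at that point.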

jkassis avatar Jan 03 '22 00:01 jkassis

Brought the pods down and the "leader election blocked" logs reappeared. Proceeding as if this is normal. Looking at the Certificate status, the cert appears to be up for re-issue on 2022-02-01 (`renewalTime`), which seems odd given that the fetched cert has already expired.

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  creationTimestamp: '2021-10-04T02:24:53Z'
  generation: 3
  managedFields:
    - apiVersion: cert-manager.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        'f:spec':
          .: {}
          'f:commonName': {}
          'f:dnsNames': {}
          'f:issuerRef':
            .: {}
            'f:kind': {}
            'f:name': {}
          'f:secretName': {}
      manager: Mozilla
      operation: Update
      time: '2021-10-04T02:38:51Z'
    - apiVersion: cert-manager.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        'f:spec':
          'f:privateKey': {}
        'f:status':
          .: {}
          'f:conditions': {}
          'f:notAfter': {}
          'f:notBefore': {}
          'f:renewalTime': {}
          'f:revision': {}
      manager: controller
      operation: Update
      time: '2021-12-03T01:42:28Z'
  name: pubsub-shinetribe-media
  namespace: fg
  resourceVersion: '307455716'
  selfLink: /apis/cert-manager.io/v1/namespaces/fg/certificates/pubsub-shinetribe-media
  uid: a528dc92-636c-40c8-862e-38dfa6986cc7
spec:
  commonName: pubsub.shinetribe.media
  dnsNames:
    - pubsub.shinetribe.media
  issuerRef:
    kind: Issuer
    name: le-wildcard-issuer
  secretName: cert-pubsub-shinetribe-media
status:
  conditions:
    - lastTransitionTime: '2021-10-04T02:42:30Z'
      message: Certificate is up to date and has not expired
      observedGeneration: 3
      reason: Ready
      status: 'True'
      type: Ready
  notAfter: '2022-03-03T00:44:07Z'
  notBefore: '2021-12-03T00:44:08Z'
  renewalTime: '2022-02-01T00:44:07Z'
  revision: 3
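
To sanity-check the dates in that status, here is a short Python sketch of the arithmetic. All timestamps are copied from this thread: the three `status` fields above, the `time` of the last `controller` update in `managedFields`, and the `notAfter` from the x509 error in the earlier comment. The interpretation in the comments is my reading of the numbers and is not confirmed by the maintainers.

```python
from datetime import datetime


def parse(ts: str) -> datetime:
    # fromisoformat() on older Pythons does not accept a trailing "Z".
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def renewal_summary(not_before: str, not_after: str, renewal_time: str) -> dict:
    """Compute certificate lifetime and how long before expiry renewal is scheduled."""
    nb, na, rt = parse(not_before), parse(not_after), parse(renewal_time)
    return {"lifetime": na - nb, "renew_before": na - rt}


status = renewal_summary(
    "2021-12-03T00:44:08Z",  # status.notBefore
    "2022-03-03T00:44:07Z",  # status.notAfter
    "2022-02-01T00:44:07Z",  # status.renewalTime
)
served_not_after = parse("2022-01-02T01:42:28Z")        # from the x509 error
last_controller_update = parse("2021-12-03T01:42:28Z")  # managedFields time

if __name__ == "__main__":
    print(status["lifetime"])      # 89 days, 23:59:59
    print(status["renew_before"])  # 30 days, 0:00:00
    # The expired certificate's notAfter differs from status.notAfter ...
    print(served_not_after == parse("2022-03-03T00:44:07Z"))  # False
    # ... and lies exactly 30 days after the controller's last update.
    print(served_not_after - last_controller_update)          # 30 days, 0:00:00
```

On these numbers the `renewalTime` is internally consistent: it is 30 days before the `notAfter` of a roughly 90-day certificate. The expired certificate seen by the client has a different `notAfter` than the one in this status, which suggests (as an assumption) that the endpoint is still serving the previous revision instead of the renewed one.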

jkassis avatar Jan 03 '22 00:01 jkassis

Seems like the algorithm that determines the renewal time is broken?!? Here's what my browser gets for that cert... roughly 1 day off.

image

jkassis avatar Jan 03 '22 00:01 jkassis

I believe the problem has been there all along. I am forced to delete the Pods once in a while to ensure the renewal process gets triggered.

tux-o-matic avatar Jan 27 '22 07:01 tux-o-matic

Encountering this issue as well. I have tried force-deleting the pods, and also scaling the running pods down to 0 and back up, but the lock is still held by some ghost.

brianorwhatever avatar Jan 29 '22 00:01 brianorwhatever