kube-prometheus

KubeClientCertificateExpiration always alert

Open ne1000 opened this issue 5 years ago • 47 comments

What did you do? I downloaded the release with wget https://codeload.github.com/coreos/prometheus-operator/tar.gz/v0.23.1 and installed prometheus-operator with kubectl create -f prometheus-operator-0.23.1/contrib/kube-prometheus/manifests/ || true

What did you expect to see? All components working correctly.

Environment

  • K8s version: v1.11.0

  • Prometheus Operator version: v0.23.1

  • Kubernetes cluster kind:

          Installed the k8s cluster from the binary package https://storage.googleapis.com/kubernetes-release/release/v1.11.0/kubernetes.tar.gz
    
  • Manifests:


[2] Firing
--
Labels
alertname = KubeClientCertificateExpiration
job = apiserver
prometheus = monitoring/k8s
severity = critical
Annotations
message = Kubernetes API certificate is expiring in less than 1 day.
runbook_url = https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeclientcertificateexpiration

Labels
alertname = KubeClientCertificateExpiration
job = apiserver
prometheus = monitoring/k8s
severity = warning
Annotations
message = Kubernetes API certificate is expiring in less than 7 days.
runbook_url = https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeclientcertificateexpiration


I used cfssl to generate the PEM certificates and keys:

# openssl x509 -in /etc/kubernetes/ssl/ca.pem -text -noout
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            6f:b9:70:eb:80:73:e6:73:f9:c8:29:98:99:5e:b5:f2:6d:a3:0e:49
    Signature Algorithm: sha256WithRSAEncryption
        Issuer: C=CN, ST=Shanghai, L=Shanghai, O=k8s, OU=System, CN=kubernetes
        Validity
            Not Before: Aug  8 09:54:00 2018 GMT
            Not After : Aug  7 09:54:00 2023 GMT
        Subject: C=CN, ST=Shanghai, L=Shanghai, O=k8s, OU=System, CN=kubernetes
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                Public-Key: (2048 bit)
                Modulus:
                    00:c8:ae:16:d6:0c:5b:30:95:97:a2:5b:16:cf:db:
                    f1:bd:68:8c:c6:0c:84:5b:a4:46:b4:79:0b:2b:c4:
                    b2:c0:5f:ab:e4:4a:33:46:d3:82:a3:33:bf:a7:f7:
                    ec:a3:4e:b3:70:34:e8:15:24:8e:56:b7:4d:68:9b:
                    e0:dc:0a:3a:3c:36:3e:f2:5c:be:d1:5d:fa:fa:e0:
                    7d:5b:2a:5d:e2:fc:94:9f:ea:a9:ce:ca:ad:2f:fd:
                    16:bc:fb:83:f6:45:fd:2f:9a:ac:94:e3:fd:49:90:
                    a1:31:95:cd:f2:30:2b:cd:31:34:69:b1:3a:b8:6a:
                    b8:7a:ef:f1:e9:ee:a2:5d:81:a8:59:80:77:c1:43:
                    85:3c:29:d8:02:fb:24:b9:9a:1f:e4:61:82:ec:8d:
                    49:3d:91:f7:0a:50:25:b1:a4:51:ba:f3:d6:77:07:
                    e2:50:ed:b8:af:30:18:d8:23:d6:e9:17:b1:a0:1c:
                    8c:74:f3:87:56:08:c7:49:86:c0:90:5e:16:a4:1e:
                    07:49:ef:b2:dc:9e:22:4c:b9:9b:7f:38:47:d7:26:
                    17:15:92:79:51:cc:a9:3f:4b:a1:6d:03:94:5b:9c:
                    03:c0:19:7e:d1:4e:c9:77:84:b1:e4:5b:a6:2b:54:
                    95:d0:a3:ef:39:d6:c3:88:77:af:4f:31:cd:ba:f7:
                    cc:3b
                Exponent: 65537 (0x10001)
        X509v3 extensions:
            X509v3 Key Usage: critical
                Certificate Sign, CRL Sign
            X509v3 Basic Constraints: critical
                CA:TRUE, pathlen:2
            X509v3 Subject Key Identifier: 
                BC:9F:D1:BD:4C:26:E1:77:C0:7F:CF:04:3E:DF:64:86:BE:23:F3:7F
            X509v3 Authority Key Identifier: 
                keyid:BC:9F:D1:BD:4C:26:E1:77:C0:7F:CF:04:3E:DF:64:86:BE:23:F3:7F

    Signature Algorithm: sha256WithRSAEncryption
         78:b7:65:4d:53:e1:0c:7d:d6:9e:d5:aa:f8:1a:34:e4:1d:c0:
         22:4b:42:72:86:86:e9:73:e2:fd:89:90:e1:10:56:a7:f2:15:
         71:14:79:ce:67:9a:ca:5d:4d:e8:25:3d:70:2a:0a:3b:08:09:
         02:8a:d9:2d:ed:85:cd:10:38:60:75:d7:f5:a7:b2:ee:86:05:
         dd:50:38:04:a4:7a:bc:f5:02:b2:a5:d9:a2:a1:71:7d:e5:ce:
         dd:c8:5a:a7:25:61:de:c3:76:c3:87:3e:5a:4c:eb:36:91:51:
         8b:fc:ef:9d:aa:35:58:3a:ba:fc:2a:3c:4f:b3:54:e8:0d:a5:
         32:25:91:dd:93:75:33:53:2b:94:9e:f1:cb:e9:58:17:a6:dc:
         07:1c:96:5e:93:40:d6:c8:2b:67:49:3b:3f:1f:a8:3a:41:65:
         29:03:f3:18:f9:d3:66:a8:49:14:1e:7f:cb:6b:f6:26:1d:7b:
         6f:46:c6:27:a1:69:fe:62:7f:da:fb:41:7d:fc:ab:12:77:b8:
         b3:4c:92:a5:5c:d2:8c:25:a1:aa:1e:2f:a2:de:38:e5:9a:96:
         2f:b2:bb:3c:32:de:db:7f:80:eb:f0:01:be:2d:ff:00:09:35:
         ea:2b:8d:33:6e:6c:2c:6d:37:a2:c4:b3:c9:eb:ac:3f:ec:e5:
         5d:61:50:66

# openssl x509 -in /etc/kubernetes/ssl/kubernetes.pem -text -noout
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            76:64:c7:59:95:aa:fb:9b:8c:b2:26:c0:82:24:c5:0a:8d:95:a2:1e
    Signature Algorithm: sha256WithRSAEncryption
        Issuer: C=CN, ST=Shanghai, L=Shanghai, O=k8s, OU=System, CN=kubernetes
        Validity
            Not Before: Aug  8 09:54:00 2018 GMT
            Not After : Aug  5 09:54:00 2028 GMT
        Subject: C=CN, ST=Shanghai, L=Shanghai, O=k8s, OU=System, CN=kubernetes
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                Public-Key: (2048 bit)
                Modulus:
                    00:b9:7e:1b:a9:9a:95:21:42:5a:e8:3e:79:94:e6:
                    c1:35:87:93:22:3d:3c:c9:65:be:b6:99:4b:47:25:
                    1a:22:db:4a:a5:b8:59:0d:2d:a0:0d:e5:c6:35:3b:
                    8e:2c:e3:fe:3a:d9:bc:63:9b:a0:98:c2:26:98:4c:
                    be:8b:71:20:37:a3:19:21:34:03:0b:10:d7:cb:7c:
                    b6:d8:68:90:1b:e1:6b:ee:b8:0e:6f:3d:33:2b:3f:
                    87:9a:4f:6c:59:08:f4:22:a6:2a:b6:d5:d6:00:b8:
                    7e:3c:90:aa:99:5c:6e:7c:93:f2:6b:6a:6f:5b:c6:
                    35:60:e0:14:62:5e:91:cc:20:eb:88:ea:cc:7a:10:
                    d7:f1:5f:b3:fb:aa:c4:a7:f5:95:3e:8a:44:ee:09:
                    12:6b:aa:29:05:40:df:1e:54:25:05:e2:8c:cb:d7:
                    32:e8:c5:ff:0c:48:11:27:c9:52:81:f2:53:b0:82:
                    b0:1b:7f:ad:08:fd:cd:b6:c1:4e:43:da:2d:f0:90:
                    90:cb:97:a2:2a:31:bc:65:2c:9f:a9:72:90:dd:b0:
                    5e:3b:7d:1c:37:d6:ca:22:13:2a:da:27:1d:61:94:
                    8f:36:9f:9d:6a:d1:6c:b9:17:58:5d:9c:0d:b1:d8:
                    2a:98:f1:54:d7:87:c6:da:ff:05:c9:a2:c5:91:5a:
                    77:23
                Exponent: 65537 (0x10001)
        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature, Key Encipherment
            X509v3 Extended Key Usage: 
                TLS Web Server Authentication, TLS Web Client Authentication
            X509v3 Basic Constraints: critical
                CA:FALSE
            X509v3 Subject Key Identifier: 
                B3:1C:65:F4:DA:61:57:1F:68:06:05:46:36:31:BC:AF:E1:D5:06:7C
            X509v3 Authority Key Identifier: 
                keyid:BC:9F:D1:BD:4C:26:E1:77:C0:7F:CF:04:3E:DF:64:86:BE:23:F3:7F

            X509v3 Subject Alternative Name: 
                DNS:kubernetes, DNS:kubernetes.default, DNS:kubernetes.default.svc, DNS:kubernetes.default.svc.cluster, DNS:kubernetes.default.svc.cluster.local, IP Address:127.0.0.1, IP Address:192.168.2.93, IP Address:10.100.0.1, IP Address:192.168.2.86, IP Address:192.168.2.87, IP Address:192.168.2.88
    Signature Algorithm: sha256WithRSAEncryption
         2d:a6:ee:28:71:0f:ea:69:ff:90:25:d6:04:4e:4c:e1:3d:ff:
         34:f1:64:67:4f:ab:80:ee:f5:d9:16:53:48:0c:c4:fd:9a:f0:
         09:13:71:b1:ba:52:b0:36:38:6b:51:be:ac:cc:14:30:2b:e7:
         a9:87:00:76:fe:1a:58:72:45:27:0a:59:51:74:65:6a:30:ea:
         37:f3:c9:79:59:f0:09:87:e9:94:99:00:11:d7:20:9c:90:5c:
         de:ee:09:ff:53:07:41:06:4c:91:8d:8a:d1:d5:ff:30:06:3b:
         53:32:4c:dd:70:f0:22:7f:7d:e6:02:f2:eb:a6:fd:5a:de:d6:
         0d:fa:b5:e9:f0:95:5a:79:bb:f9:b5:a5:47:01:13:3f:b0:12:
         c6:35:11:45:2f:6b:f3:71:26:92:8f:34:90:0f:42:d8:2a:12:
         0f:ad:96:1f:60:54:5c:27:f3:0f:c3:4e:f5:ef:58:75:51:7a:
         df:8c:f3:b2:d4:b8:70:99:ff:e3:5a:ee:a9:00:69:84:a3:c2:
         df:7e:9b:55:e1:ab:92:bb:55:8b:54:6c:aa:05:c4:ea:29:8e:
         56:72:15:11:c2:6e:49:72:b5:d7:30:06:7b:c4:a2:0a:82:87:
         19:83:b7:1e:3a:86:02:35:f5:21:e8:e6:bf:5e:51:c0:ec:f0:
         c1:3d:15:35

# openssl x509 -in /etc/kubernetes/ssl/admin.pem -text -noout
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            30:7e:a9:d4:1c:0a:04:d7:3b:2a:38:7a:b3:ca:25:fb:65:e3:e6:72
    Signature Algorithm: sha256WithRSAEncryption
        Issuer: C=CN, ST=Shanghai, L=Shanghai, O=k8s, OU=System, CN=kubernetes
        Validity
            Not Before: Aug  8 09:55:00 2018 GMT
            Not After : Aug  5 09:55:00 2028 GMT
        Subject: C=CN, ST=Shanghai, L=Shanghai, O=system:masters, OU=System, CN=admin
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                Public-Key: (2048 bit)
                Modulus:
                    00:cb:10:41:82:61:ec:93:e8:4d:bf:3e:2d:88:45:
                    ce:e8:57:ee:c6:90:8c:a2:e7:7b:16:ae:9e:fc:6e:
                    60:25:5c:f4:26:c2:50:c7:b5:1e:d3:91:d8:54:e9:
                    5b:6f:85:0e:0a:56:2c:e8:4d:69:dc:06:1e:94:92:
                    29:b9:7c:6f:cd:bd:25:13:bf:c9:9b:98:dd:81:f2:
                    0e:df:27:17:75:c9:4f:d8:9a:9c:5c:b0:db:9c:ed:
                    bb:a5:1f:c1:df:85:9a:f9:62:6b:a8:7a:96:69:30:
                    93:2f:e9:e3:16:dc:74:5f:4d:68:5d:e3:05:ae:01:
                    bd:60:72:d0:30:7c:3b:01:7a:13:9f:4c:ef:62:f2:
                    6c:47:6a:25:6f:b4:0c:7a:53:db:78:a4:71:00:c8:
                    6c:a7:c6:39:42:cf:da:e0:20:ce:66:02:36:43:13:
                    5a:56:7d:da:77:ad:01:4f:ab:56:54:6d:b9:27:08:
                    4e:d6:95:8b:cd:90:5f:28:c2:63:de:d8:f9:77:4f:
                    6d:35:02:9b:6c:cf:27:43:8a:47:b0:74:7e:25:c5:
                    6c:2d:7a:4b:e1:49:af:e7:28:d1:e0:3b:2a:21:1d:
                    bd:09:80:f7:4f:ee:a9:23:50:8c:65:55:0b:fd:d8:
                    4b:4b:b3:82:cb:2a:9f:33:c7:d3:88:63:91:ca:f9:
                    e1:a7
                Exponent: 65537 (0x10001)
        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature, Key Encipherment
            X509v3 Extended Key Usage: 
                TLS Web Server Authentication, TLS Web Client Authentication
            X509v3 Basic Constraints: critical
                CA:FALSE
            X509v3 Subject Key Identifier: 
                DA:B4:8B:36:C7:E9:9C:C0:6E:AC:8D:1F:D6:18:93:76:4D:6E:78:1F
            X509v3 Authority Key Identifier: 
                keyid:BC:9F:D1:BD:4C:26:E1:77:C0:7F:CF:04:3E:DF:64:86:BE:23:F3:7F

    Signature Algorithm: sha256WithRSAEncryption
         2f:69:9c:6f:53:bb:7a:42:e3:4e:8f:b4:17:00:10:90:c3:1c:
         be:68:05:f3:15:6a:aa:0c:53:eb:89:c6:0c:2e:c2:0a:75:14:
         16:09:7e:68:0e:83:5c:c9:79:e0:ab:86:ee:93:d7:de:50:66:
         98:3d:5a:43:e0:7f:dd:dc:8a:b8:83:84:84:d4:0f:a5:c5:a1:
         b2:4a:65:76:15:e7:85:f3:7d:37:ee:e2:50:70:28:85:e8:05:
         05:d1:60:74:40:e2:67:7a:31:32:39:e3:96:e3:5b:fe:5e:eb:
         36:ef:cf:fa:95:37:9c:f1:3a:f5:11:80:e8:80:f9:1c:39:04:
         a0:14:af:e0:e7:ac:ce:6f:ad:4a:f3:e8:24:13:20:72:46:15:
         da:9a:e3:1d:88:c5:3d:93:12:7c:71:d3:77:95:5b:cd:f7:3b:
         b3:33:5d:10:31:7e:d9:ba:0e:ed:c8:61:9a:e7:df:fa:75:f1:
         f4:e5:67:81:be:3b:4a:5d:1e:82:1e:64:f7:16:14:4c:d9:e1:
         09:56:81:f4:64:21:47:79:f2:50:55:bb:e1:28:21:40:22:7d:
         f6:b7:f1:cd:3f:99:e5:96:c9:ee:76:be:03:68:da:7a:94:f5:
         ad:bb:40:66:cc:8c:85:36:91:3d:6a:5e:f6:d8:71:23:9e:f1:
         97:ff:73:ea

My k8s cluster and Prometheus seem fine, but KubeClientCertificateExpiration always fires. How do I fix it?

ne1000 avatar Sep 03 '18 07:09 ne1000

This is actually about the certificates that clients use to communicate with the Kubernetes API. Check the certificates that the kubelets, scheduler(s) and controller-manager(s) use.

brancz avatar Sep 03 '18 10:09 brancz

@brancz Yes, I checked, but I didn't find any issue. Please advise.

# cat /etc/kubernetes/controller-manager 
KUBE_CONTROLLER_MANAGER_ARGS="--address=0.0.0.0   \
                              --master=http://192.168.2.86:8080 \
                              --cluster-name=kubernetes \
                              --cluster-signing-cert-file=/etc/kubernetes/ssl/ca.pem \
                              --cluster-signing-key-file=/etc/kubernetes/ssl/ca-key.pem \
                              --service-account-private-key-file=/etc/kubernetes/ssl/ca-key.pem \
                              --root-ca-file=/etc/kubernetes/ssl/ca.pem \
                              --leader-elect=true \
                              --v=0"

# cat /usr/lib/systemd/system/kubelet.service 
[Unit]
Description=Kubernetes API Server
Documentation=https://kubernetes.io/doc
After=docker.service
Requires=docker.service

[Service]
WorkingDirectory=/var/lib/kubelet
ExecStart=/usr/local/bin/kubelet --kubeconfig=/etc/kubernetes/kubelet.kubeconfig --bootstrap-kubeconfig=/etc/kubernetes/bootstrap.kubeconfig --logtostderr=false --log-dir=/var/log/kubernetes --v=0 --cluster-dns=10.100.0.100 --cluster-domain=cluster.local. --resolv-conf=/etc/resolv.conf --authentication-token-webhook=true --authorization-mode=Webhook
Restart=on-failure

[Install]
WantedBy=multi-user.target

# cat /etc/kubernetes/kubelet.kubeconfig 
apiVersion: v1 
clusters:
- cluster:
    certificate-authority: /etc/kubernetes/ssl/ca.pem
    server: https://192.168.2.93:6443
  name: kubernetes
contexts:
- context:
    cluster: kubernetes
    user: admin
  name: kubernetes
current-context: kubernetes
kind: Config
preferences: {}
users:
- name: admin
  user:
    client-certificate: /etc/kubernetes/ssl/admin.pem
    client-key: /etc/kubernetes/ssl/admin-key.pem
# cat /etc/kubernetes/bootstrap.kubeconfig 
apiVersion: v1
clusters:
- cluster:
    certificate-authority: /etc/kubernetes/ssl/ca.pem
    server: https://192.168.2.93:6443
  name: kubernetes
contexts:
- context:
    cluster: kubernetes
    user: kubelet-bootstrap
  name: default
current-context: default
kind: Config
preferences: {}
users:
- name: kubelet-bootstrap
  user:
    token: d2b9e107b99641a01ff18e952cf9ce85
# openssl x509 -in /var/lib/kubelet/pki/kubelet.crt -text -noout
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number: 2 (0x2)
    Signature Algorithm: sha256WithRSAEncryption
        Issuer: CN=izuf68thdbm0n4j5qywd7sz-ca@1533727394
        Validity
            Not Before: Aug  8 11:23:14 2018 GMT
            Not After : Aug  8 11:23:14 2019 GMT
        Subject: CN=izuf68thdbm0n4j5qywd7sz@1533727394
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                Public-Key: (2048 bit)
                Modulus:
                    00:d1:51:90:4a:e1:e5:e0:4c:90:f2:ff:ad:31:20:
                    77:34:d0:1a:7f:ab:c8:f5:87:74:10:4b:df:52:6a:
                    77:d7:01:92:ab:7a:14:4a:78:eb:c3:a7:9a:ed:f2:
                    b4:95:a7:dd:b8:40:25:4f:fb:06:d8:36:ef:4c:4b:
                    a9:13:0f:c9:f0:de:8a:f6:9a:17:1c:7c:07:5f:2f:
                    4a:dd:3c:f7:4e:7f:59:78:7b:0f:10:df:77:cc:bb:
                    1b:7f:02:3b:39:66:56:5c:37:3b:db:ec:c8:84:53:
                    46:ed:7e:26:3d:14:56:2d:f4:82:a3:4b:64:ae:8b:
                    3e:9c:56:c7:15:59:97:01:f7:93:6a:35:88:5d:b5:
                    cd:a5:03:02:0f:55:04:aa:77:6a:65:8e:96:2c:ae:
                    a6:7e:03:de:01:95:30:bc:68:21:52:4e:02:f4:c0:
                    ad:8f:6b:71:db:5b:b9:d3:7c:55:93:b1:ce:df:12:
                    be:1a:7e:95:0f:cb:d9:4b:1f:43:28:0b:19:12:f4:
                    5f:b8:53:49:93:b2:ef:37:61:0a:ec:d1:11:10:e2:
                    40:bc:1b:c3:74:e2:83:a8:24:32:0e:e8:0e:6f:5e:
                    f2:44:6a:27:40:4a:c0:f1:4c:98:ab:e3:52:18:2c:
                    fd:80:ff:23:9b:2f:77:8e:3e:20:a1:ee:df:82:24:
                    a1:f3
                Exponent: 65537 (0x10001)
        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature, Key Encipherment
            X509v3 Extended Key Usage: 
                TLS Web Server Authentication
            X509v3 Basic Constraints: critical
                CA:FALSE
            X509v3 Subject Alternative Name: 
                DNS:izuf68thdbm0n4j5qywd7sz
    Signature Algorithm: sha256WithRSAEncryption
         a1:d6:4a:86:13:8e:36:a3:c2:ff:6a:e3:50:ce:48:97:19:a0:
         d8:94:99:47:53:49:75:6f:27:15:ad:4a:b3:4c:50:5c:79:15:
         d3:f7:55:26:f8:58:d2:77:26:c3:6c:8a:2d:46:58:df:5f:70:
         40:54:4d:0a:7e:16:b2:b9:f6:6b:ce:ae:81:94:3f:88:b9:b3:
         56:e5:1c:55:f1:97:7b:50:66:f3:19:c5:48:55:2d:22:60:6d:
         36:0f:4b:99:ef:53:88:2d:3f:6a:47:2d:54:96:a9:35:2b:71:
         7c:18:86:bc:a2:33:2a:b5:b5:ab:19:3b:85:f5:c8:2a:4c:9c:
         54:71:ca:2b:14:00:a3:02:a3:6a:f8:fb:4f:40:d7:a2:59:18:
         9c:7a:93:2b:8d:39:26:1c:42:b1:62:6e:55:dd:c5:48:fe:45:
         cb:81:d1:bb:8a:86:05:80:9d:32:ec:da:cf:9c:83:fa:9b:f3:
         90:70:38:56:c7:1d:7d:e6:69:91:e2:90:77:db:20:50:43:f6:
         8d:5d:7f:52:e7:eb:fc:9d:8e:75:91:f6:63:b6:b9:96:2a:ef:
         0f:1f:99:13:4a:d6:5d:72:d7:1a:a8:71:0f:b6:21:21:a6:81:
         40:e2:74:f4:89:cd:0e:ae:24:0b:e2:c2:07:69:1c:06:0d:ad:
         3b:3f:5e:a2

ne1000 avatar Sep 06 '18 01:09 ne1000

Can you confirm that you've checked every client that appears when you run this query?

max(apiserver_request_count) by(client)

The values don't matter; it's just an aggregation so we can see all clients making requests against the API.
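As a rough sketch of how the client list could be pulled out programmatically: the snippet below builds the instant-query URL for the query above and extracts the distinct client label values from a response in the Prometheus HTTP API's JSON shape. The server address and the client names in the sample are hypothetical, and no request is actually sent.

```python
import json
from typing import List
from urllib.parse import urlencode

QUERY = "max(apiserver_request_count) by(client)"


def build_query_url(base_url: str, query: str = QUERY) -> str:
    """Return the instant-query URL (/api/v1/query) for a Prometheus server."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': query})}"


def clients_from_response(payload: dict) -> List[str]:
    """List the distinct `client` label values in an instant-query response."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    clients = {
        series["metric"].get("client", "<no client label>")
        for series in payload["data"]["result"]
    }
    return sorted(clients)


# Hypothetical response, trimmed to the fields used here.
sample = json.loads("""
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {"metric": {"client": "kubelet/v1.11.0"}, "value": [1536220000, "1204"]},
      {"metric": {"client": "kubectl/v1.11.0"}, "value": [1536220000, "37"]}
    ]
  }
}
""")

print(build_query_url("http://localhost:9090"))  # hypothetical address
for client in clients_from_response(sample):
    print(client)
```

Each name printed this way is a client whose certificate is worth checking.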

brancz avatar Sep 06 '18 08:09 brancz

@brancz

pr

If the values don't matter, how can I ignore the alert in my case?

ne1000 avatar Sep 11 '18 08:09 ne1000

You can always just silence an alert in Alertmanager :slightly_smiling_face:. Nonetheless, we should figure out what's up here.
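For reference, a silence is just a small JSON document. The sketch below builds one that matches the alert by name, assuming an Alertmanager version that exposes the v2 API (where such a body is POSTed to /api/v2/silences). The helper name, author and comment are made up for illustration, and nothing is sent over the network.

```python
import json
from datetime import datetime, timedelta, timezone


def build_silence(alertname, hours, created_by, comment, now=None):
    """Build a silence body matching one alert name for `hours` hours."""
    now = now or datetime.now(timezone.utc)
    return {
        "matchers": [
            {"name": "alertname", "value": alertname, "isRegex": False},
        ],
        # Timezone-aware isoformat() output is valid RFC 3339.
        "startsAt": now.isoformat(),
        "endsAt": (now + timedelta(hours=hours)).isoformat(),
        "createdBy": created_by,
        "comment": comment,
    }


silence = build_silence(
    "KubeClientCertificateExpiration",
    hours=24,
    created_by="ops@example.com",  # hypothetical author
    comment="Investigating which client uses an expiring certificate",
)
print(json.dumps(silence, indent=2))
```

A silence only hides notifications for its duration; the underlying rule keeps evaluating, so the cause still needs to be found.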

brancz avatar Sep 11 '18 09:09 brancz

Noticed this as well when setting up on EKS.

I BELIEVE that EKS manages the renewal of certificates by itself, so I've removed the rule on my end.

willtrking avatar Sep 19 '18 07:09 willtrking

@brancz Was there any update on this? We get spammed a lot by this false alert.

kevtaylor avatar Nov 20 '18 12:11 kevtaylor

You can configure the certificate expiry thresholds: https://github.com/kubernetes-monitoring/kubernetes-mixin/blob/c0b31ea63564966021f9e6010090acded475b192/config.libsonnet#L42-L43

If you want to ignore it entirely, though, you can also just remove the alert or silence it :slightly_smiling_face:.

brancz avatar Nov 20 '18 13:11 brancz

Hi. Thanks for this answer but...

We are using the helm chart which has this baked in: https://github.com/coreos/prometheus-operator/blob/master/helm/exporter-kubernetes/templates/kubernetes.rules.yaml

How do we influence that?

And if I do want to silence that alert programmatically, is there a way of doing so in this chart, i.e. selective alerting?

kevtaylor avatar Nov 21 '18 11:11 kevtaylor

I'm not aware of tooling to declaratively silence alerts, I agree that would be neat to have.

@gianrubio maintains the Helm charts; the CoreOS/Red Hat team just maintains the jsonnet part, as we don't use Helm. I can't help you with Helm things, unfortunately.

brancz avatar Nov 21 '18 12:11 brancz

@gianrubio Do you have any helm updates to fix this?

kevtaylor avatar Dec 14 '18 18:12 kevtaylor

Yeah we are also seeing this issue even though all our certificates appear up to date, and it specifically states the apiserver. I updated the alert on our side (we are using version 0.17 still) to use the histogram as updated in: https://github.com/coreos/prometheus-operator/blob/0bad93292506ace68e344c9a991af6ae76ae1a51/contrib/kube-prometheus/manifests/prometheus-rules.yaml#L752-L759

The strange part is that the values come and go. [screenshot: screen shot 2018-12-18 at 4 24 58 pm]
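One possible reading of values that come and go: if the rule takes a low quantile over the rate of the certificate-expiration histogram, the result only drops below the threshold while a client with a short-lived certificate is actually making requests. The sketch below is a simplified re-implementation of the linear interpolation idea behind PromQL's histogram_quantile, not the Prometheus code itself. The bucket bounds and counts are hypothetical.

```python
import math
from typing import List, Tuple


def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate quantile q from cumulative (upper_bound, count) buckets."""
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Quantile falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            if count == prev_count:
                return bound
            # Linear interpolation inside the bucket that contains the rank.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return math.nan


WARNING_SECONDS = 7 * 24 * 3600  # "less than 7 days" from the alert message

# Hypothetical per-window counts: 100 requests, 2 of them with a cert expiring within a day.
busy = [(3600, 0), (86400, 2), (604800, 2), (31104000, 100), (math.inf, 100)]
# Same window without the short-lived client.
quiet = [(3600, 0), (86400, 0), (604800, 0), (31104000, 100), (math.inf, 100)]

for name, window in (("busy", busy), ("quiet", quiet)):
    estimate = histogram_quantile(0.01, window)
    print(name, estimate, "fires" if estimate < WARNING_SECONDS else "ok")
```

Under these assumptions the alert would flap with the activity of the offending client rather than with the state of the cluster's own certificates.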

joshbranham avatar Dec 18 '18 21:12 joshbranham

@brancz Is this project still active? We don't seem to get any responses

kevtaylor avatar Dec 20 '18 17:12 kevtaylor

We are observing the same behaviour with our clusters. Does anyone have any news on that? We're finding it quite weird because every certificate seems to be up to date but the metrics show something different. We tried restarting the apiserver docker container on the master node following this comment. The alert stopped for a while but came back hours after.

leoncard avatar Jan 03 '19 17:01 leoncard

So we figured out our issue (at least it is the only explanation that makes sense). The alert was firing sporadically, and only when our single user with certificate-based authentication (Jenkins) was communicating with the API. Those certs did, in fact, expire in line with what the alert was saying, and since we rotated them the alert has not fired.

joshbranham avatar Jan 03 '19 17:01 joshbranham

@joshphp our histogram is incrementing sporadically as well. We couldn't tie the increments to any client machine yet, but we noticed that, across the clusters where these numbers are increasing, they increase at exactly the same time. May I ask how you traced down that single user?

shovelend avatar Jan 04 '19 11:01 shovelend

I think also that this PR may be related to this issue: prometheus-operator/prometheus-operator#2058

kevtaylor avatar Jan 04 '19 11:01 kevtaylor

@shoveland the old fashioned way: the user stopped being able to talk to the API with certificate warnings 😏

joshbranham avatar Jan 04 '19 13:01 joshbranham

It seems that this issue is about expired client certs after all :slightly_smiling_face:. I'll keep this open for now, as prometheus-operator/prometheus-operator#2058 is correct: this alert is about client certs, not serving certs. However, that won't stop it from firing, so you will need to check your certs and check the apiserver's logs to see which clients made these requests with expired certs.

brancz avatar Jan 07 '19 13:01 brancz

@brancz that's correct, let's close this after rephrasing the description of the alert. Having checked the apiserver's logs we still don't see the culprit, there are no messages logged regarding expired certificates nor authentication requests that failed. Do you have any other ideas where we could identify the client?

shovelend avatar Jan 07 '19 14:01 shovelend

I realize this is drastic, but if I recall correctly, if you bump logging verbosity to --v=10 you will see the user identity of the certificate printed. That's the best I can do; otherwise I'd suggest we add a log line to Kubernetes to log this more explicitly.

brancz avatar Jan 07 '19 14:01 brancz

Setting the log verbosity to 10 helped us track down the issue.

We found out that the client certificate data (part of ~/.kube/config) we generated for developers was expiring after 1 month. When developers tried to access the kubernetes-dashboard through kubectl proxy from their local machines (with an expired certificate), the metric increased as well. We extended the expiry date of the generated certificates.

This issue was quite difficult to track down, and we found the needle in the haystack only after setting the apiserver verbosity to 10 (we had to go through 100,000 lines of logs looking for a possible culprit). If we were to suggest a place for improvement, it would be the alert message. If the metric could capture the URL that was about to be accessed with the expired certificate and/or the client IP address, it would be incredibly helpful and could be displayed within the alert.
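A minimal sketch for checking a single certificate against the alert's thresholds, assuming the expiry line has already been obtained from openssl (either the "Not After :" line of the -text output shown earlier in this thread, or the notAfter= line printed by openssl x509 -noout -enddate). The function names and the evaluation time are illustrative; the 1-day and 7-day thresholds come from the alert messages.

```python
from datetime import datetime, timezone

CRITICAL_SECONDS = 1 * 24 * 3600  # "less than 1 day"
WARNING_SECONDS = 7 * 24 * 3600   # "less than 7 days"


def parse_not_after(line: str) -> datetime:
    """Parse an openssl expiry line into a timezone-aware UTC datetime."""
    text = line.strip()
    for prefix in ("notAfter=", "Not After :"):
        if text.startswith(prefix):
            text = text[len(prefix):].strip()
            break
    else:
        raise ValueError(f"unrecognized expiry line: {line!r}")
    parsed = datetime.strptime(text, "%b %d %H:%M:%S %Y GMT")
    return parsed.replace(tzinfo=timezone.utc)


def classify(line: str, now: datetime) -> str:
    """Return 'critical', 'warning' or 'ok' for the remaining lifetime."""
    remaining = (parse_not_after(line) - now).total_seconds()
    if remaining < CRITICAL_SECONDS:
        return "critical"
    if remaining < WARNING_SECONDS:
        return "warning"
    return "ok"


# Expiry of admin.pem as shown earlier in the thread; the evaluation time is hypothetical.
line = "Not After : Aug  5 09:55:00 2028 GMT"
now = datetime(2028, 7, 30, 9, 55, tzinfo=timezone.utc)
print(classify(line, now))
```

Running this over every client certificate embedded in the distributed kubeconfig files would flag short-lived ones before the apiserver metric does.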

shovelend avatar Jan 08 '19 16:01 shovelend

@shovelend could you share a logline that would help me find it in my logs :)

The metric could tell us which certificate is about to expire.

daimoniac avatar Jan 09 '19 18:01 daimoniac

@daimoniac I don't think that's a good idea: it would allow clients to produce arbitrary amounts of metrics, enabling a denial-of-service attack against the Kubernetes API. I think we should add an info log line that says which user/certificate caused the counter to increment. Happy to review your Kubernetes PR if you open one :slightly_smiling_face:.
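The cardinality concern can be made concrete with a little arithmetic. A Prometheus histogram exposes one series per finite bucket, plus the +Inf bucket, plus _sum and _count, and that whole set is repeated for every distinct label combination. The bucket count and the number of certificates below are hypothetical.

```python
def histogram_series(finite_buckets: int, label_combinations: int = 1) -> int:
    """Number of time series one histogram exposes.

    Per label combination: one series per finite bucket, the +Inf bucket,
    and the _sum and _count series.
    """
    per_combination = finite_buckets + 1 + 2
    return per_combination * label_combinations


# Hypothetical: 14 finite buckets, no per-certificate label.
print(histogram_series(14))
# Same histogram with one label value per distinct client certificate.
print(histogram_series(14, label_combinations=10_000))
```

Because any client can present a new certificate, a per-certificate label would leave the number of series under the control of whoever connects.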

brancz avatar Jan 10 '19 12:01 brancz

I'm running prometheus-operator on EKS and have this issue as well. If I understand correctly (I couldn't find anything on the web about the metric apiserver_client_certificate_expiration_seconds_bucket), this is about API clients (such as kubectl) using soon-to-expire certs. I don't think it's actually kubectl in this case, since EKS uses a custom webhook validator. Any pointers on how I can identify the aged certs, given that I don't have access to the control plane? If this is the by-design behavior of EKS (see @willtrking's comment), should this alert be preconfigured at all? Consider also the more general case, outside EKS, of automatically rotated short-lived certs.

itaysk avatar Feb 12 '19 11:02 itaysk

@itaysk I have just taken the system-alerts rules out, as this is managed by AWS. I couldn't find a good way to exclude these specific rules. I am more intrigued as to why they are at 0; I suspect the prom library they use does this by default? It probably shouldn't.

willejs avatar Feb 18 '19 10:02 willejs

@willejs I didn't want to remove the entire kubernetes-system rule set (https://github.com/helm/charts/blob/master/stable/prometheus-operator/templates/alertmanager/rules/kubernetes-system.yaml), so for now I've edited those rules out.
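Editing individual rules out can be scripted once the rules file is loaded into plain dictionaries (for example with a YAML parser, which is not shown here). The sketch below drops alerts by name from the usual groups/rules layout and leaves everything else untouched. The sample data is abbreviated and its expressions are deliberately elided.

```python
import copy


def drop_alerts(rule_file: dict, names: set) -> dict:
    """Return a copy of the rules document without the named alerts.

    Recording rules (entries with `record` instead of `alert`) are kept.
    """
    result = copy.deepcopy(rule_file)
    for group in result.get("groups", []):
        group["rules"] = [
            rule for rule in group.get("rules", [])
            if rule.get("alert") not in names
        ]
    return result


# Abbreviated, illustrative rule group; expressions are elided on purpose.
rules = {
    "groups": [
        {
            "name": "kubernetes-system",
            "rules": [
                {"alert": "KubeClientCertificateExpiration", "expr": "<elided>",
                 "labels": {"severity": "warning"}},
                {"alert": "KubeClientCertificateExpiration", "expr": "<elided>",
                 "labels": {"severity": "critical"}},
                {"alert": "KubeVersionMismatch", "expr": "<elided>",
                 "labels": {"severity": "warning"}},
            ],
        }
    ]
}

filtered = drop_alerts(rules, {"KubeClientCertificateExpiration"})
print([rule["alert"] for rule in filtered["groups"][0]["rules"]])
```

The function works on a deep copy, so the original document stays available for comparison or for restoring the alert later.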

itaysk avatar Feb 18 '19 13:02 itaysk

> Setting the log verbosity to 10 helped us track down the issue.
>
> We found out that the client certificate data (part of ~/.kube/config) we generated for developers was expiring after 1 month. When developers tried to access the kubernetes-dashboard through kubectl proxy from their local machines (with an expired certificate), the metric increased as well. We extended the expiry date of the generated certificates.
>
> This issue was quite difficult to track down, and we found the needle in the haystack only after setting the apiserver verbosity to 10 (we had to go through 100,000 lines of logs looking for a possible culprit). If we were to suggest a place for improvement, it would be the alert message. If the metric could capture the URL that was about to be accessed with the expired certificate and/or the client IP address, it would be incredibly helpful and could be displayed within the alert.

Can you explain which words you searched?

azalio avatar Feb 20 '19 15:02 azalio

We're seeing the same issue on our clusters and finding it incredibly hard to track down the expiring certificate - can someone please share a log line we can search for when the verbosity is set to 10?

madAndroid avatar Mar 28 '19 08:03 madAndroid