Bad Certificates should be explained, not just stated.

Open illeatmyhat opened this issue 3 years ago • 1 comments

What happened?

I previously set up a 3-node cluster using bi-directional TLS, and it was working. Clients calling etcd with a previously working certificate now fail with:

{"level":"warn","ts":"2022-08-09T10:28:48.597Z","caller":"embed/config_logging.go:169","msg":"rejected connection","remote-addr":"172.30.214.50:39334","server-name":"etcd-0.etcd","error":"remote error: tls: bad certificate"}

What did you expect to happen?

I expected the logs to explain why the certificate was bad instead of stonewalling me.

How can we reproduce it (as minimally and precisely as possible)?

Create an etcd cluster and connect to it using a bad certificate. Any flavor you want. Pretend to be a clueless user who knows nothing about certificates and wonder what's wrong with your certificate.

Anything else we need to know?

The client and etcd use the same CA Issuer, hence the etcd configuration below.

Etcd Server's working peer certificate
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            ...
        Signature Algorithm: sha256WithRSAEncryption
        Issuer: CN=wmlserving-ca
        Validity
            Not Before: Jun 20 20:00:46 2022 GMT
            Not After : Sep 18 20:00:46 2022 GMT
        Subject: CN=*.etcd
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                RSA Public-Key: (4096 bit)
                Modulus:
                    ...
                Exponent: 65537 (0x10001)
        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature, Key Encipherment
            X509v3 Basic Constraints: critical
                CA:FALSE
            X509v3 Authority Key Identifier: 
                keyid:...

            X509v3 Subject Alternative Name: 
                DNS:localhost, DNS:etcd, DNS:*.etcd, DNS:etcd.argo-wo, DNS:*.etcd.argo-wo, DNS:etcd.argo-wo.svc, DNS:*.etcd.argo-wo.svc, DNS:etcd.argo-wo.svc.cluster.local, DNS:*.etcd.argo-wo.svc.cluster.local
    Signature Algorithm: sha256WithRSAEncryption
         ...

(Decoded using the following version of OpenSSL: OpenSSL 1.1.1b  26 Feb 2019)
Client's "bad certificate" information
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
           ...
        Signature Algorithm: sha256WithRSAEncryption
        Issuer: CN=wmlserving-ca
        Validity
            Not Before: Jun 28 18:58:16 2022 GMT
            Not After : Sep 26 18:58:16 2022 GMT
        Subject: CN=wml-serving
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                RSA Public-Key: (4096 bit)
                Modulus:
                    ...
                Exponent: 65537 (0x10001)
        X509v3 extensions:
            X509v3 Extended Key Usage: 
                TLS Web Server Authentication, TLS Web Client Authentication
            X509v3 Basic Constraints: critical
                CA:FALSE
            X509v3 Authority Key Identifier: 
                keyid:...

            X509v3 Subject Alternative Name: 
                DNS:localhost, DNS:wml-serving, DNS:wml-serving.argo-wo, DNS:wml-serving.argo-wo.svc, DNS:wml-serving.argo-wo.svc.cluster.local
    Signature Algorithm: sha256WithRSAEncryption
         ...


(Decoded using the following version of OpenSSL: OpenSSL 1.1.1b  26 Feb 2019)

Etcd version (please run commands below)

quay.io/coreos/etcd:v3.5.4

$ etcd --version
etcd Version: 3.5.4
Git SHA: 08407ff76
Go Version: go1.16.15
Go OS/Arch: linux/amd64

$ etcdctl version
etcdctl version: 3.5.4
API version: 3.5

Etcd configuration (command line flags or environment variables)

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: etcd
spec:
  replicas: 3
  selector:
    matchLabels:
      name: etcd
  serviceName: etcd
  template:
    metadata:
      labels:
        name: etcd
    spec:
      containers:
      - name: app
        image: quay.io/coreos/etcd:v3.5.4
        imagePullPolicy: Always
        volumeMounts:
        - name: data
          mountPath: /var/run/etcd
        - name: etcd-ssl
          mountPath: /etc/etcd/ssl
        command:
        - /bin/sh
        - -c
        - |
          PEERS="etcd-0=https://etcd-0.etcd:2380,etcd-1=https://etcd-1.etcd:2380,etcd-2=https://etcd-2.etcd:2380"
          exec etcd --name ${HOSTNAME} \
            --listen-peer-urls https://0.0.0.0:2380 \
            --listen-client-urls https://0.0.0.0:2379 \
            --advertise-client-urls https://${HOSTNAME}.etcd:2379 \
            --initial-advertise-peer-urls https://${HOSTNAME}:2380 \
            --initial-cluster-token etcd-cluster \
            --initial-cluster ${PEERS} \
            --initial-cluster-state new \
            --trusted-ca-file=/etc/etcd/ssl/ca.crt \
            --cert-file=/etc/etcd/ssl/tls.crt \
            --key-file=/etc/etcd/ssl/tls.key \
            --peer-cert-file=/etc/etcd/ssl/tls.crt \
            --peer-key-file=/etc/etcd/ssl/tls.key \
            --peer-trusted-ca-file=/etc/etcd/ssl/ca.crt \
            --peer-client-cert-auth \
            --client-cert-auth \
            --data-dir /var/run/etcd/default.etcd
      volumes:
      - name: etcd-ssl
        secret:
          secretName: etcd-cert
          items:
          - key: tls.crt
            path: tls.crt
          - key: tls.key
            path: tls.key
          - key: ca.crt
            path: ca.crt
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      storageClassName: {{ .Values.etcdStorageClass }}
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi

Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)

$ etcdctl member list -w table
{"level":"warn","ts":"2022-08-09T10:36:11.729Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000366a80/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
Error: context deadline exceeded

$ export PEERS=https://etcd-0.etcd:2380,https://etcd-1.etcd:2380,https://etcd-2.etcd:2380
$ etcdctl --endpoints=$PEERS endpoint status -w table
{"level":"warn","ts":"2022-08-09T10:39:32.744Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0001a4000/etcd-0.etcd:2380","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: authentication handshake failed: x509: certificate signed by unknown authority\""}
Failed to get the status of endpoint https://etcd-0.etcd:2380 (context deadline exceeded)
{"level":"warn","ts":"2022-08-09T10:39:37.745Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0001a4000/etcd-0.etcd:2380","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: authentication handshake failed: x509: certificate signed by unknown authority\""}
Failed to get the status of endpoint https://etcd-1.etcd:2380 (context deadline exceeded)
{"level":"warn","ts":"2022-08-09T10:39:42.746Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0001a4000/etcd-0.etcd:2380","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: authentication handshake failed: x509: certificate signed by unknown authority\""}
Failed to get the status of endpoint https://etcd-2.etcd:2380 (context deadline exceeded)
+----------+----+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------+----+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
+----------+----+---------+---------+-----------+------------+-----------+------------+--------------------+--------+

Relevant log output

No response

illeatmyhat, Aug 09 '22 10:08

It turns out that the CA certificate was expired, although that isn't relevant to the issue, which is that the logs should be explaining this.

illeatmyhat, Aug 09 '22 11:08

WIP PR: https://github.com/etcd-io/etcd/pull/14617

EmilyM1, Oct 24 '22 18:10

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

stale[bot], Mar 19 '23 09:03