
Failure to access a ZooKeeper cluster via IP address with TLS

Open NickLarsenNZ opened this issue 6 months ago • 1 comment

This is actually a problem caused by the ZooKeeper client (i.e. what is called via zkCli.sh).

Problem

Accessing a cluster whose certificate has a valid SAN entry for the IP address:

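# </dev/null closes stdin so s_client exits after the handshake instead of waiting for input
openssl s_client -connect 172.18.0.2:30504 </dev/null | openssl x509 -noout -text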
...
            X509v3 Subject Alternative Name: critical
                IP Address:172.18.0.2

using the ZooKeeper client:

/stackable/zookeeper/bin/zkCli.sh -server 172.18.0.2:30504 ls /

results in a connection failure:

Caused by: java.security.cert.CertificateException: No subject alternative DNS name matching 172-18-0-2.kubernetes.default.svc.cluster.local found.
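To make the mismatch explicit, here is a minimal sketch of a check for whether a given identifier appears in the SAN list. The function name `san_contains` is hypothetical (not part of any Stackable or ZooKeeper tooling), and it assumes the input is the text produced by `openssl x509 -noout -text`, with the SAN entries on the single line following the header, as in the output above.

```shell
# Hypothetical helper: succeed if the certificate text on stdin lists the
# given identifier (IP address or DNS name) in its Subject Alternative Name.
san_contains() {
  awk -v want="$1" '
    # The line right after the SAN header holds the comma-separated entries.
    found_header {
      n = split($0, entries, ",")
      for (i = 1; i <= n; i++) {
        e = entries[i]
        sub(/^[ \t]+/, "", e)              # drop leading indentation
        sub(/^(IP Address|DNS):/, "", e)   # drop the entry type prefix
        if (e == want) ok = 1
      }
      found_header = 0
    }
    /Subject Alternative Name/ { found_header = 1 }
    END { exit ok ? 0 : 1 }
  '
}
```

Piping the `openssl x509 -noout -text` output from above into `san_contains 172.18.0.2` should succeed, while `san_contains 172-18-0-2.kubernetes.default.svc.cluster.local` should fail, which is the name the client complains about.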

Steps to reproduce

On a KinD cluster, deploy a ZookeeperCluster with TLS enabled and a listenerClass of external-unstable:

apiVersion: zookeeper.stackable.tech/v1alpha1
kind: ZookeeperCluster
metadata:
  name: test-zk
spec:
  clusterConfig:
    authentication:
    - authenticationClass: zk-client-auth-tls
    tls:
      quorumSecretClass: tls
      serverSecretClass: zk-client-secret
  image:
    productVersion: 3.9.3
  servers:
    config:
      resources:
        cpu:
          max: 500m
          min: 250m
        memory:
          limit: 512Mi
        storage:
          data:
            capacity: 1Gi
    roleConfig:
      # 👇 see here
      listenerClass: external-unstable
    roleGroups:
      primary:
        replicas: 3

[!NOTE] todo: add complete minimal example.

FWIW, I launched this with:

scripts/run-tests --test smoke_zookeeper-3.9.3_use-server-tls-true_use-client-auth-tls-true_openshift-false --parallel 1 --skip-delete

And then manually updated the listenerClass on the ZookeeperCluster.

Get the node hostname (in this case, IP) and node port:

kubectl -n kuttl-test-musical-stork get listener test-zk-server -o 'jsonpath={.status.ingressAddresses[0].address}:{.status.nodePorts.zk}'

Shell into the first replica, and run:

export CLIENT_STORE_SECRET="$(< /stackable/rwconfig/zoo.cfg grep "ssl.keyStore.password" | cut -d "=" -f2)"
export CLIENT_JVMFLAGS="
-Dzookeeper.authProvider.x509=org.apache.zookeeper.server.auth.X509AuthenticationProvider
-Dzookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty
-Dzookeeper.client.secure=true
-Dzookeeper.ssl.keyStore.location=/stackable/server_tls/keystore.p12
-Dzookeeper.ssl.keyStore.password=${CLIENT_STORE_SECRET}
-Dzookeeper.ssl.trustStore.location=/stackable/server_tls/truststore.p12
-Dzookeeper.ssl.trustStore.password=${CLIENT_STORE_SECRET}"

and then

# replace the IP and port with what was returned in the earlier kubectl command
/stackable/zookeeper/bin/zkCli.sh -server 172.18.0.2:30504 ls /

The client fails to connect because the name it verifies does not match any SAN entry in the server certificate.

Explanation

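The ZooKeeper client performs a reverse DNS lookup on the IP address given on the command line and then uses the resulting hostname, rather than the IP, to connect to ZooKeeper. That hostname is what gets checked against the certificate, but the reverse DNS record is not among the SAN entries (this is expected).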

@nightkr: in this case it seems to come up because it's running on the same control plane node as the apiserver, but pretty sure any hostNetworking pod that uses a service in the same way would trigger the same bug.
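A rough way to see what the resolver returns for the address, independent of the ZooKeeper client, is sketched below. The helper names `reverse_name` and `check_reverse` are hypothetical, and the sketch assumes `getent` and `awk` are available in the environment where it runs (this has not been confirmed for the ZooKeeper image). It only reports the reverse record; it does not reproduce the client's own lookup logic.

```shell
# Hypothetical helper: print the first name a reverse lookup yields for an
# address. getent consults the system's configured sources (hosts file, DNS).
reverse_name() {
  getent hosts "$1" | awk 'NR == 1 { print $2 }'
}

# Compare the address that was typed with the name the resolver returns.
# An optional second argument overrides the lookup (useful for dry runs).
check_reverse() {
  addr="$1"
  name="${2:-$(reverse_name "$addr")}"
  if [ -z "$name" ] || [ "$name" = "$addr" ]; then
    echo "no reverse record for $addr"
  else
    echo "$addr reverse-resolves to $name"
  fi
}
```

Running `check_reverse 172.18.0.2` in the pod would show whether a reverse record exists for the address; if it prints a name that is not in the certificate's SAN list, that is the name the client ends up verifying.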

Considerations:

  • Document that ZK cannot be exposed when the Listener reports back an IP address instead of a hostname.
  • Fix the ZK Client upstream.
  • ~Add Reverse DNS entries to TLS certificate SANs.~ The reverse DNS record is not a reliable identifier to base trust on.

NickLarsenNZ, Jul 03 '25 11:07

Note that this behavior exists both before and after https://github.com/stackabletech/zookeeper-operator/pull/957

NickLarsenNZ, Jul 03 '25 11:07