consul-k8s
                                
                                 consul-k8s copied to clipboard
                                
                                    consul-k8s copied to clipboard
                            
                            
                            
                        TLS error: 268435703:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER
Community Note
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request. Searching for pre-existing feature requests helps us consolidate datapoints for identical requirements into a single place, thank you!
- Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request.
- If you are interested in working on this issue or have submitted a pull request, please leave a comment.
Overview of the Issue
We are running Kafka and Zookeeper inside the Consul service mesh and sometimes the connection from Kafka to Zookeeper seems to fail. This issue seems to be specific to stateful sets as I also noticed a similar issue when connection to Redis. Also another strange behavior maybe related to this is that the headless service has to be used as the host instead of the normal service. Using the normal service works sometimes but fails more often then the headless service.
Kafka is dialing Zookeeper directly through the headless service so I have configured ServiceDefaults to allow direct connections.
Another maybe interesting fact is that in the Consul UI Topology view, Zookeeper is not shown as an upstream for Kafka. This always seems to be the case if the connection also does not work so it could potentially be related.
Configuration
Consul helm values
global:
  domain: consul
  name: consul
  gossipEncryption:
    autoGenerate: true
  tls:
    enabled: true
    enableAutoEncrypt: true
  acls:
    manageSystemACLs: true
  
server:
  replicas: 3
  bootstrapExpect: 3
  extraConfig: |
    {
      "ui_config": {
        "content_path": "/consul"
      }
    }
ui:
  enabled: true
  service:
    type: "ClusterIP"
client:
  tolerations: |
    - operator: Exists
  extraConfig: |
    {"advertise_reconnect_timeout": "15m"}
connectInject:
  enabled: true
  transparentProxy:
    defaultEnabled: true
controller:
  enabled: true
Kafka config related to Consul
apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceDefaults
metadata:
  name: kafka
  labels:
    app: kafka
spec:
  protocol: tcp
  transparentProxy:
    dialedDirectly: true
---
podAnnotations:
  consul.hashicorp.com/connect-inject: "true"
  consul.hashicorp.com/connect-service: "kafka"
  consul.hashicorp.com/connect-service-port: "9092"
  consul.hashicorp.com/transparent-proxy-exclude-inbound-ports: "5556,9093"
  consul.hashicorp.com/transparent-proxy-exclude-outbound-ports: "9093"
  consul.hashicorp.com/service-tags: "kafka"
Zookeeper config related to Consul
apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceDefaults
metadata:
  name: zookeeper
  labels:
    app: zookeeper
spec:
  protocol: tcp
  transparentProxy:
    dialedDirectly: true
---
podAnnotations:
  consul.hashicorp.com/connect-inject: "true"
  consul.hashicorp.com/connect-service: "zookeeper"
  consul.hashicorp.com/connect-service-port: "2181"
  consul.hashicorp.com/transparent-proxy-exclude-inbound-ports: "2888,3888,9141"
  consul.hashicorp.com/transparent-proxy-exclude-outbound-ports: "2888,3888"
  consul.hashicorp.com/service-tags: "zookeeper"
Logs
Zookeeper Envoy logs
[2021-12-14 06:53:08.770][13][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:249] [C787] new tcp proxy session [2021-12-14 06:53:08.770][13][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:397] [C787] Creating connection to cluster local_app [2021-12-14 06:53:08.770][13][debug][pool] [source/common/tcp/original_conn_pool.cc:98] creating a new connection [2021-12-14 06:53:08.770][13][debug][pool] [source/common/tcp/original_conn_pool.cc:383] [C788] connecting [2021-12-14 06:53:08.770][13][debug][connection] [source/common/network/connection_impl.cc:860] [C788] connecting to 127.0.0.1:2181 [2021-12-14 06:53:08.770][13][debug][connection] [source/common/network/connection_impl.cc:876] [C788] connection in progress [2021-12-14 06:53:08.770][13][debug][pool] [source/common/tcp/original_conn_pool.cc:125] queueing request due to no available connections [2021-12-14 06:53:08.770][13][debug][conn_handler] [source/server/active_tcp_listener.cc:328] [C787] new connection [2021-12-14 06:53:08.770][13][debug][connection] [source/extensions/transport_sockets/tls/ssl_socket.cc:219] [C787] TLS error: 268435703:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER [2021-12-14 06:53:08.770][13][debug][connection] [source/common/network/connection_impl.cc:243] [C787] closing socket: 0 [2021-12-14 06:53:08.770][13][debug][connection] [source/extensions/transport_sockets/tls/ssl_socket.cc:219] [C787] TLS error: 268435703:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER [2021-12-14 06:53:08.771][13][debug][pool] [source/common/tcp/original_conn_pool.cc:223] canceling pending request [2021-12-14 06:53:08.771][13][debug][pool] [source/common/tcp/original_conn_pool.cc:231] canceling pending connection [2021-12-14 06:53:08.771][13][debug][connection] [source/common/network/connection_impl.cc:133] [C788] closing data_to_write=0 type=1 [2021-12-14 06:53:08.771][13][debug][connection] [source/common/network/connection_impl.cc:243] [C788] closing socket: 1 [2021-12-14 06:53:08.771][13][debug][pool] [source/common/tcp/original_conn_pool.cc:140] [C788] client disconnected [2021-12-14 06:53:08.771][13][debug][conn_handler] [source/server/active_tcp_listener.cc:76] [C787] adding to cleanup list [2021-12-14 06:53:08.771][13][debug][pool] [source/common/tcp/original_conn_pool.cc:255] [C788] connection destroyed
Zookeeper logs
2021-12-14 07:10:30,079 [myid:1] - INFO [nioEventLoopGroup-4-1:NettyServerCnxn@294] - Processing ruok command from /127.0.0.1:55294 2021-12-14 07:10:39,204 [myid:1] - INFO [nioEventLoopGroup-4-1:NettyServerCnxn@294] - Processing ruok command from /127.0.0.1:55404 2021-12-14 07:10:40,076 [myid:1] - INFO [nioEventLoopGroup-4-2:NettyServerCnxn@294] - Processing ruok command from /127.0.0.1:55420 2021-12-14 07:10:49,196 [myid:1] - INFO [nioEventLoopGroup-4-2:NettyServerCnxn@294] - Processing ruok command from /127.0.0.1:55546 2021-12-14 07:10:50,083 [myid:1] - INFO [nioEventLoopGroup-4-1:NettyServerCnxn@294] - Processing ruok command from /127.0.0.1:55564 2021-12-14 07:10:59,191 [myid:1] - INFO [nioEventLoopGroup-4-1:NettyServerCnxn@294] - Processing ruok command from /127.0.0.1:55686 2021-12-14 07:11:00,136 [myid:1] - INFO [nioEventLoopGroup-4-2:NettyServerCnxn@294] - Processing ruok command from /127.0.0.1:55706 2021-12-14 07:11:09,189 [myid:1] - INFO [nioEventLoopGroup-4-2:NettyServerCnxn@294] - Processing ruok command from /127.0.0.1:55828 2021-12-14 07:11:10,085 [myid:1] - INFO [nioEventLoopGroup-4-1:NettyServerCnxn@294] - Processing ruok command from /127.0.0.1:55844 2021-12-14 07:11:19,203 [myid:1] - INFO [nioEventLoopGroup-4-1:NettyServerCnxn@294] - Processing ruok command from /127.0.0.1:55960 2021-12-14 07:11:20,083 [myid:1] - INFO [nioEventLoopGroup-4-2:NettyServerCnxn@294] - Processing ruok command from /127.0.0.1:55978
Kafka Envoy logs
[2021-12-14 07:36:48.106][18][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:249] [C1003] new tcp proxy session [2021-12-14 07:36:48.106][18][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:397] [C1003] Creating connection to cluster local_app [2021-12-14 07:36:48.106][18][debug][pool] [source/common/tcp/original_conn_pool.cc:98] creating a new connection [2021-12-14 07:36:48.106][18][debug][pool] [source/common/tcp/original_conn_pool.cc:383] [C1004] connecting [2021-12-14 07:36:48.106][18][debug][connection] [source/common/network/connection_impl.cc:860] [C1004] connecting to 127.0.0.1:9092 [2021-12-14 07:36:48.106][18][debug][connection] [source/common/network/connection_impl.cc:876] [C1004] connection in progress [2021-12-14 07:36:48.106][18][debug][pool] [source/common/tcp/original_conn_pool.cc:125] queueing request due to no available connections [2021-12-14 07:36:48.106][18][debug][conn_handler] [source/server/active_tcp_listener.cc:328] [C1003] new connection [2021-12-14 07:36:48.106][18][debug][connection] [source/common/network/connection_impl.cc:243] [C1003] closing socket: 0 [2021-12-14 07:36:48.106][18][debug][pool] [source/common/tcp/original_conn_pool.cc:223] canceling pending request [2021-12-14 07:36:48.106][18][debug][pool] [source/common/tcp/original_conn_pool.cc:231] canceling pending connection [2021-12-14 07:36:48.106][18][debug][connection] [source/common/network/connection_impl.cc:133] [C1004] closing data_to_write=0 type=1 [2021-12-14 07:36:48.106][18][debug][connection] [source/common/network/connection_impl.cc:243] [C1004] closing socket: 1 [2021-12-14 07:36:48.106][18][debug][pool] [source/common/tcp/original_conn_pool.cc:140] [C1004] client disconnected [2021-12-14 07:36:48.106][18][debug][conn_handler] [source/server/active_tcp_listener.cc:76] [C1003] adding to cleanup list [2021-12-14 07:36:48.106][18][debug][pool] [source/common/tcp/original_conn_pool.cc:255] [C1004] connection destroyed [2021-12-14 07:36:52.253][1][debug][main] [source/server/server.cc:209] flushing stats [2021-12-14 07:36:57.254][1][debug][main] [source/server/server.cc:209] flushing stats [2021-12-14 07:36:58.107][16][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:249] [C1005] new tcp proxy session [2021-12-14 07:36:58.107][16][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:397] [C1005] Creating connection to cluster local_app [2021-12-14 07:36:58.107][16][debug][pool] [source/common/tcp/original_conn_pool.cc:98] creating a new connection [2021-12-14 07:36:58.107][16][debug][pool] [source/common/tcp/original_conn_pool.cc:383] [C1006] connecting [2021-12-14 07:36:58.107][16][debug][connection] [source/common/network/connection_impl.cc:860] [C1006] connecting to 127.0.0.1:9092 [2021-12-14 07:36:58.107][16][debug][connection] [source/common/network/connection_impl.cc:876] [C1006] connection in progress [2021-12-14 07:36:58.107][16][debug][pool] [source/common/tcp/original_conn_pool.cc:125] queueing request due to no available connections [2021-12-14 07:36:58.107][16][debug][conn_handler] [source/server/active_tcp_listener.cc:328] [C1005] new connection [2021-12-14 07:36:58.107][16][debug][connection] [source/common/network/connection_impl.cc:243] [C1005] closing socket: 0 [2021-12-14 07:36:58.108][16][debug][pool] [source/common/tcp/original_conn_pool.cc:223] canceling pending request [2021-12-14 07:36:58.108][16][debug][pool] [source/common/tcp/original_conn_pool.cc:231] canceling pending connection [2021-12-14 07:36:58.108][16][debug][connection] [source/common/network/connection_impl.cc:133] [C1006] closing data_to_write=0 type=1 [2021-12-14 07:36:58.108][16][debug][connection] [source/common/network/connection_impl.cc:243] [C1006] closing socket: 1 [2021-12-14 07:36:58.108][16][debug][pool] [source/common/tcp/original_conn_pool.cc:140] [C1006] client disconnected [2021-12-14 07:36:58.108][16][debug][conn_handler] [source/server/active_tcp_listener.cc:76] [C1005] adding to cleanup list [2021-12-14 07:36:58.108][16][debug][pool] [source/common/tcp/original_conn_pool.cc:255] [C1006] connection destroyed
Kafka logs
[2021-12-14 07:08:59,224] INFO Initiating client connection, connectString=zookeeper-headless:2181 sessionTimeout=18000 watcher=kafka.zookeeper.ZooKeeperClient$ZooKeeperClientWatcher$@2d1dee39 (org.apache.zookeeper.ZooKeeper)
[2021-12-14 07:08:59,228] INFO jute.maxbuffer value is 4194304 Bytes (org.apache.zookeeper.ClientCnxnSocket)
[2021-12-14 07:08:59,233] INFO zookeeper.request.timeout value is 0. feature enabled= (org.apache.zookeeper.ClientCnxn)
[2021-12-14 07:08:59,235] INFO [ZooKeeperClient Kafka server] Waiting until connected. (kafka.zookeeper.ZooKeeperClient)
[2021-12-14 07:08:59,250] INFO Opening socket connection to server zookeeper-headless/172.23.4.112:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2021-12-14 07:08:59,255] INFO Socket error occurred: zookeeper-headless/172.23.4.112:2181: Connection refused (org.apache.zookeeper.ClientCnxn)
[2021-12-14 07:09:00,363] INFO Opening socket connection to server zookeeper-headless/172.23.5.25:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2021-12-14 07:09:00,364] INFO Socket error occurred: zookeeper-headless/172.23.5.25:2181: Connection refused (org.apache.zookeeper.ClientCnxn)
[2021-12-14 07:09:01,464] INFO Opening socket connection to server zookeeper-headless/172.23.4.112:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2021-12-14 07:09:01,465] INFO Socket error occurred: zookeeper-headless/172.23.4.112:2181: Connection refused (org.apache.zookeeper.ClientCnxn)
[2021-12-14 07:09:02,566] INFO Opening socket connection to server zookeeper-headless/172.23.5.25:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2021-12-14 07:09:02,567] INFO Socket error occurred: zookeeper-headless/172.23.5.25:2181: Connection refused (org.apache.zookeeper.ClientCnxn)
[2021-12-14 07:09:03,668] INFO Opening socket connection to server zookeeper-headless/172.23.5.25:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2021-12-14 07:09:03,669] INFO Socket error occurred: zookeeper-headless/172.23.5.25:2181: Connection refused (org.apache.zookeeper.ClientCnxn)
[2021-12-14 07:09:04,769] INFO Opening socket connection to server zookeeper-headless/172.23.5.25:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2021-12-14 07:09:04,770] INFO Socket error occurred: zookeeper-headless/172.23.5.25:2181: Connection refused (org.apache.zookeeper.ClientCnxn)
[2021-12-14 07:09:05,236] INFO [ZooKeeperClient Kafka server] Closing. (kafka.zookeeper.ZooKeeperClient)
[2021-12-14 07:09:05,871] INFO Opening socket connection to server zookeeper-headless/172.23.4.112:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2021-12-14 07:09:05,976] INFO Session: 0x0 closed (org.apache.zookeeper.ZooKeeper)
[2021-12-14 07:09:05,977] INFO EventThread shut down for session: 0x0 (org.apache.zookeeper.ClientCnxn)
[2021-12-14 07:09:05,978] INFO [ZooKeeperClient Kafka server] Closed. (kafka.zookeeper.ZooKeeperClient)
[2021-12-14 07:09:05,981] ERROR Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
kafka.zookeeper.ZooKeeperClientTimeoutException: Timed out waiting for connection while in state: CONNECTING
        at kafka.zookeeper.ZooKeeperClient.$anonfun$waitUntilConnected$3(ZooKeeperClient.scala:271)
        at kafka.zookeeper.ZooKeeperClient.waitUntilConnected(ZooKeeperClient.scala:267)
        at kafka.zookeeper.ZooKeeperClient.(ZooKeeperClient.scala:125)
        at kafka.zk.KafkaZkClient$.apply(KafkaZkClient.scala:1948)
        at kafka.server.KafkaServer.createZkClient$1(KafkaServer.scala:431)
        at kafka.server.KafkaServer.initZkClient(KafkaServer.scala:456)
        at kafka.server.KafkaServer.startup(KafkaServer.scala:191)
        at kafka.Kafka$.main(Kafka.scala:109)
        at kafka.Kafka.main(Kafka.scala)
[2021-12-14 07:09:05,983] INFO shutting down (kafka.server.KafkaServer)
[2021-12-14 07:09:05,991] INFO App info kafka.server for 0 unregistered (org.apache.kafka.common.utils.AppInfoParser)
[2021-12-14 07:09:05,992] INFO shut down completed (kafka.server.KafkaServer)
[2021-12-14 07:09:05,992] ERROR Exiting Kafka. (kafka.Kafka$)
[2021-12-14 07:09:05,998] INFO shutting down (kafka.server.KafkaServer)
 
Environment details
- consul-k8s-control-plane v0.37.0
- consul-helm  v0.37.0
- consul v1.10.4
- envoy v1.18.4
- kubernetes v1.20.7
Related
- https://github.com/istio/istio/issues/35870
- https://github.com/istio/istio/issues/22482
Hi @nflaig this seems to be more involved than it seems to deploy Kafka on top of a Service Mesh. Have you seen this pattern deployed successfully elsewhere?
Hi @david-yu, I've seen a similar issue when scraping services with Prometheus within the service mesh. The similarity here is that in both cases the services are dialed directly so maybe the issue is related to that.
Have you seen this pattern deployed successfully elsewhere?
I would expect that to be a common thing to be honest but I think it is not really about kafka it seems to be a general issue with dialed directly and stateful sets.
Related issue with Kafka on Consul K8s: https://github.com/hashicorp/consul/issues/14125 also it is recommended to set MaxInboundConnections to a higher number than defaults which should be enabled by Consul 1.13.2 and https://github.com/hashicorp/consul-k8s/pull/1437 when it is released.
I think I'm running into the same issue with services deployed by Nomad. I can't get a simple tcp echo server to work.
https://github.com/envoyproxy/envoy/issues/23155