
[v2] Support Kafka DNS `pod_name.service_name.namespace.svc`

Open peteroneilljr opened this issue 3 years ago • 15 comments

Requesting support for the Kafka DNS naming structure kafka-kafka-0.kafka-kafka-brokers.appnamespace.svc.

After starting an intercept, Telepresence is unable to resolve DNS names that end in .svc. This is needed to fully support development workflows that involve Kafka services.

Happy to repro this or create a demo if needed.

Thanks, Peter

peteroneilljr avatar May 11 '21 16:05 peteroneilljr

I would like to second this one and give some more background.

Kafka is the sort of beast you don't want to run on your laptop, which makes it a perfect use case for Telepresence, but it is not well supported yet.

With version 2.2.2, after the connection is established, you can connect to it via main-cluster-kafka-bootstrap.kafka, exactly as described in the docs...

The problem is that the call to the bootstrap service is just the first step: it returns a list of brokers to connect to. These brokers are also addressed by DNS names that work perfectly inside the cluster; they must be something like:

main-cluster-kafka-0.main-cluster-kafka-brokers.kafka.svc
 main-cluster-kafka-1.main-cluster-kafka-brokers.kafka.svc

or maybe (not sure)

main-cluster-kafka-0.main-cluster-kafka-brokers.kafka.svc.cluster.local
 main-cluster-kafka-1.main-cluster-kafka-brokers.kafka.svc.cluster.local

and those are not resolved into the cluster with the default Telepresence config. Maybe a workaround exists here, or maybe it's just a matter of Telepresence configuration?
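The two-step connection flow described above is the crux of the problem. Here is a minimal sketch of it, a toy model of suffix-based DNS forwarding, not Telepresence's real implementation; the hostnames are the examples from this thread:

```python
# Toy model (not Telepresence's actual resolver code) of why the Kafka
# bootstrap step succeeds while the broker connections fail.

def resolver_forwards(hostname, include_suffixes):
    """Mimic a local DNS forwarder that only sends names matching a
    known cluster suffix to the in-cluster resolver."""
    return any(hostname.endswith(s) for s in include_suffixes)

# Step 1: the client resolves the bootstrap service. This works because
# the name ends in a recognized cluster suffix.
bootstrap = "main-cluster-kafka-bootstrap.kafka.svc.cluster.local"

# Step 2: the bootstrap replies with the advertised broker addresses.
# If these end in a bare ".svc", the local forwarder never sends them
# to the cluster, so the client cannot connect to any broker.
brokers = [
    "main-cluster-kafka-0.main-cluster-kafka-brokers.kafka.svc",
    "main-cluster-kafka-1.main-cluster-kafka-brokers.kafka.svc",
]

default_suffixes = [".cluster.local"]
print(resolver_forwards(bootstrap, default_suffixes))                # True
print(any(resolver_forwards(b, default_suffixes) for b in brokers))  # False
```

The asymmetry is the point: the name the client types works, while the names the bootstrap hands back silently fail to resolve.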

aholbreich avatar May 27 '21 07:05 aholbreich

@peteroneilljr @aholbreich Starting with 2.3.0, the FQDNs are resolved in the cluster so in a perfect world, this ticket has been resolved now. Can you please try with 2.3.1 and check if the resolution works OK now?

thallgren avatar Jun 15 '21 10:06 thallgren

@thallgren I need some time for it. I can't promise to retest soon because we're working with a workaround now, but I'll try to find time today or tomorrow.

aholbreich avatar Jun 15 '21 10:06 aholbreich

Hello, I have been tracking this problem for a while. I can confirm that Telepresence version 2.3.7 resolves the names given out by the Strimzi-managed bootstrap server, and the IP is correct. The IP it resolves to, though, is not routeable from outside the cluster for me.

OS: Ubuntu 20.10, systemd: 246, Kubernetes: Kind v0.11.1 (cluster v1.21.1), Docker: 20.10.7 (build f0df350)

ping kafka-kafka-0.kafka-kafka-brokers.myapp.svc
PING kafka-kafka-0.kafka-kafka-brokers.myapp.svc (172.17.0.16) 56(84) bytes of data.
^C
--- kafka-kafka-0.kafka-kafka-brokers.myapp.svc ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1015ms

The IP above is in my main Docker bridge subnet, 172.17.0.0/24. The cluster IPs within Kind are all in 10.244.0.0/24. The Docker bridge that the wrapping Kind container uses is 172.19.0.2; Kubernetes daemonset pods bind to that IP.

I actually can't see any references to 172.17.0.0/24 inside Kubernetes anywhere. The name does resolve properly within the cluster, though.

daviddawson avatar Jul 24 '21 17:07 daviddawson

Not sure what you mean by "not routable" in this case. While you can use ping to test DNS resolution, you shouldn't expect it to return any data, because it uses ICMP; Telepresence only supports TCP and UDP.

thallgren avatar Jul 25 '21 06:07 thallgren

Sorry, yes. I mean that the name resolves to the correct IP via DNS (I was using ping above just as shorthand; dig gives the same IP), but no traffic can route to the IP given out. ping isn't relevant.

TCP traffic doesn't route through to 172.17.0.16, so curl blocks waiting for the socket.

daviddawson avatar Jul 26 '21 13:07 daviddawson

I was playing with this today with Telepresence 2.4.0 and my result is a bit similar:

  • The short address in the format <pod-name>.<headless-service-name>.<namespace> seems to resolve, but <pod-name>.<headless-service-name>.<namespace>.svc does not (i.e. with the .svc suffix). It would be great to have the .svc addresses resolve as well, but it is not necessarily a blocker.
  • Even though it resolves, I cannot connect to the address. I guess this might have something to do with the headless service resolving to a pod IP address instead of the ClusterIP of a regular service, which seems to work fine.

scholzj avatar Aug 23 '21 21:08 scholzj

I just revisited this, and it's possible to configure Strimzi to work with this. It's quite manual and clunky, but it does work:

apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: kafka
  namespace: infra
spec:
  kafka:
    replicas: 1
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
        configuration:
          brokers:
            - broker: 0
              advertisedHost: kafka-kafka-0.kafka-kafka-brokers.infra

The Kafka bootstrap then serves this address out as the broker address, which works both in the cluster and via Telepresence.

daviddawson avatar Sep 09 '21 21:09 daviddawson

@daviddawson Which version of Telepresence did you use? I did this as well; it worked around the first point I raised, but I ran into the second issue afterwards: it resolved the address but didn't route any traffic.

scholzj avatar Sep 09 '21 21:09 scholzj

@scholzj

telepresence version
Client: v2.4.2 (api v3)
Root Daemon: v2.4.2 (api v3)
User Daemon: v2.4.2 (api v3)

I used kafkajs to test, connecting to the bootstrap server at 'kafka-kafka-bootstrap.infra:9092'

daviddawson avatar Sep 10 '21 08:09 daviddawson

As @scholzj points out here, since version 0.20.0 of Strimzi you can configure your Kafka objects with a useServiceDnsDomain property, which defines whether to append the .cluster.local suffix to the brokers' advertised addresses. For example, the following configuration

apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: kafka-cluster
spec:
  kafka:
    replicas: 1
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
        configuration:
          useServiceDnsDomain: true

will return an address kafka-cluster-kafka-0.kafka-cluster-kafka-brokers.default.svc.cluster.local when asking the bootstrap service about the list of available brokers, which can of course be resolved by your Telepresence connection. This has a similar effect to what @daviddawson does, but without having to manually specify addresses for each broker.

Nevertheless, I still believe that support for addresses of the form pod_name.service_name.namespace.svc is needed.

amolerca avatar Oct 09 '21 13:10 amolerca

@amolerca , does a name like that resolve correctly if you add ".svc" to the list of include-suffixes?

thallgren avatar Oct 11 '21 04:10 thallgren

@thallgren Why shouldn't it work? Both .svc.cluster.local and .svc are valid suffixes in Kubernetes; the configuration I mention simply forces Strimzi to use the first one to advertise Kafka brokers, which is the one Telepresence understands by default. I'd be surprised if adding suffixes to the include-suffixes list overrode this behavior.

amolerca avatar Oct 11 '21 15:10 amolerca

@amolerca I'm not saying that it shouldn't work. I'm just trying to figure out why it doesn't. My theory is that Telepresence's DNS resolver (the one running on your workstation) doesn't recognize svc as a valid domain name, so a name ending with it is never dispatched to the cluster. Adding it to the include-suffixes would force it to be recognized, and if that works, then (a) we know where the real problem is, and (b) you have a workaround until a fix is delivered.
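For reference, a sketch of what that client-side setting might look like. The exact key name and file location have varied between Telepresence versions, so treat this as an assumption to verify against the client configuration docs for your version, not as authoritative:

```yaml
# Hypothetical Telepresence client config (e.g. ~/.config/telepresence/config.yml
# on Linux); the dns/includeSuffixes key is an assumption -- check the docs
# for your Telepresence version.
dns:
  includeSuffixes:
    - .svc
```

With such a setting in place, any name ending in .svc would be forwarded to the in-cluster resolver instead of being rejected locally.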

thallgren avatar Oct 11 '21 20:10 thallgren

I just revisited this, and it's possible to configure Strimzi to work with this. It's quite manual and clunky, but it does work:

apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: kafka
  namespace: infra
spec:
  kafka:
    replicas: 1
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
        configuration:
          brokers:
            - broker: 0
              advertisedHost: kafka-kafka-0.kafka-kafka-brokers.infra

The Kafka bootstrap then serves this address out as the broker address, which works both in the cluster and via Telepresence.

I had to use a nodeport to get this to work on kind. I used

      - name: external
        port: 9094
        type: nodeport
        tls: false
        configuration:
          bootstrap:
            nodePort: 30000
          brokers:
          - broker: 0
            nodePort: 30001
            advertisedPort: 9094
            advertisedHost: streaming-system-kafka-0.kafka.svc.cluster.local

PiePra avatar Sep 18 '22 19:09 PiePra

Based on a comment from a Telepresence contributor on another issue that I unfortunately cannot find, it seems that Telepresence v2 does not support proxying pods, hence the hostname of the broker resolving to an IP address that is not reachable from outside the cluster. If I'm understanding the issue/limitation correctly, this means Telepresence v2 does not work when headless services are in use, which seems like a significant gap in functionality. Configuring Kafka to accept connections from outside the cluster is a problematic "solution" IMO when you want to use Telepresence to pretend to be inside your real-world cluster.

revero-doug avatar Oct 13 '22 18:10 revero-doug

Your assumption is not correct. Telepresence works fine with headless services.

thallgren avatar Oct 14 '22 09:10 thallgren

Intercepting a pod that consumes/produces using message queues is a challenge, given that Telepresence doesn't stop the container that it intercepts. It continues to consume/produce events in parallel with the interceptor running on the local workstation. We do provide some functionality to help with this, but it only solves use-cases where headers can be forwarded to the messages and the consumer can filter on such headers. See the Telepresence RESTful API for more info.

thallgren avatar Oct 14 '22 10:10 thallgren

@thallgren

First, regarding your most recent comment: that's a worthwhile caveat to call out, but it's irrelevant in this case. FWIW, I don't even care about intercepting incoming traffic to the service I'm intercepting; I really just want to be able to communicate with other services at their in-cluster domains from my local environment. I'm intercepting what's effectively a no-op deployment, irrelevant to the rest of my system other than letting me communicate with its parts as a TCP client using in-cluster domains.

Now, on to the open Kafka DNS issue:

DNS successfully finds the pod IP of the single broker running in my Kafka cluster (using the kafka-ephemeral-single Strimzi example with useServiceDnsDomain: true set as described above), verified using the dns module in a simple test process based on the example in KafkaJS's README, but attempts to connect to that broker time out. When I create a regular (non-headless) Service to proxy requests to that pod and set advertisedHost to that Service's FQDN, it works as expected. I'm running a default minikube config on a Darwin MacBook Pro. AFAICT the only input variable that changes between the failing and passing setups is headless vs. regular Service; IP-wise, the Services and pods are on different subnets, another differentiating factor between the two setups.

tl;dr: Telepresence + Strimzi Kafka Operator + basic/default Kafka manifest + Darwin MBP + default minikube cluster does not work without workarounds in addition to useServiceDnsDomain: true.
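The regular-Service workaround described above can be sketched roughly as follows. All names and the namespace are made up for illustration; the selector relies on the statefulset.kubernetes.io/pod-name label that Kubernetes sets on StatefulSet pods, which is one common way to target a single broker pod:

```yaml
# Hypothetical per-broker ClusterIP Service for a Strimzi cluster named
# "kafka" in namespace "myapp" (all names are illustrative).
apiVersion: v1
kind: Service
metadata:
  name: kafka-broker-0
  namespace: myapp
spec:
  type: ClusterIP            # a regular (headful) Service, not headless
  selector:
    statefulset.kubernetes.io/pod-name: kafka-kafka-0
  ports:
    - name: plain
      port: 9092
      targetPort: 9092
```

The broker's advertisedHost would then be set to kafka-broker-0.myapp.svc.cluster.local, analogous to the listener configuration shown earlier, so clients connect via the Service's ClusterIP rather than the pod IP.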

revero-doug avatar Oct 14 '22 16:10 revero-doug

@revero-doug FYI: I guess https://github.com/strimzi/strimzi-kafka-operator/pull/7365, once merged, should make this easier. The main motivation behind it is different, but it will basically let you create an internal listener that uses ClusterIP services instead of routing through the pod DNS names. So that might let you work around the Telepresence limitations here.

scholzj avatar Oct 14 '22 17:10 scholzj

@revero-doug would it be possible for you to rephrase the problem and turn it into a feature request? That way, we'd be able to assess the effort needed to come up with a solution.

thallgren avatar Oct 14 '22 17:10 thallgren

@thallgren if you're asserting that proxying local traffic to cluster pods fronted by headless services is an already-supported feature, then this is still a bug, not a feature request. Arguably the bug could be split off from this ticket, but the scope of this ticket could just as easily be broadened to "Support proxying traffic to internal Kafka brokers", and the existing set of comments would remain appropriate; at least one other commenter on this issue has described the same scenario:

I was playing with this today with Telepresence 2.4.0 and my result is a bit similar:

  • The short address in the format <pod-name>.<headless-service-name>.<namespace> seems to resolve, but <pod-name>.<headless-service-name>.<namespace>.svc does not (i.e. with the .svc suffix). It would be great to have the .svc addresses resolve as well, but it is not necessarily a blocker.
  • Even though it resolves, I cannot connect to the address. I guess this might have something to do with the headless service resolving to a pod IP address instead of the ClusterIP of a regular service, which seems to work fine.

revero-doug avatar Oct 14 '22 17:10 revero-doug

@thallgren new issue opened in #2814

revero-doug avatar Oct 14 '22 17:10 revero-doug

I could not make the strimzi/kafka or bitnami/kafka Helm charts work with Telepresence, for various reasons.

Eventually we started using Kafka without ZooKeeper, and this simple k8s deployment works fine with Telepresence.

I hope it helps somebody!

confiq avatar Nov 16 '22 15:11 confiq

It seems the answer to the original question of supporting Kafka DNS is to add .svc to the include-suffixes, so we consider this resolved.

cindymullins-dw avatar May 02 '23 01:05 cindymullins-dw