
IPv6 Support

Open grampelberg opened this issue 5 years ago • 31 comments

What problem are you trying to solve?

Now that Kubernetes supports IPv6, Linkerd2 should as well.

grampelberg avatar Dec 18 '19 23:12 grampelberg

This is becoming a serious problem for us, now that we started deploying EKS clusters with IPv6 support...

rkujawa avatar Feb 24 '22 13:02 rkujawa

This is becoming a serious problem for us

Can you elaborate on the problems you're encountering? What parts of Linkerd don't work with IPv6?

olix0r avatar Feb 24 '22 16:02 olix0r

Hi, following up on @rkujawa's message, I'm attaching some details about our IPv6 issue with Linkerd below:

When Linkerd starts on an IPv6 k8s cluster, linkerd-proxy has trouble connecting to linkerd-identity (linkerd-destination and linkerd-proxy-injector show the same issue):

Readiness probe failed: Get "http://[2a05:d011:9fd:ae44:3db5::c]:4191/ready": dial tcp [2a05:d011:9fd:ae44:3db5::c]:4191: connect: connection refused

Liveness probe failed: Get "http://[2a05:d011:9fd:ae44:3db5::c]:4191/live": dial tcp [2a05:d011:9fd:ae44:3db5::c]:4191: connect: connection refused

Tested on Linkerd versions stable-2.11.1 and edge-22.2.4.

pmacieje avatar Feb 25 '22 09:02 pmacieje

I don't currently have bandwidth to dig into this deeply, but I'll leave some notes here in case someone else has time to investigate:

  • There's nothing (as far as I know) in Linkerd that is inherently IPv4-specific. All of the discovery APIs, etc already handle IPv6 addresses.
  • A quick browse of the code highlights that there are a few places in the injector's proxy template where IPv4 addresses (127.0.0.1 and 0.0.0.0) are used to configure the proxy's listeners: https://github.com/linkerd/linkerd2/blob/86f56df4bcfebb35fd5313bcf5d56e6e299946da/charts/partials/templates/_proxy.tpl#L52-L59 I'd probably start by manually modifying the generated YAML to use IPv6 addresses instead to see if that produces a working control plane (see the sketch after this list). We'll probably need a way to make these addresses configurable.
  • We'll need to get an integration test setup that configures a k3d (or kind) cluster with IPv6.
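To make the manual experiment above concrete, here is a minimal sketch of how the injected proxy's listener environment variables could be edited to swap the IPv4 defaults for IPv6 equivalents. Only LINKERD2_PROXY_OUTBOUND_LISTEN_ADDR is confirmed later in this thread; the other variable names and the default values shown are assumptions based on the template's naming pattern.

# Hypothetical excerpt of an injected linkerd-proxy container spec.
env:
  - name: LINKERD2_PROXY_ADMIN_LISTEN_ADDR
    value: "[::]:4191"      # assumed default: 0.0.0.0:4191
  - name: LINKERD2_PROXY_INBOUND_LISTEN_ADDR
    value: "[::]:4143"      # assumed default: 0.0.0.0:4143
  - name: LINKERD2_PROXY_OUTBOUND_LISTEN_ADDR
    value: "[::1]:4140"     # assumed default: 127.0.0.1:4140
  - name: LINKERD2_PROXY_CONTROL_LISTEN_ADDR
    value: "[::]:4190"      # assumed default: 0.0.0.0:4190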

olix0r avatar Feb 25 '22 16:02 olix0r

Hi, any progress on this issue? We really need IPv6 support.

pmacieje avatar May 13 '22 12:05 pmacieje

@pmacieje I'd look at the comment posted above. As mentioned, the controller supports IPv6 addresses. You can test this by changing the manifests at install time to configure the proxy listeners with their IPv6 counterparts. If you run into any problems, please let us know through an issue so we can try to fix them.

mateiidavid avatar May 13 '22 14:05 mateiidavid

While it's not clear to me whether k3d supports IPv6 yet, it appears that k3s does. So I suspect it's now possible for us to set up IPv6-only clusters for our integration tests.

While looking at some unrelated changes, I noticed that we are probably going to have to update the proxy's DNS resolution to support IPv6:

If anyone wants to help out with this effort, I think the next step is figuring out how to boot up a k3d cluster with an IPv6 pod network so that we can start setting up a reproducible integration test.
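For the kind option mentioned earlier, an IPv6-only test cluster can be created from a small config file; the snippet below uses kind's documented ipFamily setting (the host and Docker daemon need IPv6 enabled).

# kind-ipv6.yaml: single-stack IPv6 cluster for Linkerd testing.
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  ipFamily: ipv6

The cluster can then be created with: kind create cluster --config kind-ipv6.yaml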

olix0r avatar May 16 '22 17:05 olix0r

I tried this out and ran linkerd check but it failed.

The linkerd-init init container doesn't seem to set up iptables rules for IPv6 properly; it only runs iptables, not ip6tables.

linkerd-proxy also prints the following warnings:

[   149.677120s]  WARN ThreadId(01) policy:watch{port=4191}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[   149.749911s]  WARN ThreadId(02) identity:controller{addr=linkerd-identity-headless.linkerd.svc.cluster.local:8080}:endpoint{addr=[2a05:f480:1800:27d:63dd:4e0:e8e:d40a]:8080}: linkerd_reconnect: Failed to connect error=received corrupt message

and:

  Normal   Killing              34s (x2 over 3m6s)    kubelet            FailedPostStartHook
  Warning  FailedPostStartHook  34s (x2 over 3m6s)    kubelet            Exec lifecycle hook ([/usr/lib/linkerd/linkerd-await --timeout=2m]) for Container "linkerd-proxy" in Pod "linkerd-destination-5f8f4db6-plj75_linkerd(4e52504b-2198-41f0-b324-2b303e963b70)" failed - error: command '/usr/lib/linkerd/linkerd-await --timeout=2m' exited with 69: linkerd-proxy failed to become ready within 120s timeout

I suppose this is the issue @olix0r is referring to.

So something outside of the Helm template is still using 127.0.0.1, and the iptables rules are definitely an issue as well.

arianvp avatar May 19 '22 14:05 arianvp

Thanks for trying that, @arianvp. I suspect the 127.0.0.1 is coming from the deployment manifest. You could probably try manually editing it to replace instances of 127.0.0.1 and 0.0.0.0. I'm not sure how much farther that will get you, though.

olix0r avatar May 19 '22 14:05 olix0r

I'll see if I can convince k3d to set up an IPv6-only cluster this weekend and open up a PR.

arianvp avatar May 19 '22 16:05 arianvp

I was curious to try this too, as we'd like to run Linkerd on IPv6 EKS clusters. I've tried with an IPv6 kind cluster (super easy to set up). I didn't get too far yet, but thought I'd share the following:

The linkerd-init init container doesn't seem to set up iptables rules for IPv6 properly; it only runs iptables, not ip6tables.

This is correct. I built a custom proxy-init image that uses ip6tables instead (this requires /lib/modules to be mounted as a hostPath).
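For reference, a heavily simplified sketch of what IPv6 counterparts of proxy-init's redirect rules could look like is shown below. The chain names and the proxy UID follow proxy-init's defaults, and the ports match the inbound (4143) and outbound (4140) listeners seen in this thread, but the real rule set also handles ignored ports and other cases, so treat this as an illustration rather than a drop-in replacement.

# Inbound: redirect incoming TCP to the proxy's inbound listener on 4143.
ip6tables -t nat -N PROXY_INIT_REDIRECT
ip6tables -t nat -A PROXY_INIT_REDIRECT -p tcp -j REDIRECT --to-port 4143
ip6tables -t nat -A PREROUTING -j PROXY_INIT_REDIRECT
# Outbound: redirect the application's outgoing TCP to the proxy on 4140,
# skipping traffic from the proxy itself (UID 2102 by default) and loopback.
ip6tables -t nat -N PROXY_INIT_OUTPUT
ip6tables -t nat -A PROXY_INIT_OUTPUT -m owner --uid-owner 2102 -j RETURN
ip6tables -t nat -A PROXY_INIT_OUTPUT -o lo -j RETURN
ip6tables -t nat -A PROXY_INIT_OUTPUT -p tcp -j REDIRECT --to-port 4140
ip6tables -t nat -A OUTPUT -j PROXY_INIT_OUTPUT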

Despite the init completing successfully, linkerd-destination and linkerd-proxy-injector are stuck in PodInitializing (due to identity not running? Can't confirm, no logs in proxies).

Linkerd-identity is not in PodInitializing. The proxy container actually starts and is Running, but is reporting SRV resolution errors. The identity controller container is in a CrashLoopBackoff with only the following lines logged:

time="2022-06-05T23:21:21Z" level=info msg="starting admin server on :9990"                                                                                                                                                                                                              
time="2022-06-05T23:21:21Z" level=info msg="starting gRPC server on :8080"                                                                                                                                                                                                               
time="2022-06-05T23:21:51Z" level=info msg="shutting down gRPC server on :8080"      

Perhaps this helps a little :) I might have a further look at some point. I'd welcome some tips on how to get some more data about what's going wrong (log levels, tracing etc) or just what to try next to get some momentum on this issue.

valorl avatar Jun 05 '22 23:06 valorl

OK, I got some more logs by removing the postStart hook from the proxy. It seems there are still some places where the proxy assumes IPv4 (127.0.0.1). I'm not sure if this is configurable, but there is no mention of 127.0.0.1 left in my manifests at this point.

[     0.000723s]  WARN ThreadId(01) policy:watch{port=9997}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)

More logs here (incl. SRV resolution errors):

[     0.000290s]  INFO ThreadId(01) linkerd2_proxy::rt: Using single-threaded proxy runtime
[     0.000540s]  INFO ThreadId(01) linkerd2_proxy: Admin interface on [::]:4191
[     0.000547s]  INFO ThreadId(01) linkerd2_proxy: Inbound interface on [::]:4143
[     0.000548s]  INFO ThreadId(01) linkerd2_proxy: Outbound interface on [::1]:4140
[     0.000548s]  INFO ThreadId(01) linkerd2_proxy: Tap DISABLED
[     0.000549s]  INFO ThreadId(01) linkerd2_proxy: Local identity is linkerd-destination.linkerd.serviceaccount.identity.linkerd.cluster.local
[     0.000550s]  INFO ThreadId(01) linkerd2_proxy: Identity verified via linkerd-identity-headless.linkerd.svc.cluster.local:8080 (linkerd-identity.linkerd.serviceaccount.identity.linkerd.cluster.local)
[     0.000551s]  INFO ThreadId(01) linkerd2_proxy: Destinations resolved via localhost:8086
[     0.000723s]  WARN ThreadId(01) policy:watch{port=9997}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[     0.001205s]  WARN ThreadId(02) identity:controller{addr=linkerd-identity-headless.linkerd.svc.cluster.local:8080}: linkerd_app_core::control: Failed to resolve control-plane component error=no record found for name: linkerd-identity-headless.linkerd.svc.cluster.local. type: SRV class: IN
[     0.002552s]  WARN ThreadId(02) identity:controller{addr=linkerd-identity-headless.linkerd.svc.cluster.local:8080}: linkerd_app_core::control: Failed to resolve control-plane component error=no record found for name: linkerd-identity-headless.linkerd.svc.cluster.local. type: SRV class: IN
[     0.109229s]  WARN ThreadId(01) policy:watch{port=9997}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[     0.311912s]  WARN ThreadId(01) policy:watch{port=9997}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[     0.727744s]  WARN ThreadId(01) policy:watch{port=9997}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[     1.229279s]  WARN ThreadId(01) policy:watch{port=9997}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[     1.730070s]  WARN ThreadId(01) policy:watch{port=9997}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[     2.230905s]  WARN ThreadId(01) policy:watch{port=9997}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[     2.731792s]  WARN ThreadId(01) policy:watch{port=9997}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[     3.232496s]  WARN ThreadId(01) policy:watch{port=9997}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)

valorl avatar Jun 06 '22 10:06 valorl

Can anyone help with the above issue, please?

pmacieje avatar Jul 12 '22 17:07 pmacieje

@adleong Thank you for adding this to roadmap.

rkujawa avatar Aug 22 '22 13:08 rkujawa

OK, I got some more logs by removing the postStart hook from the proxy. It seems there are still some places where the proxy assumes IPv4 (127.0.0.1). I'm not sure if this is configurable, but there is no mention of 127.0.0.1 left in my manifests at this point.

[     0.000723s]  WARN ThreadId(01) policy:watch{port=9997}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)


In the proxy, I found default configuration values that are selected during initialization when the corresponding environment variables are absent.

Shoowa avatar Dec 05 '22 23:12 Shoowa

* A quick browse of the code highlights that there are a few places in the injector's proxy template where IPv4 addresses (127.0.0.1 and 0.0.0.0) are used to configure the proxy's listeners: https://github.com/linkerd/linkerd2/blob/86f56df4bcfebb35fd5313bcf5d56e6e299946da/charts/partials/templates/_proxy.tpl#L52-L59

FYI, I couldn't get linkerd to work by just changing the deployment values.

jceb avatar Jan 03 '23 14:01 jceb

I updated the deployment to use IPv6 and other address formats. It looks like the first hurdle is fixing the linkerd2-proxy validation to accept IPv6 addresses or wildcard addresses; it seems to only accept an IPv4 address format.

kubectl logs -n linkerd linkerd-identity-6d97bbd6b5-ftc9g -c linkerd-proxy

[ 0.001785s] ERROR ThreadId(01) linkerd_app::env: Expected IP:PORT; found: localhost:4140
[ 0.001806s] ERROR ThreadId(01) linkerd_app::env: LINKERD2_PROXY_OUTBOUND_LISTEN_ADDR="localhost:4140" is not valid: HostIsNotAnIpAddress

https://github.com/linkerd/linkerd2-proxy/blob/3933feb587c27f5f8c5c76669d9d99b44b0a60b2/linkerd/app/src/env.rs#L924-L932
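As the error indicates, the validator wants a literal IP:PORT rather than a hostname, so an IPv6 experiment would presumably use a bracketed loopback literal instead of localhost or 127.0.0.1 (the variable name comes from the error above; the value shown is an assumption):

  - name: LINKERD2_PROXY_OUTBOUND_LISTEN_ADDR
    value: "[::1]:4140"   # IPv6 loopback literal in IP:PORT form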

luker31337 avatar May 22 '23 12:05 luker31337

To add to the list: the admin address (if you manually shut down via the /shutdown endpoint during Jobs) only listens on the IPv4 address, so a curl/wget to "localhost" frequently fails because that resolves to ::1. It works if you explicitly specify 127.0.0.1, though.
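For illustration, the workaround described above looks roughly like this, assuming the default admin port 4191 seen elsewhere in this thread and that the shutdown endpoint is enabled and expects a POST:

# Often fails when "localhost" resolves to ::1 but the admin server binds only IPv4:
curl -X POST http://localhost:4191/shutdown
# Works by forcing the IPv4 loopback address:
curl -X POST http://127.0.0.1:4191/shutdown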

I tried to make it run on IPv6, but just changing things in the Helm chart is not enough. Issues I ran into included:

  • A bunch of IPv4-only listening addresses. These are the easiest to fix, but many of them cannot be customized and require the controller image to be rebuilt (because the proxy-injector uses the Helm chart it was built with).

  • proxy-init only supports IPv4. This can be worked around by replacing the iptables binary with its ip6tables equivalent (but then you lose all IPv4 support).

  • The proxy uses the SO_ORIGINAL_DST socket option, which is IPv4-only. IPv6 has a different equivalent option, IP6T_SO_ORIGINAL_DST, which is not supported by the proxy (see the sketch after this list).
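For readers unfamiliar with these socket options, here is a small, hypothetical Go sketch of how a transparent proxy recovers the pre-NAT destination of a redirected connection. The constants come from the Linux netfilter headers; this only illustrates the IPv4/IPv6 difference and is not how linkerd2-proxy (which is written in Rust) actually implements it.

package origdst

import (
	"unsafe"

	"golang.org/x/sys/unix"
)

// From <linux/netfilter_ipv4.h> and <linux/netfilter_ipv6/ip6_tables.h>.
// Both option names happen to be 80, but they are read at different levels.
const (
	soOriginalDst     = 80 // SO_ORIGINAL_DST, level SOL_IP (IPv4 only)
	ip6tSoOriginalDst = 80 // IP6T_SO_ORIGINAL_DST, level SOL_IPV6
)

// originalDstV6 returns the pre-NAT destination of an ip6tables-REDIRECTed
// connection. An IPv4-only proxy would instead read SO_ORIGINAL_DST at
// SOL_IP and decode a sockaddr_in.
func originalDstV6(fd int) (*unix.RawSockaddrInet6, error) {
	var addr unix.RawSockaddrInet6
	size := uint32(unsafe.Sizeof(addr))
	_, _, errno := unix.Syscall6(
		unix.SYS_GETSOCKOPT,
		uintptr(fd),
		uintptr(unix.SOL_IPV6),
		uintptr(ip6tSoOriginalDst),
		uintptr(unsafe.Pointer(&addr)),
		uintptr(unsafe.Pointer(&size)),
		0,
	)
	if errno != 0 {
		return nil, errno
	}
	return &addr, nil
}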

eplightning avatar Oct 06 '23 16:10 eplightning

There were two more things that needed fixing. On top of the items I mentioned in my previous comment, addressing these finally got Linkerd working inside an IPv6-only EKS cluster:

  • destination assumes IPv4 when parsing the authority: https://github.com/linkerd/linkerd2/blob/a6ea765d3992ab56c46bba9811921e02544532c5/controller/api/destination/server.go#L555
  • destination converts IPv6 addresses to IPv4 addresses, which breaks resolution: https://github.com/linkerd/linkerd2/blob/a6ea765d3992ab56c46bba9811921e02544532c5/controller/api/destination/endpoint_translator.go#L357 (see the sketch after this list)
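To make the two pitfalls above concrete, here is a small, hypothetical Go sketch of IPv6-safe address handling. It is not the actual destination-controller code, only an illustration of the two fixes: splitting an authority with the standard library (which understands bracketed IPv6 literals) and not forcing every address into 4-byte form.

package addrutil

import (
	"fmt"
	"net"
)

// splitAuthority splits "host:port" and handles bracketed IPv6 literals such
// as "[2001:db8::1]:8080". Splitting on the last ":" by hand assumes IPv4 and
// breaks for IPv6 authorities.
func splitAuthority(authority string) (host, port string, err error) {
	return net.SplitHostPort(authority)
}

// ipBytes returns the canonical wire form of an IP: 4 bytes for IPv4, 16 bytes
// for IPv6. Calling To4() unconditionally is what silently drops IPv6 support.
func ipBytes(ip net.IP) ([]byte, error) {
	if ip4 := ip.To4(); ip4 != nil {
		return ip4, nil // genuine (or IPv4-mapped) IPv4 address
	}
	if ip16 := ip.To16(); ip16 != nil {
		return ip16, nil // keep the full 16-byte IPv6 address
	}
	return nil, fmt.Errorf("invalid IP address: %v", ip)
}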

eplightning avatar Oct 06 '23 19:10 eplightning

Thanks to everyone who's been digging into this! To be clear, we are indeed planning to get IPv6 working, so all the information is great.

kflynn avatar Oct 07 '23 21:10 kflynn

We're also switching to IPv6 EKS clusters, and Linkerd is blocking our migration, as we really want to use both IPv6 and Linkerd. Is there any word on a timeframe for IPv6 support?

FrederikNJS avatar Dec 13 '23 15:12 FrederikNJS

@eplightning, if you got this working, have you already submitted a PR?

@kflynn Hello, I hope you are well. Does Linkerd already support IPv6?

Vinaum8 avatar Feb 29 '24 21:02 Vinaum8

Hi folks, I'm glad to report development on this front has already started! I'll report back here when there's something that you can test :-)

alpeb avatar Feb 29 '24 22:02 alpeb

@alpeb wow, thanks for this!!! You're amazing, haha. I want to test this in my clusters with the Linkerd, Linkerd Viz, and multicluster features. I'm here to test and help however I can.

Vinaum8 avatar Mar 01 '24 00:03 Vinaum8