IPv6 Support
What problem are you trying to solve?
Now that Kubernetes supports IPv6, Linkerd2 should as well.
This is becoming a serious problem for us, now that we've started deploying EKS clusters with IPv6 support...
This is becoming a serious problem for us
Can you elaborate on the problems you're encountering? What parts of Linkerd don't work with IPv6?
Hi, following up on @rkujawa's message, I'm attaching some details about our IPv6 issue with Linkerd below:
When Linkerd starts on an IPv6 k8s cluster, linkerd-proxy has problems connecting to linkerd-identity (linkerd-destination and linkerd-proxy-injector show the same issue):
Readiness probe failed: Get "http://[2a05:d011:9fd:ae44:3db5::c]:4191/ready": dial tcp [2a05:d011:9fd:ae44:3db5::c]:4191: connect: connection refused
Liveness probe failed: Get "http://[2a05:d011:9fd:ae44:3db5::c]:4191/live": dial tcp [2a05:d011:9fd:ae44:3db5::c]:4191: connect: connection refused
Tested on linkerd version stable-2.11.1 and edge-22.2.4.
I don't currently have bandwidth to dig into this deeply, but I'll leave some notes here in case someone else has time to investigate:
- There's nothing (as far as I know) in Linkerd that is inherently IPv4-specific. All of the discovery APIs, etc., already handle IPv6 addresses.
- A quick browse of the code highlights that there are a few places in the injector's proxy template where IPv4 addresses (127.0.0.1 and 0.0.0.0) are used to configure the proxy's listeners: https://github.com/linkerd/linkerd2/blob/86f56df4bcfebb35fd5313bcf5d56e6e299946da/charts/partials/templates/_proxy.tpl#L52-L59 I'd probably start by manually modifying the generated YAML to use IPv6 addresses instead to see if that produces a working control plane. We'll probably need a way to make these addresses configurable.
- We'll need to get an integration test setup that configures a k3d (or kind) cluster with IPv6.
Hi, any progress on this issue? IPv6 support is really needed for us.
@pmacieje I'd look at the comment posted above. As mentioned, the controller supports IPv6 addresses. You can try to test everything by changing the manifests on install to configure proxy listeners with IPv6 counterparts. If there are any issues, please let us know through an issue so we can attempt to fix them.
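To make that suggestion concrete, here's a minimal sketch of the kind of experiment being described: render the manifests, rewrite the IPv4 wildcard/loopback listener literals to their IPv6 counterparts, and apply the result. The substitutions and the file name are assumptions for a quick experiment only, and (as later comments in this thread show) this alone is not enough to get a working mesh.

```go
// ipv6ify.go (hypothetical): crude experiment only -- rewrite the IPv4
// listener literals in rendered Linkerd manifests to IPv6 counterparts.
package main

import (
	"io"
	"os"
	"strings"
)

func main() {
	manifest, err := io.ReadAll(os.Stdin)
	if err != nil {
		panic(err)
	}
	// Assumed substitutions: wildcard and loopback host:port prefixes only.
	r := strings.NewReplacer(
		"0.0.0.0:", "[::]:",
		"127.0.0.1:", "[::1]:",
	)
	if _, err := io.WriteString(os.Stdout, r.Replace(string(manifest))); err != nil {
		panic(err)
	}
}
```

Usage would be along the lines of: linkerd install | go run ipv6ify.go | kubectl apply -f -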
While it's not clear to me whether k3d supports IPv6 yet, it appears that k3s does. So I suspect it's now possible for us to set up IPv6-only clusters for our integration tests.
While looking at some unrelated changes, I noticed that we are probably going to have to update the proxy's DNS resolution to support IPv6:
- SRV resolution will have to be smarter. We currently parse IPv4 addresses from SRV responses. This will have to be updated to handle IPv6 pods.
- When SRV lookups fail, we fall back to A record lookups. These lookups will have to be augmented with AAAA record lookups (a rough sketch of this fallback follows below).
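The proxy's resolver is Rust, so this is only a rough Go sketch of the fallback described above (the function name and wiring are illustrative, not Linkerd code): try SRV first, and when that fails, fall back to address records, asking for AAAA as well as A so IPv6-only pods still resolve.

```go
package main

import (
	"context"
	"fmt"
	"net"
)

// resolveEndpoints prefers SRV records and falls back to address records.
// The "ip" network asks for both A and AAAA records.
func resolveEndpoints(ctx context.Context, name string) ([]string, error) {
	r := net.DefaultResolver

	if _, srvs, err := r.LookupSRV(ctx, "", "", name); err == nil && len(srvs) > 0 {
		var out []string
		for _, srv := range srvs {
			// The SRV target must itself resolve to A/AAAA records; parsing
			// only IPv4 answers here is what would drop IPv6 pods.
			addrs, err := r.LookupIP(ctx, "ip", srv.Target)
			if err != nil {
				return nil, err
			}
			for _, a := range addrs {
				out = append(out, net.JoinHostPort(a.String(), fmt.Sprint(srv.Port)))
			}
		}
		return out, nil
	}

	// SRV lookup failed or returned nothing: fall back to plain A/AAAA lookups.
	addrs, err := r.LookupIP(ctx, "ip", name)
	if err != nil {
		return nil, err
	}
	var out []string
	for _, a := range addrs {
		out = append(out, a.String())
	}
	return out, nil
}

func main() {
	eps, err := resolveEndpoints(context.Background(),
		"linkerd-identity-headless.linkerd.svc.cluster.local")
	fmt.Println(eps, err)
}
```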
If anyone wants to help out with this effort, I think the next step is figuring out how to boot up a k3d cluster with an IPv6 pod network so that we can start setting up a reproducible integration test.
I tried this out and ran linkerd check, but it failed. The linkerd-init init container seems to not set up iptables rules for IPv6 properly: it only runs iptables but not ip6tables. Also, linkerd-proxy prints the following WARNings:
[ 149.677120s] WARN ThreadId(01) policy:watch{port=4191}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[ 149.749911s] WARN ThreadId(02) identity:controller{addr=linkerd-identity-headless.linkerd.svc.cluster.local:8080}:endpoint{addr=[2a05:f480:1800:27d:63dd:4e0:e8e:d40a]:8080}: linkerd_reconnect: Failed to connect error=received corrupt message
and:
Normal Killing 34s (x2 over 3m6s) kubelet FailedPostStartHook
Warning FailedPostStartHook 34s (x2 over 3m6s) kubelet Exec lifecycle hook ([/usr/lib/linkerd/linkerd-await --timeout=2m]) for Container "linkerd-proxy" in Pod "linkerd-destination-5f8f4db6-plj75_linkerd(4e52504b-2198-41f0-b324-2b303e963b70)" failed - error: command '/usr/lib/linkerd/linkerd-await --timeout=2m' exited with 69: linkerd-proxy failed to become ready within 120s timeout
I suppose this is the issue @olix0r is referring to. So something outside of the Helm template is still trying to do something with 127.0.0.1, and the iptables rules are definitely an issue as well.
Thanks for trying that, @arianvp. I suspect the 127.0.0.1 is coming from the deployment manifest. You could probably try manually editing it to replace instances of 127.0.0.1 and 0.0.0.0. I'm not sure how much farther that will get you, though.
I'll see if I can convince k3d to set up an IPv6-only cluster this weekend and open up a PR.
I was curious to try this too, as we'd like to run Linkerd on IPv6 EKS clusters. I've tried with an IPv6 kind cluster (super easy to set up). I didn't get too far yet, but thought I'd share the following:
linkerd-init init container seems to not set up iptables rules for ipv6 properly. It only runs iptables but not ip6tables
This is correct. I've tried to make a custom-built proxy-init image that uses ip6tables instead (this requires /lib/modules to be mounted as a hostPath).
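For anyone following along: proxy-init is a Go program that shells out to iptables, so a dual-stack variant would roughly need to install every nat rule through ip6tables as well. The sketch below only illustrates that shape; the chains and rules are simplified placeholders (the real rule set also excludes the proxy's own UID, among other things), and the 4143/4140 redirect ports are taken from the proxy log lines in this thread.

```go
package main

import (
	"fmt"
	"os/exec"
)

// applyRedirectRules installs a simplified redirect rule set through both
// iptables (IPv4) and ip6tables (IPv6). Placeholder rules only.
func applyRedirectRules() error {
	rules := [][]string{
		// Redirect inbound TCP to the proxy's inbound port.
		{"-t", "nat", "-A", "PREROUTING", "-p", "tcp", "-j", "REDIRECT", "--to-port", "4143"},
		// Redirect outbound TCP to the proxy's outbound port.
		{"-t", "nat", "-A", "OUTPUT", "-p", "tcp", "-j", "REDIRECT", "--to-port", "4140"},
	}
	for _, bin := range []string{"iptables", "ip6tables"} {
		for _, args := range rules {
			if out, err := exec.Command(bin, args...).CombinedOutput(); err != nil {
				return fmt.Errorf("%s %v: %v: %s", bin, args, err, out)
			}
		}
	}
	return nil
}

func main() {
	if err := applyRedirectRules(); err != nil {
		fmt.Println(err)
	}
}
```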
Despite the init completing successfully, linkerd-destination and linkerd-proxy-injector are stuck in PodInitializing (due to identity not running? Can't confirm, no logs in proxies).
Linkerd-identity is not in PodInitializing. The proxy container actually starts and is Running, but is reporting SRV resolution errors. The identity controller container is in a CrashLoopBackOff with only the following lines logged:
time="2022-06-05T23:21:21Z" level=info msg="starting admin server on :9990"
time="2022-06-05T23:21:21Z" level=info msg="starting gRPC server on :8080"
time="2022-06-05T23:21:51Z" level=info msg="shutting down gRPC server on :8080"
Perhaps this helps a little :) I might have a further look at some point. I'd welcome some tips on how to get some more data about what's going wrong (log levels, tracing etc) or just what to try next to get some momentum on this issue.
Ok, I got some more logs by removing the postStart hook from the proxy. Seems like there are some places in the proxy with IPv4 (127.0.0.1) still assumed. Not sure if this is configurable, but I have no mention of 127.0.0.1 in my manifests at this point.
[ 0.000723s] WARN ThreadId(01) policy:watch{port=9997}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
More logs here (incl. SRV resolution errors):
[ 0.000290s] INFO ThreadId(01) linkerd2_proxy::rt: Using single-threaded proxy runtime
[ 0.000540s] INFO ThreadId(01) linkerd2_proxy: Admin interface on [::]:4191
[ 0.000547s] INFO ThreadId(01) linkerd2_proxy: Inbound interface on [::]:4143
[ 0.000548s] INFO ThreadId(01) linkerd2_proxy: Outbound interface on [::1]:4140
[ 0.000548s] INFO ThreadId(01) linkerd2_proxy: Tap DISABLED
[ 0.000549s] INFO ThreadId(01) linkerd2_proxy: Local identity is linkerd-destination.linkerd.serviceaccount.identity.linkerd.cluster.local
[ 0.000550s] INFO ThreadId(01) linkerd2_proxy: Identity verified via linkerd-identity-headless.linkerd.svc.cluster.local:8080 (linkerd-identity.linkerd.serviceaccount.identity.linkerd.cluster.local)
[ 0.000551s] INFO ThreadId(01) linkerd2_proxy: Destinations resolved via localhost:8086
[ 0.000723s] WARN ThreadId(01) policy:watch{port=9997}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[ 0.001205s] WARN ThreadId(02) identity:controller{addr=linkerd-identity-headless.linkerd.svc.cluster.local:8080}: linkerd_app_core::control: Failed to resolve control-plane component error=no record found for name: linkerd-identity-headless.linkerd.svc.cluster.local. type: SRV class: IN
[ 0.002552s] WARN ThreadId(02) identity:controller{addr=linkerd-identity-headless.linkerd.svc.cluster.local:8080}: linkerd_app_core::control: Failed to resolve control-plane component error=no record found for name: linkerd-identity-headless.linkerd.svc.cluster.local. type: SRV class: IN
[ 0.109229s] WARN ThreadId(01) policy:watch{port=9997}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[ 0.311912s] WARN ThreadId(01) policy:watch{port=9997}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[ 0.727744s] WARN ThreadId(01) policy:watch{port=9997}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[ 1.229279s] WARN ThreadId(01) policy:watch{port=9997}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[ 1.730070s] WARN ThreadId(01) policy:watch{port=9997}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[ 2.230905s] WARN ThreadId(01) policy:watch{port=9997}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[ 2.731792s] WARN ThreadId(01) policy:watch{port=9997}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[ 3.232496s] WARN ThreadId(01) policy:watch{port=9997}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
Can anyone help with the above issue, please?
@adleong Thank you for adding this to the roadmap.
Ok, I got some more logs by removing the postStart hook from the proxy. Seems like there are some places in the proxy with IPv4 (127.0.0.1) still assumed. Not sure if this is configurable, but I have no mention of 127.0.0.1 in my manifests at this point.
[ 0.000723s] WARN ThreadId(01) policy:watch{port=9997}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
In the proxy, I found configuration defaults that are selected during initialization when the corresponding environment variables are absent.
* A quick browse of the code highlights that there are a few places in the injector's proxy template where IPv4 addresses (127.0.0.1 and 0.0.0.0) are used to configure the proxy's listeners: https://github.com/linkerd/linkerd2/blob/86f56df4bcfebb35fd5313bcf5d56e6e299946da/charts/partials/templates/_proxy.tpl#L52-L59
FYI, I couldn't get linkerd to work by just changing the deployment values.
I updated the deployment to use IPv6 and other address formats. It looks like the first hurdle is fixing the linkerd2-proxy validation to accept IPv6 or wildcard addresses; it seems to only want to accept an IPv4 address format.
kubectl logs -n linkerd linkerd-identity-6d97bbd6b5-ftc9g -c linkerd-proxy
[ 0.001785s] ERROR ThreadId(01) linkerd_app::env: Expected IP:PORT; found: localhost:4140
[ 0.001806s] ERROR ThreadId(01) linkerd_app::env: LINKERD2_PROXY_OUTBOUND_LISTEN_ADDR="localhost:4140" is not valid: HostIsNotAnIpAddress
https://github.com/linkerd/linkerd2-proxy/blob/3933feb587c27f5f8c5c76669d9d99b44b0a60b2/linkerd/app/src/env.rs#L924-L932
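To make that error concrete: the listener variables have to be IP:PORT literals rather than hostnames, and IPv6 literals have to be bracketed. The snippet below just illustrates which literal forms are well-formed IP:PORT strings in general (Go's netip is used purely for demonstration); whether the proxy's own Rust parsing, linked above, accepts the IPv6 forms at this version is exactly the open question here. LINKERD2_PROXY_OUTBOUND_LISTEN_ADDR is from the error output; everything else is illustrative.

```go
package main

import (
	"fmt"
	"net/netip"
)

func main() {
	// Candidate values for listener variables such as
	// LINKERD2_PROXY_OUTBOUND_LISTEN_ADDR.
	candidates := []string{
		"127.0.0.1:4140", // IPv4 loopback literal
		"0.0.0.0:4191",   // IPv4 wildcard literal
		"[::1]:4140",     // bracketed IPv6 loopback literal
		"[::]:4143",      // bracketed IPv6 wildcard literal
		"localhost:4140", // hostname, not an IP literal: rejected
	}
	for _, c := range candidates {
		ap, err := netip.ParseAddrPort(c)
		if err != nil {
			fmt.Printf("%-18s -> rejected: %v\n", c, err)
			continue
		}
		fmt.Printf("%-18s -> ok (IPv6: %v)\n", c, ap.Addr().Is6())
	}
}
```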
To add to the list: the admin address (if you manually shut down via the /shutdown endpoint during Jobs) only listens on the IPv4 address, so a curl/wget to "localhost" frequently fails, since that defaults to ::1. It works if you manually specify 127.0.0.1, though.
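A quick way to see that behavior (a hypothetical check, not Linkerd code): on an IPv6-enabled host, "localhost" often resolves to ::1 first, so a plain TCP connect to the admin port fails when that port is bound only to the IPv4 address, while the explicit 127.0.0.1 form succeeds. Port 4191 is the proxy admin port from the logs above.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

func main() {
	// Try the admin port via the hostname and both loopback literals.
	for _, host := range []string{"localhost", "::1", "127.0.0.1"} {
		conn, err := net.DialTimeout("tcp", net.JoinHostPort(host, "4191"), time.Second)
		if err != nil {
			fmt.Printf("%-10s -> %v\n", host, err)
			continue
		}
		fmt.Printf("%-10s -> connected\n", host)
		conn.Close()
	}
}
```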
Tried to make it run on IPv6, but just changing things in the Helm chart is not enough. Issues I ran into included:
- A bunch of IPv4-only listening addresses. This is the easiest one to fix, but a lot of them cannot be customized and require the controller image to be rebuilt (because the proxy-injector uses the Helm chart it was built with).
- proxy-init only supports IPv4. This can be worked around by replacing the iptables binary with the ip6tables equivalent (but then you lose all IPv4 support).
- The proxy uses the SO_ORIGINAL_DST socket option, which is IPv4-only. IPv6 has a different equivalent option, IP6T_SO_ORIGINAL_DST, which is not supported by the proxy (see the sketch after this list).
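Some context on that last point, sketched here in Go rather than the proxy's actual Rust: transparent proxies recover the pre-REDIRECT destination with a getsockopt call, and the option differs by address family. The constant values come from the kernel headers; the helper below handles only the IPv4 case and merely notes the IPv6 counterpart, so treat it as an illustration, not an implementation.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net"

	"golang.org/x/sys/unix"
)

const (
	// From <linux/netfilter_ipv4.h>: returns the original (pre-REDIRECT)
	// IPv4 destination as a sockaddr_in.
	soOriginalDst = 80
	// From <linux/netfilter_ipv6/ip6_tables.h>: IP6T_SO_ORIGINAL_DST, the
	// IPv6 counterpart, queried at level SOL_IPV6 and returning a
	// sockaddr_in6. Not handled in this sketch.
	ip6tSoOriginalDst = 80
)

// originalDstV4 returns the original IPv4 destination of a redirected
// connection. IPv6Mreq is used only as a conveniently sized buffer for the
// 16-byte sockaddr_in (a common trick in Go transparent proxies).
func originalDstV4(fd int) (net.IP, uint16, error) {
	mreq, err := unix.GetsockoptIPv6Mreq(fd, unix.SOL_IP, soOriginalDst)
	if err != nil {
		return nil, 0, err
	}
	raw := mreq.Multiaddr // first 16 bytes hold the sockaddr_in
	port := binary.BigEndian.Uint16(raw[2:4])
	ip := net.IPv4(raw[4], raw[5], raw[6], raw[7])
	return ip, port, nil
}

func main() {
	fmt.Println("IPv4 uses SO_ORIGINAL_DST =", soOriginalDst,
		"while IPv6 needs IP6T_SO_ORIGINAL_DST =", ip6tSoOriginalDst,
		"(see originalDstV4 for the IPv4 path)")
}
```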
There were two more things that needed fixing. On top of the things I mentioned in my previous comment, these finally let me get Linkerd working inside an IPv6-only EKS cluster:
- destination assumes IPv4 when parsing the authority: https://github.com/linkerd/linkerd2/blob/a6ea765d3992ab56c46bba9811921e02544532c5/controller/api/destination/server.go#L555
- destination converts IPv6 addresses to IPv4 addresses, which breaks resolution: https://github.com/linkerd/linkerd2/blob/a6ea765d3992ab56c46bba9811921e02544532c5/controller/api/destination/endpoint_translator.go#L357 (a sketch of this pitfall follows below)
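Both of those boil down to code paths that assume a 4-byte address. As a hedged illustration in plain Go (not the destination controller's actual code): parse the authority without assuming a family, and keep 16-byte addresses intact instead of forcing them through a To4()-style conversion, which returns nil for real IPv6 addresses.

```go
package main

import (
	"fmt"
	"net"
)

// parseAuthority splits host:port without assuming an address family;
// net.SplitHostPort understands bracketed IPv6 authorities as well as IPv4.
func parseAuthority(authority string) (net.IP, string, error) {
	host, port, err := net.SplitHostPort(authority)
	if err != nil {
		return nil, "", err
	}
	ip := net.ParseIP(host)
	if ip == nil {
		return nil, "", fmt.Errorf("host %q is not an IP literal", host)
	}
	return ip, port, nil
}

func main() {
	for _, a := range []string{
		"10.0.0.1:8080",
		"[2a05:f480:1800:27d:63dd:4e0:e8e:d40a]:8080",
	} {
		ip, port, err := parseAuthority(a)
		if err != nil {
			fmt.Println(a, "->", err)
			continue
		}
		// The pitfall from the second link: To4() is nil for IPv6 addresses,
		// so unconditionally converting to 4 bytes silently drops IPv6
		// endpoints. Keep the 16-byte form when To4() returns nil.
		fmt.Printf("%s -> ip=%s port=%s ipv6=%v\n", a, ip, port, ip.To4() == nil)
	}
}
```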
Thanks to everyone who's been digging into this! To be clear, we are indeed planning to get IPv6 working, so all the information is great.
We're also switching to IPv6 EKS clusters, and Linkerd is blocking our migration, as we really want to use both IPv6 and Linkerd... Is there any word on a timeframe for IPv6 support?
@eplightning ... if you got this working, have you already submitted a PR?
@kflynn Hello, I hope you are well. Does Linkerd already support IPv6?
Hi folks, I'm glad to report development on this front has already started! I'll report back here when there's something that you can test :-)
@alpeb wow, thanks for this!!! You're amazing, hahaha. I want to test this in my clusters with Linkerd, Linkerd Viz, and the multicluster features. I'm here to test and help however I can.