
Support disabling protocol detection for non-meshed destinations

Open JacobHenner opened this issue 2 years ago • 14 comments

What problem are you trying to solve?

There are two ways to disable protocol detection in linkerd:

  • Marking ports as opaque (e.g. with the config.linkerd.io/opaque-ports annotation), which disables protocol detection while still proxying the connection.
  • Skipping ports entirely (e.g. with the config.linkerd.io/skip-outbound-ports annotation), which bypasses the proxy altogether.

Unfortunately, opaque ports cannot be set for non-meshed services. The affected ports can be skipped entirely, but that also impacts meshed instances of the service, which can use opaque ports and should not be skipped.

For example, if a pod needs to connect to both a meshed MySQL service (port 3306) and a non-meshed MySQL service (e.g. a managed offering), there is no way to disable protocol detection for the non-meshed connection while still connecting to the meshed service through the linkerd proxy.

See also: slack thread

How should the problem be solved?

Linkerd should support configuring the "no protocol detection" behavior of opaque ports for non-meshed connections, by port. Using the example above, I should be able to configure linkerd to skip protocol detection for connections to port 3306 on non-meshed destinations, without skipping port 3306 entirely. Protocol detection for in-mesh destinations would be disabled by the port being marked opaque.

Any alternatives you've considered?

The out-of-mesh service could be configured to listen on a non-standard port, and that port could then be excluded entirely without affecting the in-mesh service on the standard port. However, this approach isn't always practical, as not all teams can change the listening ports of the services they connect to.
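A minimal sketch of this workaround using the existing skip-ports annotation, assuming the non-meshed MySQL instance could be moved to a hypothetical port 13306:

metadata:
  annotations:
    # Bypass the proxy entirely for the hypothetical non-standard port used only
    # by the non-meshed MySQL instance; the meshed service keeps using 3306,
    # which can remain marked opaque.
    config.linkerd.io/skip-outbound-ports: "13306"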

How would users interact with this feature?

Users would add an annotation similar to config.linkerd.io/opaque-ports: ... to the relevant Kubernetes objects. Something like config.linkerd.io/nonmeshed-protocol-detection-disabled-ports: 3306,..., perhaps.
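A sketch of how this might look on the workload's pod template; the second annotation name is hypothetical and does not exist today:

metadata:
  annotations:
    # Existing: protocol detection is disabled for meshed destinations on 3306.
    config.linkerd.io/opaque-ports: "3306"
    # Proposed (hypothetical name): also disable protocol detection for
    # non-meshed destinations on 3306 without skipping the port entirely.
    config.linkerd.io/nonmeshed-protocol-detection-disabled-ports: "3306"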

Would you like to work on this feature?

maybe

JacobHenner avatar Apr 15 '22 23:04 JacobHenner

Regarding the decision tree from the docs:

[protocol detection decision tree diagram]

  • Wrapping in TLS is not a sufficient workaround either. Several of the default-opaque ports belong to applications that use a STARTTLS scheme, so their connections aren't immediately recognized as "wrapped in TLS" even if the eventual data exchange is.
  • It's unclear to me how "is the destination on the cluster" is evaluated. In my testing linkerd did not mark the port(s) as skip when the destination was an out-of-cluster managed service.
  • The port is in the list of opaque ports, but the list isn't consulted because opaque ports are limited to meshed services.

JacobHenner avatar Apr 15 '22 23:04 JacobHenner

The destination controller is already configured with a set of cluster-wide default opaque ports. This default should apply for all outbound connections (regardless of whether they are in-cluster or not). If it doesn't, we should fix that.

Would that be sufficient to solve your problem? Or do you really need this to be configurable on a per-pod basis? We plan to add richer client configuration in the 2.13 timeframe (we're working on 2.12 right now), but if a cluster-wide default can work for you, I'd rather test/fix the current configuration surface area.

olix0r avatar Apr 16 '22 17:04 olix0r

Ah, OK. So the destination controller does this as we'd hope, but proxies are configured by default to avoid resolving configuration for IP addresses outside of the clusterNetworks configuration (which only includes private IP space by default). We probably want to investigate changing this so that proxies resolve this configuration for all addresses.

In the meantime, you can set the config.linkerd.io/enable-external-profiles: "true" annotation on workloads, which should have the same effect.
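For example, on the workload's pod template (a minimal sketch; annotations are read at injection time, so existing pods need to be rolled for this to take effect):

spec:
  template:
    metadata:
      annotations:
        # Resolve profiles (including opaque-port information) for destinations
        # outside the cluster networks as well.
        config.linkerd.io/enable-external-profiles: "true"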

olix0r avatar Apr 16 '22 17:04 olix0r

In the meantime, you can set the config.linkerd.io/enable-external-profiles: "true" annotation on workloads, which should have the same effect.

Thanks for looking into this, I'll give this a try. However, regarding the following:

the destination controller does this as we'd hope, but proxies are configured by default to avoid resolving configuration for IP addresses outside of the clusterNetworks configuration (which only includes private IP space by default).

In the situation I was testing, the out-of-cluster services actually exist within the same private address space as the cluster. If the destination controller only uses those static private IP space CIDRs to determine whether a service is external, there might be some other contributing factor. I'll provide an update after testing.

When I was looking through the proxy's source the other day, I interpreted this section as skipping opaque transport if mTLS is not in use. Have I misunderstood? (I'm still working to internalize the middleware model used by the proxy)

JacobHenner avatar Apr 16 '22 20:04 JacobHenner

I'll provide an update after testing.

I tested by adding config.linkerd.io/enable-external-profiles: "true" to a workload that connects to an out-of-cluster MySQL service within the same private address space. I verified that the environment variables on the injected proxy container included 3306 (the port being used) as a default opaque port. The proxy logs still showed attempts to run protocol detection (along with timeouts) on connections to this MySQL service.

JacobHenner avatar Apr 16 '22 20:04 JacobHenner

When I was looking through the proxy's source the other day, I interpreted this section as skipping opaque transport if mTLS is not in use. Have I misunderstood?

This section determines whether "opaque transport" is used, which is related to protocol detection but is a slightly different thing. Opaque transport applies when two proxies are transporting a connection that was already marked as opaque so that they can apply mTLS (by initiating connections to an alternate port and adding a header onto the connection that includes the original destination port).

Instead, we probably want to look at things "higher up" the stack:

I know this is all a bit heady...

If you run a proxy with config.linkerd.io/proxy-log-level: linkerd=debug,info you should get some more detailed information. If you're able to confirm that a profile is actually being resolved from the control plane, that would probably be a helpful starting point.
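A sketch of setting that on the affected workload's pod template (again applied at injection time, so the pods need to be rolled):

spec:
  template:
    metadata:
      annotations:
        # Debug-level logs for linkerd crates, info for everything else.
        config.linkerd.io/proxy-log-level: linkerd=debug,info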

olix0r avatar Apr 16 '22 23:04 olix0r

config.linkerd.io/enable-external-profiles: "true" did not solve the issue - is there additional configuration required to get the proxies to recognize the non-meshed IP as an external service?

I enabled debug logging for the proxy and the destination service. I see the following:

Destination service:

time="2022-04-18T14:20:12Z" level=debug msg="GetProfile(path:\"10.12.0.7:3306\" context_token:\"{\\\"ns\\\":\\\"redacted\\\", \\\"nodeName\\\":\\\"ip-100-64-104-240.ec2.internal\\\"}\\n\")" addr=":8086" component=server remote="100.64.102.124:52518"
time="2022-04-18T14:20:12Z" level=debug msg="no pod found for 10.12.0.7:3306" addr=":8086" component=server remote="100.64.102.124:52518"
time="2022-04-18T14:20:12Z" level=debug msg="GetProfile(path:\"10.12.0.7:3306\" context_token:\"{\\\"ns\\\":\\\"redacted\\\", \\\"nodeName\\\":\\\"ip-100-64-104-240.ec2.internal\\\"}\\n\") cancelled" addr=":8086" component=server remote="100.64.102.124:52518"

Proxy (note the timeout is still occurring):

[    96.594349s] DEBUG ThreadId(01) outbound:accept{client.addr=100.64.2.2:51698}:server{orig_dst=10.12.0.7:3306}: linkerd_cache: Caching new service target=Accept { orig_dst: OrigDstAddr(10.12.0.7:3306), protocol: () }
[    96.594490s] DEBUG ThreadId(01) outbound:accept{client.addr=100.64.2.2:51698}:server{orig_dst=10.12.0.7:3306}:profile: linkerd_app_outbound::discover: Allowing profile lookup
[    96.594564s] DEBUG ThreadId(01) outbound:accept{client.addr=100.64.2.2:51698}:server{orig_dst=10.12.0.7:3306}: linkerd_stack::failfast: TCP Server service has become unavailable
[    96.594605s] DEBUG ThreadId(01) outbound:accept{client.addr=100.64.2.2:51698}:server{orig_dst=10.12.0.7:3306}:profile:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}: linkerd_dns: resolve_srv name=linkerd-dst-headless.linkerd.svc.cluster.local.
[    96.596165s] DEBUG ThreadId(01) outbound:accept{client.addr=100.64.2.2:51698}:server{orig_dst=10.12.0.7:3306}:profile:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}: linkerd_dns: ttl=2.99998444s addrs=[100.64.102.124:8086, 100.64.108.130:8086, 100.64.114.52:8086]
[    96.596191s] DEBUG ThreadId(01) outbound:accept{client.addr=100.64.2.2:51698}:server{orig_dst=10.12.0.7:3306}:profile:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}: linkerd_proxy_dns_resolve: addrs=[100.64.102.124:8086, 100.64.108.130:8086, 100.64.114.52:8086] name=linkerd-dst-headless.linkerd.svc.cluster.local:8086
[    96.596228s] DEBUG ThreadId(01) outbound:accept{client.addr=100.64.2.2:51698}:server{orig_dst=10.12.0.7:3306}:profile:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}: linkerd_proxy_discover::from_resolve: Changed change=Insert(100.64.102.124:8086, Target { addr: 100.64.102.124:8086, server_id: Some(ClientTls { server_id: ServerId(Name("linkerd-destination.linkerd.serviceaccount.identity.linkerd.cluster.local")), alpn: None }) })
[    96.596268s] DEBUG ThreadId(01) outbound:accept{client.addr=100.64.2.2:51698}:server{orig_dst=10.12.0.7:3306}:profile:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}: linkerd_proxy_discover::from_resolve: Changed change=Insert(100.64.108.130:8086, Target { addr: 100.64.108.130:8086, server_id: Some(ClientTls { server_id: ServerId(Name("linkerd-destination.linkerd.serviceaccount.identity.linkerd.cluster.local")), alpn: None }) })
[    96.596296s] DEBUG ThreadId(01) outbound:accept{client.addr=100.64.2.2:51698}:server{orig_dst=10.12.0.7:3306}:profile:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}: linkerd_proxy_discover::from_resolve: Changed change=Insert(100.64.114.52:8086, Target { addr: 100.64.114.52:8086, server_id: Some(ClientTls { server_id: ServerId(Name("linkerd-destination.linkerd.serviceaccount.identity.linkerd.cluster.local")), alpn: None }) })
[    96.596325s] DEBUG ThreadId(01) outbound:accept{client.addr=100.64.2.2:51698}:server{orig_dst=10.12.0.7:3306}:profile:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}:endpoint{addr=100.64.102.124:8086}: linkerd_reconnect: Disconnected backoff=false
[    96.596331s] DEBUG ThreadId(01) outbound:accept{client.addr=100.64.2.2:51698}:server{orig_dst=10.12.0.7:3306}:profile:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}:endpoint{addr=100.64.102.124:8086}: linkerd_reconnect: Creating service backoff=false
[    96.596339s] DEBUG ThreadId(01) outbound:accept{client.addr=100.64.2.2:51698}:server{orig_dst=10.12.0.7:3306}:profile:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}:endpoint{addr=100.64.102.124:8086}: linkerd_proxy_transport::connect: Connecting server.addr=100.64.102.124:8086
[    96.596470s] DEBUG ThreadId(01) outbound:accept{client.addr=100.64.2.2:51698}:server{orig_dst=10.12.0.7:3306}:profile:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}:endpoint{addr=100.64.108.130:8086}: linkerd_reconnect: Disconnected backoff=false
[    96.596489s] DEBUG ThreadId(01) outbound:accept{client.addr=100.64.2.2:51698}:server{orig_dst=10.12.0.7:3306}:profile:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}:endpoint{addr=100.64.108.130:8086}: linkerd_reconnect: Creating service backoff=false
[    96.596494s] DEBUG ThreadId(01) outbound:accept{client.addr=100.64.2.2:51698}:server{orig_dst=10.12.0.7:3306}:profile:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}:endpoint{addr=100.64.108.130:8086}: linkerd_proxy_transport::connect: Connecting server.addr=100.64.108.130:8086
[    96.596573s] DEBUG ThreadId(01) outbound:accept{client.addr=100.64.2.2:51698}:server{orig_dst=10.12.0.7:3306}:profile:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}:endpoint{addr=100.64.114.52:8086}: linkerd_reconnect: Disconnected backoff=false
[    96.596594s] DEBUG ThreadId(01) outbound:accept{client.addr=100.64.2.2:51698}:server{orig_dst=10.12.0.7:3306}:profile:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}:endpoint{addr=100.64.114.52:8086}: linkerd_reconnect: Creating service backoff=false
[    96.596599s] DEBUG ThreadId(01) outbound:accept{client.addr=100.64.2.2:51698}:server{orig_dst=10.12.0.7:3306}:profile:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}:endpoint{addr=100.64.114.52:8086}: linkerd_proxy_transport::connect: Connecting server.addr=100.64.114.52:8086
[    96.596832s] DEBUG ThreadId(01) outbound:accept{client.addr=100.64.2.2:51698}:server{orig_dst=10.12.0.7:3306}:profile:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}:endpoint{addr=100.64.102.124:8086}:h2: linkerd_proxy_transport::connect: Connected local.addr=100.64.2.2:56214 keepalive=Some(10s)
[    96.596978s] DEBUG ThreadId(01) outbound:accept{client.addr=100.64.2.2:51698}:server{orig_dst=10.12.0.7:3306}:profile:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}:endpoint{addr=100.64.108.130:8086}:h2: linkerd_proxy_transport::connect: Connected local.addr=100.64.2.2:56138 keepalive=Some(10s)
[    96.598428s] DEBUG ThreadId(01) outbound:accept{client.addr=100.64.2.2:51698}:server{orig_dst=10.12.0.7:3306}:profile:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}:endpoint{addr=100.64.102.124:8086}:h2: linkerd_tls::client:
[    96.598571s] DEBUG ThreadId(01) outbound:accept{client.addr=100.64.2.2:51698}:server{orig_dst=10.12.0.7:3306}:profile:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}:endpoint{addr=100.64.102.124:8086}: linkerd_reconnect: Connected
[    96.598736s] DEBUG ThreadId(01) outbound:accept{client.addr=100.64.2.2:51698}:server{orig_dst=10.12.0.7:3306}:profile:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}:endpoint{addr=100.64.114.52:8086}:h2: linkerd_proxy_transport::connect: Connected local.addr=100.64.2.2:34842 keepalive=Some(10s)
[    96.599579s] DEBUG ThreadId(01) outbound:accept{client.addr=100.64.2.2:51698}:server{orig_dst=10.12.0.7:3306}:profile:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}:endpoint{addr=100.64.108.130:8086}:h2: linkerd_tls::client:
[    96.602069s] DEBUG ThreadId(01) outbound:accept{client.addr=100.64.2.2:51698}:server{orig_dst=10.12.0.7:3306}:profile:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}:endpoint{addr=100.64.114.52:8086}:h2: linkerd_tls::client:
[    99.597395s] DEBUG ThreadId(01) outbound:accept{client.addr=100.64.2.2:51698}:server{orig_dst=10.12.0.7:3306}:profile:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}: linkerd_dns: resolve_srv name=linkerd-dst-headless.linkerd.svc.cluster.local.
[    99.598965s] DEBUG ThreadId(01) outbound:accept{client.addr=100.64.2.2:51698}:server{orig_dst=10.12.0.7:3306}:profile:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}: linkerd_dns: ttl=4.99998473s addrs=[100.64.102.124:8086, 100.64.108.130:8086, 100.64.114.52:8086]
[    99.598990s] DEBUG ThreadId(01) outbound:accept{client.addr=100.64.2.2:51698}:server{orig_dst=10.12.0.7:3306}:profile:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}: linkerd_proxy_dns_resolve: addrs=[100.64.102.124:8086, 100.64.108.130:8086, 100.64.114.52:8086] name=linkerd-dst-headless.linkerd.svc.cluster.local:8086
[   104.600913s] DEBUG ThreadId(01) outbound:accept{client.addr=100.64.2.2:51698}:server{orig_dst=10.12.0.7:3306}:profile:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}: linkerd_dns: resolve_srv name=linkerd-dst-headless.linkerd.svc.cluster.local.
[   104.602445s] DEBUG ThreadId(01) outbound:accept{client.addr=100.64.2.2:51698}:server{orig_dst=10.12.0.7:3306}:profile:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}: linkerd_dns: ttl=4.99998873s addrs=[100.64.102.124:8086, 100.64.108.130:8086, 100.64.114.52:8086]
[   104.602478s] DEBUG ThreadId(01) outbound:accept{client.addr=100.64.2.2:51698}:server{orig_dst=10.12.0.7:3306}:profile:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}: linkerd_proxy_dns_resolve: addrs=[100.64.102.124:8086, 100.64.108.130:8086, 100.64.114.52:8086] name=linkerd-dst-headless.linkerd.svc.cluster.local:8086
[   106.600757s]  INFO ThreadId(01) outbound:accept{client.addr=100.64.2.2:51698}:server{orig_dst=10.12.0.7:3306}:profile: linkerd_detect: Continuing after timeout: linkerd_proxy_http::version::Version protocol detection timed out after 10s
[   106.600812s] DEBUG ThreadId(01) outbound:accept{client.addr=100.64.2.2:51698}:server{orig_dst=10.12.0.7:3306}:profile:tcp.forward: linkerd_tls::client: Peer does not support TLS reason=not_provided_by_service_discovery
[   106.600821s] DEBUG ThreadId(01) outbound:accept{client.addr=100.64.2.2:51698}:server{orig_dst=10.12.0.7:3306}:profile:tcp.forward: linkerd_proxy_transport::connect: Connecting server.addr=10.12.0.7:3306
[   106.601703s] DEBUG ThreadId(01) outbound:accept{client.addr=100.64.2.2:51698}:server{orig_dst=10.12.0.7:3306}:profile:tcp.forward: linkerd_proxy_transport::connect: Connected local.addr=100.64.2.2:51818 keepalive=Some(10s)
[   106.601740s] DEBUG ThreadId(01) outbound:accept{client.addr=100.64.2.2:51698}:server{orig_dst=10.12.0.7:3306}:profile:tcp.forward: linkerd_transport_metrics::client: client connection open

With @mateiidavid's guidance I've run go run controller/script/destination-client/main.go -method getProfile -path 10.12.0.7:3306. The output suggests that the destination service is informing the proxy that the destination is opaque:

INFO[0000] opaque_protocol:true  retry_budget:{retry_ratio:0.2  min_retries_per_second:10  ttl:{seconds:10}}

I'll continue to investigate.

JacobHenner avatar Apr 18 '22 14:04 JacobHenner

I've put up a change to add some more debug logging to help us narrow this down.

You can try it by running your workload with pod annotations:

config.linkerd.io/proxy-image: ghcr.io/olix0r/l2-proxy
config.linkerd.io/proxy-version: opaque.f617d8fb

olix0r avatar Apr 18 '22 16:04 olix0r

The relevant new loglines are:

[    69.191838s] DEBUG ThreadId(01) outbound:accept{client.addr=100.64.57.205:47896}:server{orig_dst=10.12.0.7:3306}:profile: linkerd_app_outbound::switch_logical: No profile; forwarding to the original destination
[    69.191857s] DEBUG ThreadId(01) outbound:accept{client.addr=100.64.57.205:47896}:server{orig_dst=10.12.0.7:3306}:profile: linkerd_app_outbound::http::detect: Attempting HTTP protocol detection

JacobHenner avatar Apr 18 '22 17:04 JacobHenner

@JacobHenner OK, I think I've tracked down the issue and should have a fix in config.linkerd.io/proxy-version: opaque.ddf0ce28 -- I'm getting tests together and will try to include this in the upcoming 2.11.2 release.

olix0r avatar Apr 19 '22 15:04 olix0r

@JacobHenner OK, I think I've tracked down the issue and should have a fix in config.linkerd.io/proxy-version: opaque.ddf0ce28 -- I'm getting tests together and will try to include this in the upcoming 2.11.2 release.

Great, thank you for addressing this!

I've just tested this image, and I can confirm it exhibits the desired behavior. My test was limited to a connection to an out-of-mesh service on port 3306. I did not test for any regressions.

Thankfully, the default opaque ports are sufficient for our purposes. However, the subject of this issue as opened is a bit broader. Since other users might need to disable protocol detection for non-default-opaque ports, should this issue remain open once https://github.com/linkerd/linkerd2-proxy/pull/1617 is merged, or should a separate issue be opened to track potential enhancements?

Thanks again!

JacobHenner avatar Apr 19 '22 17:04 JacobHenner

@JacobHenner Great. I'll leave this open for now, though it may stale out. I think this will slot into already-(loosely-)planned egress policy/configuration work, sketched in for 2.13.

olix0r avatar Apr 19 '22 20:04 olix0r

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Jul 19 '22 01:07 stale[bot]

Removing the wontfix on this, but this is something that can already be configured — albeit in a cluster-wide way. When installing Linkerd, users can change the default opaque ports with --set opaquePorts="..." so that connections to un-meshed workloads listening on a port outside the default opaque list do not result in a protocol detection timeout.

Ideally it would be nice if this could be set on a per-workload basis from the client instead of having to change a cluster-wide configuration. As already mentioned in the original description, this could be handled by an annotation.
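For completeness, a sketch of the cluster-wide approach described above (the port values are illustrative; the value replaces the chart's default opaque-port list, so include any defaults you still want to keep, and check your chart version's values for the exact key):

# Make 3306 and a hypothetical port 9999 cluster-wide default opaque ports.
linkerd install --set opaquePorts="3306,9999" | kubectl apply -f -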

kleimkuhler avatar Jul 20 '22 23:07 kleimkuhler

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Oct 19 '22 16:10 stale[bot]