Egress HTTPS Metrics

Open grampelberg opened this issue 5 years ago • 14 comments

What problem are you trying to solve?

The rich metrics that Linkerd provides are rarely available for third party services such as github.com because the communication is encrypted from the application all the way to the third party service. The proxy never sees the unencrypted bits.

There should be some solution that allows the proxy to inspect outbound traffic from an application to a third party (or anything outside the mesh), export metrics for that communication and apply policy via service profiles.

Requirements

  • Changes to application code are okay.
  • External service configuration is okay.
  • Must fail encrypted when not in the mesh.

Any alternatives you've considered?

  • Modify applications to use http instead of https and configure the proxy to upgrade the connection - this has the potential for applications that are not meshed to fail in an unencrypted fashion.
  • Add a trust root to the application's container and MITM the encrypted connection - this requires some potentially fragile modification to the application's controller.
  • Use kTLS - this requires support in the application's client.

grampelberg avatar Aug 02 '19 20:08 grampelberg

Related #2192.

grampelberg avatar Aug 02 '19 20:08 grampelberg

+1 for this

DavidZisky avatar Oct 30 '19 12:10 DavidZisky

Hey @grampelberg! I would like to be mentored on this issue through the Community Bridge program. Could you please help me get started with it?

Thanks

championshuttler avatar May 12 '20 23:05 championshuttler

Hey! I have a few questions about this issue :) I think it is really interesting and would love to work on it in any capacity. Suppose we don't use, say, a traditional TLS or MITM approach (i.e. we aren't decrypting the requests and forwarding them to the server). Does our approach then err on the side of guesswork? For example, using the IP address of the outbound traffic and DNS to guess where the request is going, and measuring the size of the request and response along with whatever other metrics are available without decrypting or knowing the exact contents of the request?

Also, does "Must fail encrypted when not in the mesh" mean that all the nodes that haven't been meshed must not have encrypted traffic?

At first glance, the alternatives look easier to implement, which is possibly why this issue looks so appealing haha :)

vaniisgh avatar May 13 '20 00:05 vaniisgh

First step would be to jump into slack. We've got a #contributors channel for all these questions =)

We'd like to get to know y'all a little first, so you'll want to make a couple of contributions. Check out the "good first issue" label for a list of those.

After that, we'll want to get an RFC together. I'm happy to help you pull all the pieces together on that in slack =)

"I wanted to ask you that if we don't use say a traditional TLS or MITM approach (i.e we aren't decrypting the requests and forwarding them to the server)."

Applications will likely need to send us unencrypted traffic so that we can do analytics on it. The other option would be some of the eBPF functionality around optimistic TLS. I don't believe that's quite mature enough yet, and it wouldn't hit the proxy anyways. To get unencrypted traffic, we'll likely want to have special domain names and ask application owners to change their connection strings.
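As a rough illustration of the "special domain names" idea above, here is a minimal Python sketch of the mapping an egress layer might apply. It is not how Linkerd works today. The `.egress.example` suffix and the `upgrade_egress_url` helper are hypothetical names chosen for this example. The assumption is that an application sends plain HTTP to a specially named host, and the upgrading component rewrites it to HTTPS against the real host, refusing anything that is not explicitly marked.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical suffix marking hosts whose traffic should be upgraded to HTTPS.
EGRESS_SUFFIX = ".egress.example"


def upgrade_egress_url(url: str) -> str:
    """Map http://<host>.egress.example/... to https://<host>/...

    Raises ValueError for any other URL, so plaintext is never forwarded
    to a host that was not explicitly marked for upgrade.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    if parts.scheme != "http" or not host.endswith(EGRESS_SUFFIX):
        raise ValueError(f"not an egress URL: {url!r}")
    real_host = host[: -len(EGRESS_SUFFIX)]
    if not real_host:
        raise ValueError(f"missing upstream host in: {url!r}")
    # Any port on the plaintext side is dropped; the upstream uses the
    # default HTTPS port in this sketch.
    return urlunsplit(("https", real_host, parts.path, parts.query, parts.fragment))
```

Under these assumptions, `upgrade_egress_url("http://github.com.egress.example/repos?x=1")` yields `https://github.com/repos?x=1`, while a plain `http://github.com/` is rejected.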

grampelberg avatar May 13 '20 22:05 grampelberg

There is a solution for this here: https://github.com/grampelberg/k8s-egress

wmorgan avatar Jun 30 '20 18:06 wmorgan

Our security team wasn't really open to the idea of a single (egress) service receiving all the unencrypted egress traffic of the cluster. I think they would be more willing to consider it if it was the sidecar proxy doing it and the client application were to send it to localhost (in case it's not meshed, it would fail).

m1o1 avatar Mar 28 '22 03:03 m1o1

Egress control at the pod level is on the roadmap. Won't be in 2.12 itself but perhaps in 2.13? https://linkerd.io/2021/12/29/the-service-mesh-in-2022/

wmorgan avatar Mar 28 '22 14:03 wmorgan

Hi there, I was just searching for exactly this feature in linkerd :P

But sadly it looks like this feature is not included in 2.13, as that is already released?

Is this feature still on the roadmap and can we estimate a milestone for this?

Thanks

eloo-abi avatar Aug 22 '23 09:08 eloo-abi

Really need this feature to monitor the external traffic from the pod for observability.

channyein87 avatar Sep 08 '23 22:09 channyein87

This is a very practical feature for us. It would be great to see it in one of the upcoming versions.

xsoheilalizadeh avatar Feb 29 '24 12:02 xsoheilalizadeh

This is still on the roadmap, although we don't have a definitive milestone for it yet. 😐

If any of y'all have a solid concept of what you think this should look like, we'd love to hear it – e.g. are you thinking of command-line support in linkerd viz? simply having metrics posted to Prometheus? splitting out metrics by egress destination? ???

kflynn avatar Mar 07 '24 17:03 kflynn

My current challenge is analyzing outgoing traffic from the cluster: how the pods are accessing external resources such as the internet, databases, etc. There also isn't any way to allow only whitelisted DNS domains, since the built-in network policy cannot do that. We currently use VPC flow logs, but they are not really reliable when investigating connectivity issues. For example, one of our workloads saw a lot of timeouts calling an external resource behind an NLB, and we were unable to trace them with the available telemetry data. We were left with too many assumptions about whether the cause was the workload itself, cluster networking, or the NLB.

zip-chanko avatar Apr 25 '24 03:04 zip-chanko

Would be great to have visibility into the number of requests and success rates per external domain. Path-level stats would be a great bonus.
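To make the request above concrete, here is a minimal Python sketch of per-domain request counts and success rates. It does not describe any existing Linkerd metric. The `EgressStats` and `DomainStats` names are hypothetical, and treating any non-5xx status as a success is an assumption made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class DomainStats:
    requests: int = 0
    successes: int = 0

    @property
    def success_rate(self) -> float:
        # Report 0.0 for a domain with no recorded requests.
        return self.successes / self.requests if self.requests else 0.0


class EgressStats:
    """Hypothetical in-memory aggregator keyed by external domain."""

    def __init__(self) -> None:
        self._by_domain = defaultdict(DomainStats)

    def record(self, domain: str, status: int) -> None:
        stats = self._by_domain[domain.lower()]
        stats.requests += 1
        # Assumption: any non-5xx response counts as a success.
        if status < 500:
            stats.successes += 1

    def snapshot(self) -> dict:
        # Map each domain to (request count, success rate).
        return {d: (s.requests, s.success_rate) for d, s in self._by_domain.items()}
```

Path-level stats could follow the same shape by keying on a `(domain, path)` tuple instead of the domain alone.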

adrian-gierakowski avatar May 19 '24 22:05 adrian-gierakowski