linkerd2 External/egress policy and support for enforcing network security policies and compliance

Feature Request

Enable Linkerd policies to fully specify the external resources a cluster can access, a la egress gateways from Istio.

What problem are you trying to solve?

In high compliance environments, it's useful to be able to use the service mesh as part of the network security of a cluster.

Suppose I wish to have some pods interact with the GitHub API. We'll call this application our github-agent. Without any third-party tooling, those pods need outbound port 443 to be open. However, if a supply chain attack injects code into my cluster, that code could phone home to any command and control (C2) node on port 443 to perform malicious acts.

A default deny policy in a namespace ensures that outbound network traffic must be explicitly authorized, but K8s network policies operate at layers 3 and 4 (IP address, port, and protocol). They cannot match on hostnames, enforce HTTPS, or mitigate supply chain attacks such as Magecart.
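For reference, a default deny egress policy of the kind mentioned above can be sketched as follows. This is a minimal illustration that builds the manifest in Python and prints it as JSON (which `kubectl apply -f -` accepts); the policy name and the `github-agent` namespace are assumptions made for the example, not names from a real cluster.

```python
import json


def default_deny_egress(namespace: str) -> dict:
    """Build a NetworkPolicy that selects every pod in the namespace and
    lists no egress rules, so all outbound traffic is denied."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        # The policy name is illustrative.
        "metadata": {"name": "default-deny-egress", "namespace": namespace},
        "spec": {
            "podSelector": {},          # empty selector: applies to all pods
            "policyTypes": ["Egress"],  # no "egress" rules follow, so nothing is allowed
        },
    }


if __name__ == "__main__":
    # Hypothetical namespace name, used only for illustration.
    print(json.dumps(default_deny_egress("github-agent"), indent=2))
```

Note that a policy like this also blocks DNS lookups until a further policy allows them, and any rule that later reopens port 443 can still only match IP ranges, not hostnames.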

A layer 7 proxy solution is needed to ensure our github-agent application can only access https://api.github.com with a valid certificate.

How should the problem be solved?

Either of the two methods seems sound to me:

Client-side egress filtering

Client-side profiles would gain an allowlist mode that acts as an additional layer of filtering which application code cannot circumvent (perhaps enforced via the CNI plugin and iptables rules?).

Pros: the compute and networking overhead here is already paid for by existing sidecar containers.

Cons: the NetworkPolicy that targets those pods will likely still need to allow port 443 egress to 0.0.0.0/0. (Is there an alternative?)

Namespace or cluster-wide egress gateway

A cluster-wide egress gateway is deployed by Linkerd which proxies requests to external services.

Pros: Only that egress gateway needs port 443 egress.

Cons: contrary to some of the goals of Linkerd, this creates an additional network hop and centralizes network traffic in the egress gateway(s).

Any alternatives you've considered?

In clusters I manage, I use a default deny to block all applications in the cluster from making external network requests. Then, I deploy:

A Service github-proxy which targets the pods managed by the same named deployment outbound...
A NetworkPolicy github-proxy-egress which allows the port 443 egress and certain labeled pods port 80 ingress to...

A Deployment github-proxy running nginx with a specific configuration:

http {
  upstream api.github.com {
    server api.github.com:443 max_fails=0;
  }

  server {
    # ...

    location / {
      # ...
      proxy_pass                    https://api.github.com/;
    }
  }
}

This is fairly high overhead: it requires deploying three Kubernetes resources and modifying application code. The latter especially isn't always possible. If github-agent were a third-party application, it might hardcode its request URLs and not be configurable to use a proxy.
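To make the "modifying application code" cost concrete, here is a rough sketch of the kind of URL rewrite the application would need so that its requests go to the in-cluster proxy instead of directly to GitHub. The `rewrite_for_proxy` helper and the `PROXIED_HOSTS` mapping are hypothetical; the sketch assumes the github-proxy Service is reachable by its short name over plain HTTP on port 80.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical mapping from external hostname to the in-cluster Service name.
PROXIED_HOSTS = {"api.github.com": "github-proxy"}


def rewrite_for_proxy(url: str) -> str:
    """Point a request at the in-cluster proxy, keeping path and query.

    URLs for hosts without a proxy Service are returned unchanged.
    """
    parts = urlsplit(url)
    service = PROXIED_HOSTS.get(parts.hostname or "")
    if service is None:
        return url
    # The proxy is assumed to accept plain HTTP and re-originate HTTPS upstream.
    return urlunsplit(("http", service, parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    print(rewrite_for_proxy("https://api.github.com/repos/linkerd/linkerd2"))
```

An application that builds its URLs internally and exposes no such hook cannot be adapted this way, which is the limitation described above.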

How would users interact with this feature?

Knobs/controls I would expect on this resource for each desired hostname:

Optional ports and protocols (HTTP, HTTP/2, gRPC, and opaque TCP or UDP)
An optional flag to control whether this is an allowlist (Linkerd will close connections to not explicitly allowed hosts).
An optional flag to control whether TLS is enforced, for non-opaque protocols.
An optional flag to control whether telemetry is enabled, for non-opaque protocols.
An optional field to determine what trace header propagation format to use, for future proofing. This ought to be None by default, so that trace headers are dropped on outbound requests to avoid leaking information. For cross-cluster communication, I would imagine this supporting different propagation formats in the future.

My opinions on defaults:

Protocol detection should be enabled, but it should be possible to define the protocol to constrain traffic, e.g.: I know my destination API only supports gRPC, do not allow general HTTPS traffic.
A non-matching hostname will be proxied as though it had default settings
TLS enforcement is opt-out unless the protocol is set to HTTP or opaque
Telemetry should be opt-out for supported protocols
Trace propagation should be opt-in to avoid leaking application details by default

The defaults therefore would result in Linkerd acting as a MITM for external resources and collecting telemetry, dropping trace headers on outbound requests.

For users with compliance needs, setting the allowlist flag would cause the sidecar to close connections to non-matching hostnames and protocols by default.
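The knobs and defaults described above could be modeled roughly as follows. This is only a sketch of the proposed semantics, not an existing Linkerd resource: the `HostRule` and `EgressPolicy` names, their fields, and the choice to make the allowlist a policy-wide flag are all assumptions. It also assumes that a connection whose port or protocol does not match a declared rule is treated the same as a non-matching hostname.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional


@dataclass(frozen=True)
class HostRule:
    """Hypothetical per-hostname settings; all field names are illustrative."""
    hostname: str
    ports: FrozenSet[int] = frozenset()     # empty: any port
    protocol: Optional[str] = None          # None: rely on protocol detection
    enforce_tls: Optional[bool] = None      # None: use the protocol-based default
    telemetry: bool = True                  # telemetry is opt-out
    trace_propagation: Optional[str] = None # None: drop trace headers

    def tls_required(self) -> bool:
        # TLS is enforced unless opted out or the protocol is HTTP or opaque.
        if self.enforce_tls is not None:
            return self.enforce_tls
        return self.protocol not in ("HTTP", "opaque")

    def matches(self, port: int, protocol: str) -> bool:
        port_ok = not self.ports or port in self.ports
        protocol_ok = self.protocol is None or self.protocol == protocol
        return port_ok and protocol_ok


@dataclass(frozen=True)
class EgressPolicy:
    rules: Dict[str, HostRule]
    allowlist: bool = False  # when True, non-matching connections are closed

    def decide(self, hostname: str, port: int, protocol: str) -> Optional[HostRule]:
        """Return the settings to proxy with, or None to close the connection."""
        rule = self.rules.get(hostname)
        if rule is not None and rule.matches(port, protocol):
            return rule
        if self.allowlist:
            return None
        # Not in allowlist mode: proxy as though the host had default settings.
        return HostRule(hostname=hostname)


if __name__ == "__main__":
    policy = EgressPolicy(
        rules={"api.github.com": HostRule("api.github.com", ports=frozenset({443}))},
        allowlist=True,
    )
    print(policy.decide("api.github.com", 443, "HTTP/2"))
    print(policy.decide("c2.example.net", 443, "HTTP/2"))
```

With `allowlist=False` the same policy would still proxy the second connection, using default settings with TLS enforced, telemetry on, and trace headers dropped.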

Jun 07 '21 18:06 AaronFriel

FYI some related work: https://github.com/grampelberg/k8s-egress

Jun 07 '21 20:06 adleong

@adleong excellent, that repo is very similar to what I'm doing.

I would rank the allowlist component of the feature (aka "block supply chain attacks" button) substantially higher than the ability to handle third party applications, but I do think this feature should be designed with a future in mind that supports adding that functionality down the road. There is one thing I forgot to mention, which is that many enterprise applications (the third party applications I mentioned above) do typically support HTTP_PROXY or HTTPS_PROXY env variables.

It looks like the Rust proxy may already work with the CONNECT method.
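The reason CONNECT is attractive here: an application that honors HTTPS_PROXY first sends a plaintext `CONNECT host:port` request to the proxy, so the destination hostname is visible before any TLS bytes flow. A minimal sketch of extracting that target so it could be checked against an allowlist, assuming a well-formed HTTP/1.1 request line (`parse_connect_target` and `ALLOWED` are hypothetical names, and this is not how the Linkerd proxy is implemented):

```python
from typing import Tuple


def parse_connect_target(request_line: str) -> Tuple[str, int]:
    """Extract (host, port) from a line such as
    'CONNECT api.github.com:443 HTTP/1.1'."""
    fields = request_line.strip().split()
    if len(fields) != 3 or fields[0] != "CONNECT":
        raise ValueError("not a CONNECT request line")
    host, sep, port = fields[1].rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError("CONNECT target must be host:port")
    return host.lower(), int(port)


# Hypothetical allowlist of (host, port) pairs.
ALLOWED = {("api.github.com", 443)}


def is_allowed(request_line: str) -> bool:
    try:
        return parse_connect_target(request_line) in ALLOWED
    except ValueError:
        return False
```

Applications that ignore the proxy environment variables would not send a CONNECT request at all, so this covers only the cooperative case.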

Jun 07 '21 20:06 AaronFriel

Other related work: Monzo's egress-operator

Nov 17 '21 19:11 mogul

@AaronFriel Sorry for hijacking this issue, but I am interested in your solution for this:

I use a default deny to block all applications in the cluster from making external network requests

I don't have any pod that needs to talk to the internet, so I want to restrict all external network requests. Can I do this with Linkerd?

Mar 18 '22 15:03 jume-dev

@jume-dev It's been over a year since you wrote, but in the interest of readers: You can't limit all external network requests with Linkerd.

In the current state, you'd have to limit at the cluster networking layer (e.g. no routes to the outside).

Jun 08 '23 10:06 limoges

@jume-dev It's been over a year since you wrote, but in the interest of readers: You can't limit all external network requests with Linkerd.

In the current state, you'd have to limit at the cluster networking layer (e.g. no routes to the outside).

Is there any documentation on doing this? I suspect it would be with network policy, but then you would still need Linkerd to talk to its components.

I think building your own egress pods makes sense, but at the moment I've struggled to find exactly what policies would work. You can trial and error it, but that might be kind of tedious.

Jun 23 '24 00:06 btrepp