GEP: Client Certificate Verification for Gateway Listeners
What would you like to be added:
The ability for an HTTPS (or TLS generally) endpoint to require that the client present a certificate that can be validated according to some configurable policy.
Why is this needed:
As an application developer, I want to restrict access to my application to a certain audience of clients. The audience is defined by one or more of:
- a collection of specific TLS certificates (maybe by hash)
- a collection of subject names in certificates
- a collection of certificates issued by a specific (unique) CA
I want the infrastructure to guarantee that I only receive client traffic that originates from this audience.
/kind user-story
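Purely for illustration, here is a rough sketch of what such a configurable policy might look like on a Gateway listener. None of the client-validation fields below (`clientValidation`, `caCertificateRef`, `subjectNames`, `certificateHashes`) exist in the API; they only mirror the three audience definitions above.

```yaml
# Purely hypothetical sketch - none of the clientValidation fields below
# exist in the Gateway API today; they just mirror the audience definitions
# requested above (a specific CA, allowed subject names, pinned hashes).
apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: example-gateway
spec:
  gatewayClassName: example
  listeners:
  - name: https
    port: 443
    protocol: HTTPS
    tls:
      mode: Terminate
      certificateRefs:
      - name: server-cert          # existing field: the server certificate
      clientValidation:            # hypothetical block
        caCertificateRef:
          name: audience-root-ca   # only certs issued by this CA are accepted
        subjectNames:              # or an allow-list of subject names
        - client-a.example.org
        certificateHashes:         # or pinned certificate hashes
        - sha256:<pinned-client-cert-hash>
```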
Is this mTLS? Can you add to the issue title, as that is a commonly known term around the K8S community.
It is mutual TLS in the sense that the client and the server perform mutual TLS, but 'mTLS' also has some connotations about speed and automation of certificate rotation that we probably don't want to bring in here.
I say this because when we had an issue to add 'mTLS' to Contour, people started asking straight away if Contour was now a service mesh. Which it is not. But the term is overloaded.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with `/remove-lifecycle stale`.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with `/close`.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

Issues go stale after 90d of inactivity.
Mark the issue as fresh with `/remove-lifecycle stale`.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with `/close`.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with `/remove-lifecycle rotten`.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with `/close`.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle rotten
/remove-lifecycle rotten
#268 should close this.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with `/remove-lifecycle stale`.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with `/close`.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale
/remove-lifecycle stale /lifecycle frozen
Adding my comments from Slack regarding some configuration options that would be nice:
- either deny or don't deny requests from clients who don't present valid certs (useful when you want to show a "logged out state" based on application logic in your upstream)
- an option to pass through the subject/SANs from the certificate to your upstream (the identity of the client), maybe in a header? Not sure if there's a standard there. This could probably be the default... not sure if it would matter if you still passed through the client identity and the upstream didn't care.

This could let you configure things like "I'll take requests in my upstream even if they don't present a valid cert, but if they do present a valid cert, I want to know that the identity was alan so I can decide his access level. And if there's no cert, I want to show them logged out state."
The ingress-nginx annotations seem like a really good reference point for which configuration options might be nice, as well as the names and contents of headers to pass back to the upstream:
https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/annotations/#client-certificate-authentication
I think this one largely gets at some of the use cases I was describing above:
nginx.ingress.kubernetes.io/auth-tls-verify-client: optional_no_ca
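For reference, here is a hedged sketch of an Ingress using those ingress-nginx annotations, covering both the optional verification mode and passing the client certificate through to the upstream (resource and secret names are placeholders; see the linked docs for the exact semantics of each value):

```yaml
# Sketch of the ingress-nginx client certificate annotations referenced above
# (not Gateway API); names and hosts are placeholders.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example
  annotations:
    # Secret (namespace/name) holding the CA bundle used to verify client certs
    nginx.ingress.kubernetes.io/auth-tls-secret: "default/ca-secret"
    # "on", "off", "optional" or "optional_no_ca" - the optional modes let the
    # upstream decide what to do when no valid client cert is presented
    nginx.ingress.kubernetes.io/auth-tls-verify-client: "optional_no_ca"
    # Forward the client certificate to the upstream so it can read the identity
    nginx.ingress.kubernetes.io/auth-tls-pass-certificate-to-upstream: "true"
spec:
  ingressClassName: nginx
  tls:
  - hosts: [example.com]
    secretName: example-tls
  rules:
  - host: example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: example
            port:
              number: 80
```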
It's also important to remember that, if the following are true:
- you have a single proxy routing multiple SNIs
- the multiple SNI routes can have different security profiles (as in, some can have client certificates and some can't)
- you are using a protocol that has a higher-level routing construct (like HTTP's `Host` header)

Then you can connect to `insecure.foo.com` without a client cert, set your `Host` header to `secure.foo.com`, and end up at the secure site, with no client cert.
This is roughly the same thing as domain fronting.
I wonder if sane defaults plus an "I know I'm doing a dangerous thing" option could help with domain fronting.
That is, if your listener is configured in `Terminate` mode, client certificate authentication can be configured with no special options. In addition to verifying the certificate, requests by default won't be routed upstream unless the SNI host matches the HTTP `Host`.
But, if you're in `Passthrough` mode, client authentication is only allowed with `disableHostVerify: true`. The docs can indicate that any upstream routers will need to be sure not to route based on `Host` with the expectation that it is the actual hostname the client authenticated with. The `disableHostVerify` option could also be available for `Terminate` mode if there was a use case that needed it.
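To make the proposed defaults concrete, here is a rough sketch under the assumption that client-cert options hang off the listener's `tls` block; `clientValidation` and `disableHostVerify` are hypothetical names from this discussion, not existing fields:

```yaml
# Hypothetical sketch of the defaults proposed above - clientValidation and
# disableHostVerify are names from this discussion, not existing API fields.
listeners:
- name: https-terminate
  protocol: HTTPS
  port: 443
  hostname: secure.foo.com
  tls:
    mode: Terminate
    certificateRefs:
    - name: secure-foo-cert
    clientValidation:              # hypothetical
      caCertificateRef:
        name: client-root-ca
  # default: reject requests whose HTTP Host does not match the SNI host
- name: tls-passthrough
  protocol: TLS
  port: 8443
  hostname: legacy.foo.com
  tls:
    mode: Passthrough
    clientValidation:              # hypothetical
      caCertificateRef:
        name: client-root-ca
    disableHostVerify: true        # hypothetical: must be set explicitly here,
                                   # since the proxy never sees the HTTP Host
```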
@alanchrt you're talking about two modes: one that enforces that Host matches SNI and the other that doesn't, right? We're not terminating TLS vs. passing it through for a backend to terminate? If that's correct, I like the idea but would probably look for different terminology ;).
I'm not convinced the non-enforcing mode needs to exist at all. What's the use case for upstreams actually needing to route based on Host header? I'd consider setting the Host header for the upstream request to force it to match SNI. The host header is ignored all the time (e.g., most non-proxy web applications will ignore the host header). In a case like this, I can't think of a way it would ever break non-malicious clients.
I don't have as much experience with http/2 so I have no comment on whether this might affect those clients.
> I don't have as much experience with http/2 so I have no comment on whether this might affect those clients.
I think per the HTTP/2 spec, servers are expected to return a 421 here in some scenarios - this allows the browser to automatically retry when its optimizations cause it not to match. https://github.com/envoyproxy/envoy/issues/6767 and https://github.com/istio/istio/issues/13589 have a lot of info here.
@mmalone Actually, I wasn't trying to suggest `Terminate` and `Passthrough` as names for host verification, but rather trying to describe some possible default behaviors for those (existing) TLS mode options, which are defined in the TLS config.
Enforcement of SNI host and HTTP `Host` match wouldn't be possible in a `Passthrough` scenario, which is why I was suggesting you'd have to explicitly disable the enforcement to be able to enable client auth. Then, people know what they're getting themselves into.
As an example of an upstream routing based on `Host`, imagine it wasn't even a router upstream, but rather a monolithic web app that's multi-tenant. It could serve customer-a.mycompany.com and customer-b.mycompany.com.
This is totally contrived, but imagine they decided they wanted to have client cert auth on the Gateway to prevent Customer A from even being able to load Customer B's app at all. They already have all the wiring to terminate TLS and manage certs and keys for their upstream app (for the customer domains), so they don't want to mess with that. They just verify the client cert, then pass the TLS traffic through to the app upstream. But the app uses the HTTP `Host` to determine which customer's app to display.
Again, contrived, but I could see scenarios where someone might want it. In the above scenario, they could set `disableHostVerify: true`, and they could check the SNI host (passed in a header) to make sure it matches the HTTP `Host` in the upstream app itself.
Despite this issue being quite old, we the maintainers are still pretty convinced that we want to have this functionality in a future release. We are marking this `help wanted` as we're looking for contributors with strong use cases to help champion and drive this forward.
I think this is related to https://github.com/kubernetes-sigs/gateway-api/discussions/1244 and therefore GEP https://github.com/kubernetes-sigs/gateway-api/issues/1282 might be relevant / close the loop here.
It's likely this may block future Knative adoption of Gateway API until it's resolved -- we're moving to re-encrypt traffic from the ingress between our components to the user's pod.
It's related in that it's TLS, but this is for client certificate verification on the Listener, so on the connection from the client to the "outside" of the Gateway. #1244 and #1282 are both about the connection from the Gateway to the backend, "inside" the Gateway. (The descriptions are in quotes because they're not always the case, there's more to it than that, etc).
Hoping to help out with this issue.
Existing Projects
- Here's an example of how mTLS is described to the user in Istio and how the user can configure their Gateway to support mTLS: by setting the `tls.mode` to `MUTUAL` and specifying the CA Certificate as `cacert` in the same secret where the server key and cert are defined (see the sketch after this list).
- Here's another example from Contour, which asks the user to define a `clientValidation` block where the secret containing the CA Certificate can be defined (the secret must be an Opaque secret with the name `ca.cert`); also sketched below.
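For concreteness, here are minimal sketches of the two configurations described above; resource and secret names are placeholders, and the exact secret key names vary by version, so treat these as illustrative rather than authoritative.

```yaml
# Istio Gateway terminating mutual TLS at the ingress gateway. The referenced
# credential is assumed to carry the server cert/key plus the client CA bundle
# (e.g. cert/key/cacert keys); exact secret keys depend on the Istio version.
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: example-gateway
spec:
  selector:
    istio: ingressgateway
  servers:
  - hosts:
    - example.com
    port:
      number: 443
      name: https
      protocol: HTTPS
    tls:
      mode: MUTUAL                        # require and verify a client certificate
      credentialName: example-credential  # secret with server cert/key + CA
```

And the Contour equivalent, using the `clientValidation` block from the Contour docs:

```yaml
# Contour HTTPProxy with a clientValidation block; client-root-ca is assumed
# to be a Secret carrying the CA bundle used to validate client certificates.
apiVersion: projectcontour.io/v1
kind: HTTPProxy
metadata:
  name: example
spec:
  virtualhost:
    fqdn: example.com
    tls:
      secretName: server-credentials    # server cert/key
      clientValidation:
        caSecret: client-root-ca        # CA used to verify client certificates
  routes:
  - services:
    - name: example
      port: 80
```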
What's common between them
- Explicitly define client validation along with TLS termination (`MUTUAL` / `clientValidation`)
- Provide a way to plug in the CA Certificate
Current State in the Gateway API
- Has a `mode` field to define the TLS session type
- Has the ability to define certificates as Secrets of type `tls` with `CertificateRefs` (minimal example below)
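A minimal example of that current state - a `Terminate` listener whose `certificateRefs` point at a `kubernetes.io/tls` Secret (names are placeholders):

```yaml
# What the Gateway API supports today: TLS termination with a server
# certificate, but no field for a client CA or client cert validation.
apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: example-gateway
spec:
  gatewayClassName: example
  listeners:
  - name: https
    protocol: HTTPS
    port: 443
    hostname: example.com
    tls:
      mode: Terminate            # Terminate or Passthrough
      certificateRefs:
      - kind: Secret
        name: example-com-tls    # Secret of type kubernetes.io/tls (tls.crt / tls.key)
```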
Follow-up questions for the community
- Should mTLS reuse the `Terminate` mode, or should there be a new mode defined for mTLS?
- How does the user plug in the CA Cert? (a purely hypothetical sketch follows this list)
  - In its current state, we cannot reuse `CertificateRefs` because `tls` Secrets don't support a CA Cert field
- Assumed that the API might not want to expose more knobs such as allowing specific SAN names (this is what Envoy exposes); please correct me if that is not the case
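Purely to illustrate the open question about plugging in the CA Cert, here are two hypothetical shapes; neither field exists in the API today:

```yaml
# Hypothetical options only - none of these fields exist in the spec today.
listeners:
- name: https
  protocol: HTTPS
  port: 443
  tls:
    mode: Terminate              # option A: keep Terminate...
    certificateRefs:
    - name: server-cert
    # ...and add a separate reference for the client CA bundle, e.g. an
    # Opaque Secret carrying a ca.crt key:
    caCertificateRefs:           # hypothetical field
    - kind: Secret
      name: client-root-ca
    # Option B: introduce a dedicated mode (e.g. a "mutual" variant of
    # Terminate) so client validation is explicit in the listener mode.
```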
@shaneutt I'd like to work on this one
You got it @arkodg, thank you for volunteering! I would definitely recommend the "What and why, but not the how yet" approach to starting an initial GEP here so we can get aligned on what the problem is and what our goals are. Let us know if there's any help you need, and how we can support you in this effort! :vulcan_salute:
/assign @arkodg
NOTE: I'm moving this into the `v1.0.0` milestone as it appears we have someone who's going to move it forward. This doesn't mean we'll necessarily hold up `v1.0.0` for it, but let's see where we can get with this in the coming weeks and hopefully it just falls right in.
@arkodg wanted to check in on this one?
thanks for the reminder @shaneutt, still plan on working on it
Sounds good, let us know if there's ways we can help facilitate :vulcan_salute:
/reopen
@arkodg: Reopened this issue.
In response to this:
/reopen
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/kind gep
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- Mark this issue as fresh with `/remove-lifecycle stale`
- Close this issue with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale /lifecycle frozen
Since the GEP is still in motion.
@arkodg are there conformance tests planned?