GEP: Client Certificate Verification for Gateway Listeners
What would you like to be added:
The ability for an HTTPS (or TLS generally) endpoint to require that the client present a certificate that can be validated according to some configurable policy.
Why is this needed:
As an application developer, I want to restrict access to my application to a certain audience of clients. The audience is defined by one or more of:
- a collection of specific TLS certificates (maybe by hash)
- a collection of subject names in certificates
- a collection of certificates issued by a specific (unique) CA
I want the infrastructure to guarantee that I only receive client traffic that originates from this audience.
/kind user-story
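Purely for illustration, here is a rough sketch of what such a configurable policy might look like on a Gateway listener. None of the client-validation fields below (`clientValidation`, `caCertificateRef`, `subjectNames`, `certificateHashes`) exist in the API; they only mirror the three audience definitions above.

```yaml
# Purely hypothetical sketch - none of the clientValidation fields below
# exist in the Gateway API today; they just mirror the audience definitions
# requested above (a specific CA, allowed subject names, pinned hashes).
apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: example-gateway
spec:
  gatewayClassName: example
  listeners:
  - name: https
    port: 443
    protocol: HTTPS
    tls:
      mode: Terminate
      certificateRefs:
      - name: server-cert          # existing field: the server certificate
      clientValidation:            # hypothetical block
        caCertificateRef:
          name: audience-root-ca   # only certs issued by this CA are accepted
        subjectNames:              # or an allow-list of subject names
        - client-a.example.org
        certificateHashes:         # or pinned certificate hashes
        - sha256:<pinned-client-cert-hash>
```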
Is this mTLS? Can you add to the issue title, as that is a commonly known term around the K8S community.
It is mutual TLS in the sense that the client and the server perform mutual TLS, but 'mTLS' also has some connotations about speed and automation of certificate rotation that we probably don't want to bring in here.
I say this because when we had an issue to add 'mTLS' to Contour, people started asking straight away if Contour was now a service mesh. Which it is not. But the term is overloaded.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with `/remove-lifecycle stale`.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with `/close`.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

Issues go stale after 90d of inactivity.
Mark the issue as fresh with `/remove-lifecycle stale`.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with `/close`.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with `/remove-lifecycle rotten`.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with `/close`.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle rotten
/remove-lifecycle rotten
#268 should close this.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with `/remove-lifecycle stale`.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with `/close`.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale
/remove-lifecycle stale /lifecycle frozen
Adding my comments from Slack regarding some configuration options that would be nice:
- either deny or don't deny requests from clients who don't present valid certs (useful when you want to show a "logged out state" based on application logic in your upstream)
- an option to pass through the subject/SANs from the certificate to your upstream (the identity of the client), maybe in a header? Not sure if there's a standard there. This could probably be the default... not sure if it would matter if you still passed through the client identity and the upstream didn't care.

This could let you configure things like "I'll take requests in my upstream even if they don't present a valid cert, but if they do present a valid cert, I want to know that the identity was alan so I can decide his access level. And if there's no cert, I want to show them logged out state."
The ingress-nginx annotations seem like a really good reference point for which configuration options might be nice, as well as the names and contents of headers to pass back to the upstream:
https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/annotations/#client-certificate-authentication
I think this one largely gets at some of the use cases I was describing above:
nginx.ingress.kubernetes.io/auth-tls-verify-client: optional_no_ca
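For reference, here is a hedged sketch of an Ingress using those ingress-nginx annotations, covering both the optional verification mode and passing the client certificate through to the upstream (resource and secret names are placeholders; see the linked docs for the exact semantics of each value):

```yaml
# Sketch of the ingress-nginx client certificate annotations referenced above
# (not Gateway API); names and hosts are placeholders.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example
  annotations:
    # Secret (namespace/name) holding the CA bundle used to verify client certs
    nginx.ingress.kubernetes.io/auth-tls-secret: "default/ca-secret"
    # "on", "off", "optional" or "optional_no_ca" - the optional modes let the
    # upstream decide what to do when no valid client cert is presented
    nginx.ingress.kubernetes.io/auth-tls-verify-client: "optional_no_ca"
    # Forward the client certificate to the upstream so it can read the identity
    nginx.ingress.kubernetes.io/auth-tls-pass-certificate-to-upstream: "true"
spec:
  ingressClassName: nginx
  tls:
  - hosts: [example.com]
    secretName: example-tls
  rules:
  - host: example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: example
            port:
              number: 80
```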
It's also important to remember that, if the following are true:
- you have a single proxy routing multiple SNIs
- the multiple SNI routes can have different security profiles (as in, some can have client certificates and some can't)
- you are using a protocol that has a higher-level routing construct (like HTTP's `Host` header)

Then you can connect to `insecure.foo.com` without a client cert, set your `Host` header to `secure.foo.com`, and end up at the secure site, with no client cert.
This is roughly the same thing as domain fronting.
I wonder if sane defaults plus an "I know I'm doing a dangerous thing" option could help with domain fronting.
That is, if your listener is configured in `Terminate` mode, client certificate authentication can be configured with no special options. In addition to verifying the certificate, requests by default won't be routed upstream unless the SNI host matches the HTTP `Host`.
But, if you're in `Passthrough` mode, client authentication is only allowed with `disableHostVerify: true`. The docs can indicate that any upstream routers will need to be sure not to route based on `Host` with the expectation that it is the actual hostname the client authenticated with. The `disableHostVerify` option could also be available for `Terminate` mode if there was a use case that needed it.
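To make the proposed defaults concrete, here is a rough sketch under the assumption that client-cert options hang off the listener's `tls` block; `clientValidation` and `disableHostVerify` are hypothetical names from this discussion, not existing fields:

```yaml
# Hypothetical sketch of the defaults proposed above - clientValidation and
# disableHostVerify are names from this discussion, not existing API fields.
listeners:
- name: https-terminate
  protocol: HTTPS
  port: 443
  hostname: secure.foo.com
  tls:
    mode: Terminate
    certificateRefs:
    - name: secure-foo-cert
    clientValidation:              # hypothetical
      caCertificateRef:
        name: client-root-ca
  # default: reject requests whose HTTP Host does not match the SNI host
- name: tls-passthrough
  protocol: TLS
  port: 8443
  hostname: legacy.foo.com
  tls:
    mode: Passthrough
    clientValidation:              # hypothetical
      caCertificateRef:
        name: client-root-ca
    disableHostVerify: true        # hypothetical: must be set explicitly here,
                                   # since the proxy never sees the HTTP Host
```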
@alanchrt you're talking about two modes: one that enforces that Host matches SNI and the other that doesn't, right? We're not terminating TLS vs. passing it through for a backend to terminate? If that's correct, I like the idea but would probably look for different terminology ;).
I'm not convinced the non-enforcing mode needs to exist at all. What's the use case for upstreams actually needing to route based on Host header? I'd consider setting the Host header for the upstream request to force it to match SNI. The host header is ignored all the time (e.g., most non-proxy web applications will ignore the host header). In a case like this, I can't think of a way it would ever break non-malicious clients.
I don't have as much experience with http/2 so I have no comment on whether this might affect those clients.
> I don't have as much experience with http/2 so I have no comment on whether this might affect those clients.
I think per the HTTP/2 spec, servers are expected to return a 421 here in some scenarios - this allows the browser to automatically retry when its optimizations cause it not to match. https://github.com/envoyproxy/envoy/issues/6767 and https://github.com/istio/istio/issues/13589 have a lot of info here.
@mmalone Actually, I wasn't trying to suggest `Terminate` and `Passthrough` as names for host verification, but rather trying to describe some possible default behaviors for those (existing) TLS mode options, which are defined in the TLS config.
Enforcement of SNI host and HTTP `Host` match wouldn't be possible in a `Passthrough` scenario, which is why I was suggesting you'd have to explicitly disable the enforcement to be able to enable client auth. Then, people know what they're getting themselves into.
As an example of an upstream routing based on `Host`, imagine it wasn't even a router upstream, but rather a monolithic web app that's multi-tenant. It could serve customer-a.mycompany.com and customer-b.mycompany.com.
This is totally contrived, but imagine they decided they wanted to have client cert auth on the Gateway to prevent Customer A from even being able to load Customer B's app at all. They already have all the wiring to terminate TLS and manage certs and keys for their upstream app (for the customer domains), so they don't want to mess with that. They just verify the client cert, then pass the TLS traffic through to the app upstream. But the app uses the HTTP `Host` to determine which customer's app to display.
Again, contrived, but I could see scenarios where someone might want it. In the above scenario, they could set `disableHostVerify: true`, and they could check the SNI host (passed in a header) to make sure it matches the HTTP `Host` in the upstream app itself.
Despite this issue being quite old, we the maintainers are still pretty convinced that we want to have this functionality in a future release. We are marking this `help wanted` as we're looking for contributors with strong use cases to help champion and drive this forward.
I think this is related to https://github.com/kubernetes-sigs/gateway-api/discussions/1244 and therefore GEP https://github.com/kubernetes-sigs/gateway-api/issues/1282 might be relevant / close the loop here.
It's likely this may block future Knative adoption of Gateway API until it's resolved -- we're moving to re-encrypt traffic from the ingress between our components to the user's pod.
It's related in that it's TLS, but this is for client certificate verification on the Listener, so on the connection from the client to the "outside" of the Gateway. #1244 and #1282 are both about the connection from the Gateway to the backend, "inside" the Gateway. (The descriptions are in quotes because they're not always the case, there's more to it than that, etc).
Hoping to help out with this issue.
Existing Projects
- Here's an example of how mTLS is described to the user in Istio and how the user can configure their Gateway to support mTLS: by setting the `tls.mode` to `MUTUAL` and specifying the CA Certificate as `cacert` in the same secret where the server key and cert are defined (see the sketch after this list).
- Here's another example from Contour, which asks the user to define a `clientValidation` block where the secret containing the CA Certificate can be defined (the secret must be an Opaque secret with the name `ca.cert`); also sketched below.
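For concreteness, here are minimal sketches of the two configurations described above; resource and secret names are placeholders, and the exact secret key names vary by version, so treat these as illustrative rather than authoritative.

```yaml
# Istio Gateway terminating mutual TLS at the ingress gateway. The referenced
# credential is assumed to carry the server cert/key plus the client CA bundle
# (e.g. cert/key/cacert keys); exact secret keys depend on the Istio version.
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: example-gateway
spec:
  selector:
    istio: ingressgateway
  servers:
  - hosts:
    - example.com
    port:
      number: 443
      name: https
      protocol: HTTPS
    tls:
      mode: MUTUAL                        # require and verify a client certificate
      credentialName: example-credential  # secret with server cert/key + CA
```

And the Contour equivalent, using the `clientValidation` block from the Contour docs:

```yaml
# Contour HTTPProxy with a clientValidation block; client-root-ca is assumed
# to be a Secret carrying the CA bundle used to validate client certificates.
apiVersion: projectcontour.io/v1
kind: HTTPProxy
metadata:
  name: example
spec:
  virtualhost:
    fqdn: example.com
    tls:
      secretName: server-credentials    # server cert/key
      clientValidation:
        caSecret: client-root-ca        # CA used to verify client certificates
  routes:
  - services:
    - name: example
      port: 80
```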
What's common between them
- Explicitly define client validation along with TLS termination (`MUTUAL` / `clientValidation`)
- Provide a way to plug in the CA Certificate
Current State in the Gateway API
- Has a `mode` field to define the TLS session type
- Has the ability to define certificates as Secrets of type `tls` with `CertificateRefs` (minimal example below)
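A minimal example of that current state - a `Terminate` listener whose `certificateRefs` point at a `kubernetes.io/tls` Secret (names are placeholders):

```yaml
# What the Gateway API supports today: TLS termination with a server
# certificate, but no field for a client CA or client cert validation.
apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: example-gateway
spec:
  gatewayClassName: example
  listeners:
  - name: https
    protocol: HTTPS
    port: 443
    hostname: example.com
    tls:
      mode: Terminate            # Terminate or Passthrough
      certificateRefs:
      - kind: Secret
        name: example-com-tls    # Secret of type kubernetes.io/tls (tls.crt / tls.key)
```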
Follow-up questions for the community
- Should mTLS reuse the `Terminate` mode, or should there be a new mode defined for mTLS?
- How does the user plug in the CA Cert? (a purely hypothetical sketch follows this list)
  - In its current state, we cannot reuse `CertificateRefs` because `tls` Secrets don't support a CA Cert field
- Assumed that the API might not want to expose more knobs such as allowing specific SAN names (this is what Envoy exposes); please correct me if that is not the case
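Purely to illustrate the open question about plugging in the CA Cert, here are two hypothetical shapes; neither field exists in the API today:

```yaml
# Hypothetical options only - none of these fields exist in the spec today.
listeners:
- name: https
  protocol: HTTPS
  port: 443
  tls:
    mode: Terminate              # option A: keep Terminate...
    certificateRefs:
    - name: server-cert
    # ...and add a separate reference for the client CA bundle, e.g. an
    # Opaque Secret carrying a ca.crt key:
    caCertificateRefs:           # hypothetical field
    - kind: Secret
      name: client-root-ca
    # Option B: introduce a dedicated mode (e.g. a "mutual" variant of
    # Terminate) so client validation is explicit in the listener mode.
```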
@shaneutt I'd like to work on this one
You got it @arkodg, thank you for volunteering! I would definitely recommend the "What and why, but not the how yet" approach to starting an initial GEP here so we can get aligned on what the problem is and what our goals are. Let us know if there's any help you need, and how we can support you in this effort! :vulcan_salute:
/assign @arkodg
NOTE: I'm moving this into the `v1.0.0` milestone as it appears we have someone who's going to move it forward. This doesn't mean we'll necessarily hold up `v1.0.0` for it, but let's see where we can get with this in the coming weeks and hopefully it just falls right in.
@arkodg wanted to check in on this one?
thanks for the reminder @shaneutt, still plan on working on it
Sounds good, let us know if there's ways we can help facilitate :vulcan_salute:
/reopen
@arkodg: Reopened this issue.
In response to this:
/reopen
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/kind gep
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- Mark this issue as fresh with `/remove-lifecycle stale`
- Close this issue with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale /lifecycle frozen
Since the GEP is still in motion.
@arkodg are there conformance tests planned?