Increase TLSRoute hostnames limit from 16 to 4096
What type of PR is this?
/kind feature /kind documentation
What this PR does / why we need it:
This PR updates the TLSRoute CRD validation rationale and proposes increasing the `maxItems` bound for `hostnames` from 16 to 4096. The change is intended only to accommodate very large organizations: with the current limit of 16, such organizations can end up with hundreds of thousands of TLSRoute objects. These objects multiply storage, watch traffic, and controller memory/CPU, which drives up API-server latency and risks OOMs and instability.
In a similar PR the limit was increased: https://github.com/kubernetes-sigs/gateway-api/pull/3205/ To deploy that change safely, the author employed an XValidation rule. In my case, such a rule would likely be rejected for being too complex. One idea would be to move the validation into a validating webhook. However, the webhook has been removed:
The validating webhook has been removed. CEL validation is now built-in to CRDs and replaces the webhook. (#2595, @robscott)
Do you have any ideas on how to solve this issue? I would be happy for further assistance on how to tackle this.
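For context, the change itself would likely amount to bumping a single kubebuilder validation marker on the TLSRoute spec. A sketch, assuming the field lives in `apis/v1alpha2/tlsroute_types.go` (file path and surrounding layout are my assumption, not copied from the repo):

```go
// Hostnames defines the SNI names this route matches against.
//
// +optional
// +kubebuilder:validation:MaxItems=4096  // previously 16
Hostnames []Hostname `json:"hostnames,omitempty"`
```

The open question above is what guardrail, if any, should accompany the larger bound now that the webhook is gone and a CEL rule may exceed the cost budget.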
Rationale:
- Hostname size on the wire: DNS RFCs limit a full domain name to 255 octets on the wire (see RFC 2181 §11). In practice this means up to roughly 253 printable characters in common textual form (no trailing dot); for SNI and Gateway API usage the ACE/Punycode representation should be used for IDNs and the same limits apply.
- RFC reference: https://datatracker.ietf.org/doc/html/rfc2181#section-11
- TLS/SNI reference: https://datatracker.ietf.org/doc/html/rfc6066#section-3
- etcd / API server request-size guidance: the etcd dev guide (v3.3) documents a practical per-request/message size of ~1.5 MiB, and kube-apiserver also commonly enforces a request-size cap. See: https://etcd.io/docs/v3.3/dev-guide/limit/#:~:text=etcd%20is%20designed%20to%20handle,any%20request%20is%201.5%20MiB
- Based on these assumptions, I did the following napkin calculation: assuming a base overhead of ≈ 4 KiB per object and ≈ 256 bytes per hostname, (1,572,864 − 4,096) / 256 = 1,568,768 / 256 = 6,128, so 4096 should be a conservative upper limit.
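The napkin calculation above can be reproduced as follows (the 1.5 MiB ceiling, 4 KiB overhead, and 256 bytes per hostname are the assumed figures from the rationale, not measured values):

```go
package main

import "fmt"

func main() {
	// Assumed figures from the rationale above.
	const etcdRequestCeiling = 1_572_864 // ~1.5 MiB practical etcd request limit
	const baseOverhead = 4_096           // assumed per-object overhead (metadata, rules, ...)
	const bytesPerHostname = 256         // 253-octet DNS name limit plus a little slack

	// How many max-length hostnames fit in one request after overhead.
	headroom := (etcdRequestCeiling - baseOverhead) / bytesPerHostname
	fmt.Println(headroom) // prints 6128
}
```

Since 4096 is well below the computed 6128, the proposed limit keeps a margin even when every hostname is maximally long.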
Which issue(s) this PR fixes:
No issue yet.
Does this PR introduce a user-facing change?:
The TLSRoute CRD validation has been adjusted to allow up to 4096 hostnames and rules per TLSRoute resource. Operators must validate kube-apiserver, etcd and Gateway controller behavior with representative manifests prior to enabling the new limit in production.
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: alexanderstephan Once this PR has been reviewed and has the lgtm label, please assign aojea for approval. For more information see the Code Review Process.
The full list of commands accepted by this bot can be found here.
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
Welcome @alexanderstephan!
It looks like this is your first PR to kubernetes-sigs/gateway-api 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.
You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.
You can also check if kubernetes-sigs/gateway-api has its own contribution guidelines.
You may want to refer to our testing guide if you run into trouble with your tests not passing.
If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!
Thank you, and welcome to Kubernetes. :smiley:
Hi @alexanderstephan. Thanks for your PR.
I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.
Once the patch is verified, the new status will be reflected by the ok-to-test label.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Thanks for doing the calculations @alexanderstephan, that's helpful.
However, adding this would mean that there's no further headroom for adding any other constructs to TLSRoute. In other conversations, we've talked about adding ALPN matching, and a couple of other additional complexities that escape me right now, but if we use all the space available for storing hostnames, we'll have none left for expansion.
On top of that, I'm not sure how this would work. Because TLSRoute is about sending all traffic that matches the hostname list to a single backend, are you anticipating having that backend using a certificate that has up to 4096 SANs? That seems like a very large amount, that I'd be surprised if it's supported in most certificate handlers (it would certainly massively increase the size of the certificate).
I'd like to understand the use case you're aiming for here better. What sort of use cases involve serving 4000 hostnames from one backend service, presumably with a single certificate?
Also, we did stop shipping a validating webhook, because CEL does everything we want, generally, and the complexity cost calculations are also generally a good indication that we're keeping the API complexity under control.
Thanks for looking into this, @youngnick!
However, adding this would mean that there's no further headroom for adding any other constructs to TLSRoute. In other conversations, we've talked about adding ALPN matching, and a couple of other additional complexities that escape me right now, but if we use all the space available for storing hostnames, we'll have none left for expansion.
I see. So you're suggesting we should lower the proposed limit further, e.g., to 2048?
That seems like a very large amount, that I'd be surprised if it's supported in most certificate handlers (it would certainly massively increase the size of the certificate).
I think it is actually possible to have 10k+ hostnames per certificate, from what I have seen. However, this does not apply in this case, since we can also use multiple certificates, as explained below.
I'd like to understand the use case you're aiming for here better. What sort of use cases involve serving 4000 hostnames from one backend service, presumably with a single certificate?
So, the umbrella topic here would be "multi-tenant SaaS with custom domains". Here, a single backend shard can serve thousands of tenant custom domains. Our deployments terminate TLS at the backend, selecting certificates dynamically via SNI, so TLSRoute's role is to steer all those SNI matches to the right termination tier. In this context, it makes sense to consolidate many low-traffic domains behind one backend, as it is more efficient.
Also, we did stop shipping a validating webhook, because CEL does everything we want, generally, and the complexity cost calculations are also generally a good indication that we're keeping the API complexity under control.
Okay, that makes sense. I guess this can be looked into as a next step, once the motivation for this change is clearer.