gateway-api icon indicating copy to clipboard operation
gateway-api copied to clipboard

TLSRoute: Require hostnames

Open rostislavbobo opened this issue 6 months ago • 4 comments

What type of PR is this?

/kind feature

What this PR does / why we need it:

This PR makes the hostnames field in TLSRoute non-optional. Hostnames is the only available match for TLSRoutes and, per the TLSRoute spec, at least one hostname match is required. If SNI hostname matching is not needed, TCPRoute should be used instead.

Which issue(s) this PR fixes:

Fixes #3871

Does this PR introduce a user-facing change?:

- Adds v1alpha3 experimental TLSRoute with the hostnames field being required.

rostislavbobo avatar Jun 20 '25 15:06 rostislavbobo

Welcome @rostislavbobo!

It looks like this is your first PR to kubernetes-sigs/gateway-api 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/gateway-api has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. :smiley:

k8s-ci-robot avatar Jun 20 '25 15:06 k8s-ci-robot

Hi @rostislavbobo. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot avatar Jun 20 '25 15:06 k8s-ci-robot

Thanks @rostislavbobo!

/ok-to-test

robscott avatar Jun 27 '25 16:06 robscott

I think this is a good idea, but unfortunately, I think we need to add the new version, and move the v1alpha2 source files to mostly be aliases to the new objects (except for this change).

That will have the generator include both v1alpha2 and v1alpha3 into the generated YAML files.

Alternatively, we will need to very clearly document that folks must ensure that the v1alpha2 CRD objects are not removed from the cluster before the actual TLSRoute objects are migrated to v1alpha3, or all the TLSRoute objects will disappear.

youngnick avatar Jun 30 '25 03:06 youngnick

This is because I know that people are using TLSRoute in production. I know they shouldn't be, but they definitely are, and we will screw them over very hard if we are not careful.

youngnick avatar Jun 30 '25 03:06 youngnick

I think this is a good idea, but unfortunately, I think we need to add the new version, and move the v1alpha2 source files to mostly be aliases to the new objects (except for this change). That will have the generator include both v1alpha2 and v1alpha3 into the generated YAML files.

+1

Alternatively, we will need to very clearly document that folks must ensure that the v1alpha2 CRD objects are not removed from the cluster before the actual TLSRoute objects are migrated to v1alpha3, or all the TLSRoute objects will disappear. This is because I know that people are using TLSRoute in production. I know they shouldn't be, but they definitely are, and we will screw them over very hard if we are not careful.

+1

I agree: as long as the costs are reasonably low, we should at least try to avoid breaking people on the API today.

Normally alpha APIs are fair game for complete disruption at any time, but TLSRoute, UDPRoute and TCPRoute have been in a unique "perma-alpha" for such a long time that I think if it's not a big lift, trying to make this easier for folks is fair.

/cc @rycieos

shaneutt avatar Jun 30 '25 16:06 shaneutt

I know that people are using TLSRoute in production. I know they shouldn't be, but they definitely are

😆 yeah that's me. In my defense, I started using TLSRoutes in non-prod environments 2.5 years ago, hoping that it would graduate to GA along with the rest of Gateway API, at which point I would move it into prod. Obviously that didn't happen, and I recognize that taking that risk is on me.

I appreciate the work done to make such migrations painless, but I definitely understand that this part of the API is still in alpha and that disruptions are expected. I can't ask for more than keeping me in the loop, which the team has done great with so far.

Rycieos avatar Jun 30 '25 17:06 Rycieos

I agree: as long as the costs are reasonably low, we should at least try to avoid breaking people on the API today.

Of course, @shaneutt , @youngnick , that's why we're not introducing this breaking change in v1alpha2 and are bumping the version to v1alpha3 instead.

unfortunately, I think we need to add the new version, and move the v1alpha2 source files to mostly be aliases to the new objects (except for this change). This is because I know that people are using TLSRoute in production.

I hear you, makes sense. Let me alias TLSRoutes v1alpha2.

rostislavbobo avatar Jun 30 '25 20:06 rostislavbobo

I started using TLSRoutes in non-prod environments 2.5 years ago, hoping that it would graduate to GA along with the rest of Gateway API

This year I'm revamping TLSRoutes and hope this v1alpha3 finally sets it up for GA 🙂 When we're ready, I'll start coordinating this within the community and will keep you in the loop.

@Rycieos , let me know if there's anything else we should include.

rostislavbobo avatar Jun 30 '25 20:06 rostislavbobo

@Rycieos , let me know if there's anything else we should include.

I have brought up #2111 a lot already, but I think it bears repeating. As @robscott pointed out in the original Slack thread for this PR (which I will quote here since Slack is dying soon),

If I'm understanding correctly, this doesn't require a change to TLSRoute API itself, but more expanding the interaction between a Gateway and TLSRoute to allow for the combination of TLS termination and TLSRoute? It is absolutely something we should keep in mind though. I think/hope this version rev to v1alpha3 would set us up for GA of TLSRoute, and I'd really like to have the interaction with Gateway well specified before that point.

So IMO, #2111 should come at the same time as v1alpha3, or at the very latest in v1beta1, assuming we don't skip the beta stage for TLSRoute. How that ties in to Gateway resource versions I don't think has been figured out yet.

Rycieos avatar Jun 30 '25 20:06 Rycieos

I think we need to add the new version, and move the v1alpha2 source files to mostly be aliases to the new objects (except for this change).

@youngnick , v1alpha3 is already based on v1alpha2 (see v1alpha3/backendtlspolicy_types.go, v1alpha3/zz_generated.deepcopy.go). That means we can't reference v1alpha2 from v1alpha3. And I don't think we should change that, as this is the right way to handle versioning and aliasing in the v1alpha3 case.

I've put the new TLSRoute into v1alpha3 and just referenced reused v1alpha2 types (such as TLSRouteStatus and TLSRouteRule).

If we want to add all remaining XRoutes to v1alpha3, we'd need to keep the v1alpha2 XRoutes as they are, and then create new v1alpha3 XRoutes that reference these original v1alpha2 definitions. I personally don't see much value in doing this. What do you think?

rostislavbobo avatar Jul 01 '25 17:07 rostislavbobo

@rostislavbobo Yeah, I think that's fine. The intent here was to end up with the generated YAML having both versions, to save users from a breaking change when they install the v1.4 YAMLs (it's possible to have a TLSRoute saved in v1alpha2 that will not be valid in v1alpha3). Also, we can't even make a conversion webhook for this case, because we can't guess what the hostname should be if it previously was not present.

So, we're going to have to break people who weren't already setting a hostname in the TLSRoute.

But, if we only include v1alpha3 as the storage version in the generated CRD YAML, then the existing v1alpha2 objects will just vanish from the cluster when the definition is updated (since they're no longer a valid version) - even if they were using a hostname already and would translate just fine into v1alpha3.

So I think that the options here are:

  • include v1alpha2 with breaking changes between it and v1alpha3. This will break people who are not using a hostname, in that v1alpha2 objects cannot be cast into v1alpha3 objects. tbh I'm not sure how the apiserver will handle this case.
  • don't include v1alpha2 in the YAMLs. This will mean that everyone who is currently using TLSRoute will need to back up their routes in v1alpha2 before installing v1.4, change them to v1alpha3, making sure that a hostname is being set, then put them back after installing v1.4.

We can test this to find out what would happen in the first case by:

  • setup a cluster that has Gateway API v1.3 installed, including v1alpha2 TLSRoute
  • create some TLSRoute objects, one that does have hostname, and one that does not.
  • install the YAML from this PR
  • See if you can get any of the existing TLSRoute objects. (I don't think you will see them any more after that update).

youngnick avatar Jul 01 '25 23:07 youngnick

/retitle TLSRoute: Require hostnames and bump version to v1alpha3

robscott avatar Jul 14 '25 17:07 robscott

Thanks @rostislavbobo!

/approve /cc @mlavacca @shaneutt @youngnick

robscott avatar Jul 14 '25 17:07 robscott

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: robscott, rostislavbobo, shaneutt

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • ~~OWNERS~~ [robscott,shaneutt]

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot avatar Jul 14 '25 17:07 k8s-ci-robot