gateway-api GEP: SupportedFeatures

GEP-2162 is currently implemented by envoyproxy and istio. We received feedback from the community that we might want to add structure to it, to support future extension and better UX.

Previous discussions happened offline, during community meetings and on https://github.com/kubernetes-sigs/gateway-api/discussions/3128#discussioncomment-9655008.

This purpose of the issue is to get consensus on the structure, and decide to action item and timelines for implementing those before we graduate to standard.

Currently supported features is a flat list of strings. Here are a few suggested options:

To support future extensibility by changing it from a list of strings to a list of maps. Something like:

supportedFeatures:
- name: HTTPRouteHostRewrite
   description: <description>
   <some other field>: <value>
- name: HTTPRouteMethodMatching
..

It can also be tiered, something like:

supportedFeatures:
- name: <> (flat exmaple)
  description: <string / URL>
- profile (or featureGroup): (tiered example)
   name: TLSRoute
   description: <>
   features:
   - name: <>
     description: <> (optional)

This allows implementations to flatten featureLists.

/cc @arkodg @dprotaso @shaneutt @robscott @youngnick @mlavacca @mikemorris I think I captured the main two proposals we came with - please review.

Jun 17 '24 15:06 LiorLieberman

I like the idea of option 2️⃣ for bucketing to make human-readability a bit easier.

Putting too much longform descriptive text or URLs into each feature feels awkward, and may have some performance/storage overhead or link rot if any website things change.

Would this be a list of strongly-typed structs (and we would need to define fields now), or just generic maps and we'll standardize keys/values as a future step?

Jun 17 '24 16:06 mikemorris

+1 for 2. or a variant of it

tiering
- improves readability
- reduces information overload
- can reduce the total amount of text if we assume that an empty features field implies that all the features in the profile or featureGroup are supported
an optional description / message field allows implementations to specify something implementation specific about the feature, to the user, for e.g.
- caveats related to the feature e.g. feature only works for listener port 80 and 443
- more information about how to use the feature

Jun 17 '24 17:06 arkodg

I lean towards number 2. Who were the personas of the people asking for the UX improvement from Istio/envoygateway, btw? Were these users of those projects using the feature or developers implementing the feature?

Jun 18 '24 12:06 shaneutt

+1 for number 2 from my side as well.

Jun 18 '24 12:06 mlavacca

I'm +1 from number 2.

And at least one of the pieces of feedback was from me, wearing my "maintainer worrying about sizes of arrays" persona. We shouldn't have too much of a complexity problem here, since there's not nested layers of config like in Listeners or Routes like in other places, but I was worried about what SupportedFeatures would look like in the (reasonably likely) future where we have a lot of Extended features (say, > 100).

Jun 19 '24 00:06 youngnick

I lean towards number 2. Who were the personas of the people asking for the UX improvement from Istio/envoygateway, btw? Were these users of those projects using the feature or developers implementing the feature?

We discussed this during the community meeting but for posterity, feedback came from implementors mostly.

I have two concerns:

Would this be a list of strongly-typed structs (and we would need to define fields now), or just generic maps and we'll standardize keys/values as a future step?

This is first one, and we also discussed this during the meeting but we still need to follow up. I don't think we should go with "just generic maps", this could lead to a potential mess and unnecessary creativity from implementations to add things to this map. Also unification would be harder (unless we enforce the map's key names with conformance). OTOH, if we go with strongly-typed struct, we will likely need to re-open the GEP and reach consensus on the structure to avoid breaking changes, which the list of generic maps solution aimed to prevent.

My second concern is about the tiering, if we do that, it will be harder for automations/clis/whatever-machinery that is reading the supportedFeatures to understand what features under this tier/group. Again, this could be solved with yet other new standards.

This is a classic chicken-and-egg problem: to gather user feedback, we need to release it to standard(this is also when more implementations would actually start to implement that), but to do that, we need feedback on the design as this feature aims to enhance the user experience. So the big question, how do we execute on this iteratively, leaving us room for changes based on feedback from users, without making it too hard to deprecate/change parts of the structure.

Jun 26 '24 07:06 LiorLieberman

My second concern is about the tiering, if we do that, it will be harder for automations/clis/whatever-machinery that is reading the supportedFeatures to understand what features under this tier/group

+1, we'll need to watch out for this because I think the primary use case if for automation to pick this up and warn if unsupported features are being used. Of course making that automation loop through status and create its own flat list/set is not a big problem, but with any approach we need to make sure it is not unnecessarily painful for our primary use case to work.

This is a classic chicken-and-egg problem: to gather user feedback, we need to release it to standard

+1, we're struggling to get feedback on this one in particular, and it feels like we won't get meaningful feedback until it's already in standard channel. That inherently requires some educated guesses on our part, so here's a shot at that. In general, I think we'd end up with three categories of usage:

1. Automated tooling Tooling like gwctl that would warn when an unsupported feature was being used. Maybe gwctl is used to apply all Gateway manifests and warns before applying config that won't be supported, or maybe it's used after the fact to check config in a cluster for problems. @gauravkghildiyal may have a clearer picture of what this might look like. Note that this could be made easier if every resource had something like SupportedFeatures in status, particularly HTTPRoute and GRPCRoute.

2. Simple feature support checks with grep To really drive adoption of this feature, we'll need to update all of our docs to show the Supported Feature(s) a given document relies on, and how users can check if their implementation supports that feature. I'd imagine those instructions would include both a gwctl and kubectl command, where the kubectl command looked something like kubectl get gatewayclass acme -oyaml | grep SupportHTTPRouteMirror.

3. GatewayClass discovery Some users may just browse the GatewayClasses in their clusters to understand what features are supported.

In my opinion, the first 2 use cases will likely account for at least 90% of usage, and wouldn't really benefit from any form of nesting. That makes this whole conversation about 3, something I'd expect would represent <10% of usage.

In general, that leads me towards the simplest possible approach for now that would still allow for future growth. I think that would look like what k8s API conventions refer to as a "list of named subojects."

In a naive implementation, that would simply be a list of structs with a name field. Other fields that could make sense include:

Features - provide a path for nesting related features to make the output more readable
Status - maybe three states (fully conformant, partial conformance, no conformance tests)
Notes - implementation can provide notes, particularly useful if their support for a feature is only partially conformant

For now, I don't really feel strongly about a need for sub features, but given the overwhelming support above, I can be fine with them. If we choose to have sub features, I'd strongly prefer that they are per-resource. I don't think nesting really allows us to shortcut having a long list of features though. If we want to support kubectl get gatewayclass acme -oyaml | grep SupportHTTPRouteMirror, we'll need every single feature name to be spelled out in full, however we structure this.

Jun 28 '24 21:06 robscott

Note that this could be made easier if every resource had something like SupportedFeatures in status, particularly HTTPRoute and GRPCRoute.

I hadn't considered this option previously - not sure if it would be a helpful way to localize this information where it's most relevant, or if distributing this to resources that may not exist yet would cause more pain for automated usage. The awkward bit would be any "features" that cut across multiple resources.

Jul 02 '24 21:07 mikemorris

Note that this could be made easier if every resource had something like SupportedFeatures in status, particularly HTTPRoute and GRPCRoute.

I hadn't considered this option previously - not sure if it would be a helpful way to localize this information where it's most relevant, or if distributing this to resources that may not exist yet would cause more pain for automated usage. The awkward bit would be any "features" that cut across multiple resources.

This feels wrong to me on first past, like it creates a chicken and egg problem: You create your HTTPRoute with all the features you want, it fails to provision, and then you go look at the status that was enumerated for it to reduce the features from it with patches until it provisions? :thinking:

Also probably a minor thing, but when someone has 10k+ HTTPRoutes it feels kind of wasteful to be storing all this redundant information in every resource, especially since it's really only acted upon for the resource creation step :thinking:

The status of an object is meant to reflect it's provisioned state, I have a hard time seeing supported features as status information for routes, whereas I can definitely see it as a reflection of state of a GatewayClass :thinking:

Jul 03 '24 12:07 shaneutt

You create your HTTPRoute with all the features you want, it fails to provision, and then you go look at the status that was enumerated for it to reduce the features from it with patches until it provisions?

FWIW this isn't a dissimilar approach to how users already interact with Accepted: False status, looking for reasons or Conflicted: True etc, maybe a more granular use case here is somehow communicating this in an UnsupportedFeatures status though?

Also probably a minor thing, but when someone has 10k+ HTTPRoutes it feels kind of wasteful to be storing all this redundant information in every resource

Definitely agreed.

Jul 03 '24 13:07 mikemorris

You create your HTTPRoute with all the features you want, it fails to provision, and then you go look at the status that was enumerated for it to reduce the features from it with patches until it provisions?

FWIW this isn't a dissimilar approach to how users already interact with Accepted: False status, looking for reasons or Conflicted: True etc, maybe a more granular use case here is somehow communicating this in an UnsupportedFeatures status though?

Maybe :thinking: I'll think on it some more, the above was definitely my initial impression.

Jul 03 '24 14:07 shaneutt

The status of an object is meant to reflect it's provisioned state, I have a hard time seeing supported features as status information for routes, whereas I can definitely see it as a reflection of state of a GatewayClass 🤔

I actually tend to agree with this. Automations can read from the GWC to determine if HTTPRoute or any other resource can or can not be provisioned. I know it is debatable whether cross-resource validations are something we want, but, regardless I think GWC is the place to store this information. This is the one-stop-shop we have been pushing to have.

Whats the value of storing those features in each resource? I dont think that overloading the status field with this information is something we want to do. Not only because of the features duplication concern, but also the status is not meant to store it.

Jul 08 '24 12:07 LiorLieberman

Whats the value of storing those features in each resource? I dont think that overloading the status field with this information is something we want to do. Not only because of the features duplication concern, but also the status is not meant to store it.

I don't think it's particularly valuable to store the entire set of supported features on each resource, but if a resource is rejected by the controller, aligning a specific negative-polarity UnsupportedFeatures status with standard feature names feels like it could be useful (if possible to implement though, asking an implementation to document everything it doesn't support might not be palatable to implementation authors).

Jul 08 '24 14:07 mikemorris

I definitely do not think we should write supported features to individual objects. TBH I am opposed to putting documentation on CRDs at all, so do not think SupportedFeatures should exist at all, though (fully understand that this is not a common opinion though, so I don't mind it on GC where it isn't too harmful).

+1 to a UnsupportedFeatures reason on status though.

Jul 08 '24 14:07 howardjohn

I agree that the right place for supportedFeatures is in GatewayClass only. It's really a property of the implementation, and there's a close correspondence between application and GatewayClass.

I agree that a list of named subobjects is probably the right design here - with the subobject having only one, required, field to begin with: name. This will mean that the initial implementation would look like option 1, although we can easily change this to a tiered approach (option 2) later - because it's safe to change a required field to optional, but not the other way around. (We would do this by adding a profile field to the SupportedFeature object, with a value of the profile object, and making both name and profile optional.

So, in summary, I'm +1 for list of objects, with phase 1 being only a required name field - this produces the same effect as the list of strings that we have today, with the additional cost of six characters (name: ) on each feature name, but gives us much more expandability later.

Jul 11 '24 03:07 youngnick

Unless there are objections, I am going to move forward and implement that then, so we have it ready for 1.2.

Jul 16 '24 15:07 LiorLieberman

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

Oct 14 '24 16:10 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle rotten
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

Nov 13 '24 16:11 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Reopen this issue with /reopen
Mark this issue as fresh with /remove-lifecycle rotten
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Dec 13 '24 17:12 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied

After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied

After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Reopen this issue with /reopen

Mark this issue as fresh with /remove-lifecycle rotten

Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Dec 13 '24 17:12 k8s-ci-robot

/reopen

Dec 13 '24 17:12 LiorLieberman

@LiorLieberman: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Dec 13 '24 17:12 k8s-ci-robot

@mlavacca Any updates on Kong support ?

Dec 21 '24 20:12 LiorLieberman

@mlavacca Any updates on Kong support ?

I have an unfinished (yet) open PR. It'll likely be merged after the holidays, I'll keep you posted though.

Dec 23 '24 08:12 mlavacca

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Reopen this issue with /reopen
Mark this issue as fresh with /remove-lifecycle rotten
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Jan 22 '25 08:01 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied

After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied

After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Reopen this issue with /reopen

Mark this issue as fresh with /remove-lifecycle rotten

Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Jan 22 '25 08:01 k8s-ci-robot

Kong implemented this in the Kong Gateway Operator https://github.com/Kong/gateway-operator

Feb 04 '25 14:02 mlavacca

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Reopen this issue with /reopen
Mark this issue as fresh with /remove-lifecycle rotten
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Mar 06 '25 17:03 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied

After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied

After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Reopen this issue with /reopen

Mark this issue as fresh with /remove-lifecycle rotten

Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Mar 06 '25 17:03 k8s-ci-robot

/reopen

May 09 '25 17:05 LiorLieberman