gateway-api
gateway-api copied to clipboard
ResolvedRefs condition for GatewayClass parametersRef
A discussion was had in Slack about adding a ResolvedRefs condition to a GatewayClass status to reflect the state of a parametersRef object that is attached. This led to a couple different concerns.
-
How would this coexist with the current InvalidParameters reason that exists for the
Acceptedcondition on aGatewayClass? Potential flows and their status outcomes:- Create
GatewayClasswith badparametersReftarget:GatewayClassStatus[{ type: Accepted, status: False?, reason: InvalidParameters? }, { type: ResolvedRefs, status: False, reason: ResolvedRefs? }] - Create
GatewayClasswith validparametersReftarget, but invalid values:GatewayClassStatus[{ type: Accepted, status: False?, reason: InvalidParameters? }, { type: ResolvedRefs, status: True, reason: ResolvedRefs }] - Create
GatewayClasswith validparametersReftarget and values -> deleteparametersRefresource:GatewayClassStatus[{ type: Accepted, status: True?, reason: InvalidParameters? }, { type: ResolvedRefs, status: True, reason: ResolvedRefs }]a.Gatewaycreated prior to this change expected to be okay unless dynamic changes are allowed, if so what happens? b.Gatewaycreated referencing thisGatewayClassafter this change should setGatewayStatus[{ type: Accepted, status: False?, reason: ? }]c. DoesGatewayClassmove toAccepted: False? - Create
GatewayClasswith validparametersReftarget and values -> changeparametersRefresource content so one field becomes invalid:GatewayClassStatus[{ type: Accepted, status: True?, reason: InvalidParameters? }, { type: ResolvedRefs, status: True, reason: ResolvedRefs }]a.Gatewaycreated prior to this change expected to be okay unless dynamic changes are allowed, if so what happens? b.Gatewaycreated referencing thisGatewayClassafter this change should setGatewayStatus[{ type: Accepted, status: False?, reason: ? }]c. DoesGatewayClassmove toAccepted: False?
- Create
-
Another concern that was raised is what the expectations are if dynamic changes are allowed to a
parametersRef(by either adding/removing the reference, deleting the resource, or updating the fields within it). Specifically, if aGatewayClassgoes from beingAcceptedtonot Accepteddue to an invalid update to theparametersRef, how does that affect the entire config tree below it? By following a true declarative system, all Gateways under this class should no longer beAccepted, but this would nullify the entire configuration and be very disruptive. The other option is what was mentioned in the examples above, where previousGatewayslive on, while new ones are notAccepted. Issue here, however, is that if the control plane restarts, it loses the previous state of the validparametersRef, so the originalGatewaysthat were left alone will no longer receive those valid params, and the config is void. Note that this is technically an issue today with the existingInvalidParametersreason on theAcceptedcondition, so nothing new with the proposedResolvedRefscondition. -
Finally, what does conformance look like? While not stated in the spec, the existing
ResolvedRefscondition forHTTPRoutesis required in the conformance tests. Would that make this condition a requirement for aGatewayClass?
In relation to point 2, one proposal our implementation has come up with is to set Accepted: true, reason: InvalidParameters, message: GatewayClass is Accepted, but params are invalid for X reason and will be ignored. This means we ignore the params and allow the existing configuration to continue on, while giving a clear message to users as to what is happening.
The risk here is that if paramsRef goes from valid to invalid, any settings that were in there would be reverted to defaults, leading to a change in behavior or leading to downtime. The impact of this is obviously dependent on the implementation and the settings that exist in that paramsRef. Could be a better alternative than tearing down the entire config tree due to a non-Accepted GatewayClass, though.
I believe the current fuzziness in the spec around "snapshot/templating" behavior when creating Gateways referencing a configured GatewayClass combined with the difficulty handling of "dynamic" changes to parameters (by modifying either the reference or the referenced object) helps create the justification for adding a parametersRef field under infrastructure directly on Gateways, and trying to clean up this behavior to be better-defined.
Adding a parametersRef field as part of the infrastructure stanza was originally proposed and discussed extensively in https://github.com/kubernetes-sigs/gateway-api/pull/1868 and https://github.com/kubernetes-sigs/gateway-api/pull/1757 but was ultimately not included in the implementation in https://github.com/kubernetes-sigs/gateway-api/pull/2440 - I can't find the precise comment where it was blocked/dropped though (I think it happened after the GEPs merged, during API review).
/cc @howardjohn @robscott
Doesn't that just move the blast radius down one level? Unless we are considering the impact of an invalid policy attachment on a Gateway to be implementation-specific, versus stating in the spec that the Accepted condition should be false.
Doesn't that just move the blast radius down one level?
At a minimum, yes, and that alone feels like a good improvement.
I'd expect that it could be more reasonable for a Gateway to stay { type: Accepted, status: True } and just be missing some configured parameters (expressed through its existing { type: ResolvedRefs, status: False } or adding { type: PartiallyInvalid, status: True, reason: UnsupportedValue } similar to https://github.com/kubernetes-sigs/gateway-api/pull/2429 for HTTPRoute) than to figure out how to deal with a dynamic class/template becoming invalid in any more standardized way that doesn't have a huge blast radius.
Doesn't that just move the blast radius down one level?
That was also one of my original concerns. The proposal by @howardjohn in https://github.com/kubernetes-sigs/gateway-api/pull/2924 does limit the blast radius by making paramsRefs namespace-local, so the blast radius of a change would also be limited to the local namespace.
Unless we are considering the impact of an invalid policy attachment on a Gateway to be implementation-specific, versus stating in the spec that the Accepted condition should be false.
Maybe we should take these options and apply them more broadly?
https://github.com/kubernetes-sigs/gateway-api/blob/37a7ffb640bb077eb96843abdb647c2c8efea8d2/apis/v1/shared_types.go#L404-L418
@robscott That's similar to the approach that I mentioned in my first comment for the GatewayClass. Basically just keep things in a good state, but fall back on the paramsRef to the default values (our implementation may just have to use this approach since we are intending to add infra configuration at the top level and the GC paramsRef is the way to do that right now). It's almost the lesser of two evils. Your paramsRef config will be reverted to a valid state, which could cause disruption downstream since values are changing, but it's less disruptive than nullifying the entire downstream config.
This obviously is the big issue to figure out, but I also want to make sure we touch on the questions around the existence of a ResolvedRefs condition and the conformance for it.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity,
lifecycle/staleis applied - After 30d of inactivity since
lifecycle/stalewas applied,lifecycle/rottenis applied - After 30d of inactivity since
lifecycle/rottenwas applied, the issue is closed
You can:
- Mark this issue as fresh with
/remove-lifecycle stale - Close this issue with
/close - Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
With #2924 merging, there's a path for configuring per-Gateway custom parameters now.
I'm not entirely sure if the existing InvalidParameters GatewayConditionReason for the Accepted GatewayConditionType is sufficient or if we should consider adding ResolvedRefs to Gateway conditions in addition to Listener conditions, but I think it's more clear now that we shouldn't be encouraging this pattern on the GatewayClass resource.
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity,
lifecycle/staleis applied - After 30d of inactivity since
lifecycle/stalewas applied,lifecycle/rottenis applied - After 30d of inactivity since
lifecycle/rottenwas applied, the issue is closed
You can:
- Mark this issue as fresh with
/remove-lifecycle rotten - Close this issue with
/close - Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity,
lifecycle/staleis applied - After 30d of inactivity since
lifecycle/stalewas applied,lifecycle/rottenis applied - After 30d of inactivity since
lifecycle/rottenwas applied, the issue is closed
You can:
- Reopen this issue with
/reopen - Mark this issue as fresh with
/remove-lifecycle rotten - Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
In response to this:
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity,
lifecycle/staleis applied- After 30d of inactivity since
lifecycle/stalewas applied,lifecycle/rottenis applied- After 30d of inactivity since
lifecycle/rottenwas applied, the issue is closedYou can:
- Reopen this issue with
/reopen- Mark this issue as fresh with
/remove-lifecycle rotten- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.