gateway-api icon indicating copy to clipboard operation
gateway-api copied to clipboard

HTTPRoute BackendRefs validation and potentially inconsistent view when backend is removed.

Open dawid-nowak opened this issue 1 year ago • 8 comments

I think a following scenario would need to be validated with respect to HTTPRoute and BackendRefs. The documentation is very vague here what should happen here.

  1. We add a HTTPRoute with valid BackendRefs, everything is good and the tests are passing.
  2. We remove the backend/object referenced by BackendRefs. The route is now invalid but it will not be updated since we would need to register for notifications about the removal of the referenced object.
  3. Which leaves the gateway in an inconsistent state.

Alternatively, we should drop validation of BackendRefs. So then the conditions don't need to be updated and the gateway doesn't care if the backend exists or it doesn't exist.

dawid-nowak avatar Nov 04 '24 18:11 dawid-nowak

It's expected that implementations watch the referenced backends for all HTTPRoutes and supported backend types, and rereconcile HTTPRoutes if there are changes, so the invalid route should be picked up when that rereconciliation occurs.

I guess this could probably be added to https://gateway-api.sigs.k8s.io/guides/implementers/ to clarify.

youngnick avatar Dec 02 '24 02:12 youngnick

Thanks, got it... If this expectation is as firm as it sounds, maybe it should be part of the standard/conformance process?

dawid-nowak avatar Dec 09 '24 15:12 dawid-nowak

Thinking about it further. If a service that BackendRef is referencing gets deleted i don't think we would get a notification on deletion unless we add a finalizer to it.

dawid-nowak avatar Dec 09 '24 16:12 dawid-nowak

Coming back to old notifications, it's expected that implementations would be watching all events for Service (assuming they support that as a backendRef, which is basically required for core conformance). That would include Delete events (and I know Cilium has code to handle that, for example).

When using controller-runtime, this is relatively straightforward - you just trigger re-reconciliation on any Service change.

I agree that testing this should probably be a part of the conformance process if it's not already.

youngnick avatar Feb 18 '25 01:02 youngnick

I think it is not just for the backend reference, I think similar principle applies to TLS/Secrets. The gateway would need to observe the secrets, download them locally, validate them and if the secrets change then it needs to re-reconcile the whole gateway.

dawid-nowak avatar Mar 13 '25 16:03 dawid-nowak

We've specifically avoided writing anything into the specs that requires validation of the contents of any referenced Secrets. But yes, the entire Gateway needs be rechecked if anything referenced in that Gateway or any object attached to its entire object tree changes.

So, any Gateway controller needs to watch, at a minimum, GatewayClass, Gateway, Secrets, ReferenceGrants, and any Route objects supported.

For all Routes, it also needs to watch: Service, ReferenceGrants again (for cross-namespace backend references).

If any Policy objects or other extension referenced objects are supported, then the implementation will need to retrigger Gateway calculation on update there as well.

I've had adding something about this to the Implementer's guide at https://gateway-api.sigs.k8s.io/guides/implementers/ on my list for some time, but I've never been able to get it to the top of the pile.

youngnick avatar Mar 17 '25 02:03 youngnick

"writing anything into the specs that requires" but i think the compliance tests are forcing that sort of behaviour, aren't they ? and forcing to check if the certificates are valid and not malformed.

Anyway... i am happy to keep the conversation going but i think we can resolve this issue.

dawid-nowak avatar Mar 19 '25 10:03 dawid-nowak

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jun 25 '25 11:06 k8s-triage-robot

We already have a test to check that HTTPRoutes with non-existent backendRefs get the proper ResolvedRefs condition. As said by @youngnick, all the implementations need to trigger reconciliation loops upon events on services. We could implement this specific test, but I think it's very redundant with https://github.com/kubernetes-sigs/gateway-api/blob/main/conformance/tests/httproute-invalid-nonexistent-backendref.go.

/close

mlavacca avatar Jul 03 '25 12:07 mlavacca

@mlavacca: Closing this issue.

In response to this:

We already have a test to check that HTTPRoutes with non-existent backendRefs get the proper ResolvedRefs condition. As said by @youngnick, all the implementations need to trigger reconciliation loops upon events on services. We could implement this specific test, but I think it's very redundant with https://github.com/kubernetes-sigs/gateway-api/blob/main/conformance/tests/httproute-invalid-nonexistent-backendref.go.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot avatar Jul 03 '25 12:07 k8s-ci-robot