xRoutes do not report possible conflicts
This story/bug/feature enhancement request is about a bad UX I had while testing some scenarios of conflicts.
After creating a Gateway and attaching two identical routes to it, as a user I get no report that something may be wrong. This leads to confusion about why my routes are not working (or are working but giving me wrong answers), as there's no sign that my route is not the one actually programmed on the proxy.
Let's take a look into the following manifest:
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: gateway
spec:
  gatewayClassName: someclass
  listeners:
  - name: default
    port: 80
    protocol: HTTP
    allowedRoutes:
      namespaces:
        from: All
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: echo
  namespace: user1
spec:
  parentRefs:
  - name: gateway
  hostnames: ["some.example.tld"]
  rules:
  - matches:
    - path:
        type: Exact
        value: /
    backendRefs:
    - name: echo
      port: 3000
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: echo
  namespace: user2
spec:
  parentRefs:
  - name: gateway
  hostnames: ["some.example.tld"]
  rules:
  - matches:
    - path:
        type: Exact
        value: /
    backendRefs:
    - name: echo
      port: 3000
Once applied, the users in namespaces user1 and user2 will both expect a curl to "http://some.example.tld" to reach their own backend. Only user1 gets the right answer; user2 sees nothing wrong with the route, yet the app appears to misbehave and return something different.
Why?
Because the status of the two routes never lets either user know that one of the routes wasn't really programmed. Both routes report the same status:
status:
  parents:
  - conditions:
    - lastTransitionTime: "2025-12-02T18:13:39Z"
      message: Route is accepted
      observedGeneration: 2
      reason: Accepted
      status: "True"
      type: Accepted
    - lastTransitionTime: "2025-12-02T18:13:39Z"
      message: Resolved all the Object references for the Route
      observedGeneration: 2
      reason: ResolvedRefs
      status: "True"
      type: ResolvedRefs
Once the older route is deleted, the new one starts working.
So we need to start giving users more information about whether a route was properly programmed and, if not, why (e.g., conflicted).
Tested implementations
The situation above was tested and confirmed with:
- Envoy Gateway 1.5.0
- Istio (1.29-alpha.d16be7b7a857b66a7a633f4b532c89b8428b485a)
- Cilium 1.18.2
This feels like we might have a gap in conformance testing for this specific case.
I would expect that HTTPRoutes should follow the documented conflict resolution guidelines, specifically:
"If everything else is equivalent (including creation timestamp), precedence should be given to the resource appearing first in alphabetical order (namespace/name)"
...and the conflicting HTTPRoute should be set to Accepted: false. I'm not entirely sure what level of conflict granularity we can achieve consistently across dataplane implementations. While a conflict on an exact path is pretty obvious, it gets much messier when merging intersections such as header-only or query-param-only matches across different routes, especially since some degree of merging is likely desirable, for example two separate HTTPRoutes for different path prefixes attached to the same listener. This logic may also be difficult to implement in the Gateway controller if the actual config merging (and conflict/failure) only happens during async dataplane programming (which I expect may be the case with Envoy in the cited implementations).
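The tie-break quoted above can be sketched roughly as follows. This is a minimal illustration, not code from any implementation: `RouteRef`, `precedence_key`, and `pick_winner` are hypothetical names, and it assumes the routes being compared already have equivalent matches, that the older route wins first, and that "alphabetical order" means comparing the literal "namespace/name" string.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class RouteRef:
    """Hypothetical minimal view of an HTTPRoute; not a Gateway API type."""
    namespace: str
    name: str
    created: datetime


def precedence_key(route: RouteRef):
    # Assumption: the older route wins; ties fall back to "namespace/name".
    return (route.created, f"{route.namespace}/{route.name}")


def pick_winner(routes):
    """Return (winner, losers) among routes whose matches are equivalent."""
    ordered = sorted(routes, key=precedence_key)
    return ordered[0], ordered[1:]


# Same creation timestamp, so the alphabetical tie-break decides.
ts = datetime(2025, 12, 2, 18, 13, 39, tzinfo=timezone.utc)
winner, losers = pick_winner([
    RouteRef("user2", "echo", ts),
    RouteRef("user1", "echo", ts),
])
print(winner.namespace, [r.namespace for r in losers])  # user1 ['user2']
```

The hard part discussed in this thread is not this ordering but deciding which routes count as "otherwise equivalent" in the first place.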
The reason that Gateway listeners have a separate Conflicted status condition type is that (historically, prior to ListenerSet), they were all written within the same resource and it was preferable to avoid rejecting the entire Gateway if only a subset of listeners conflicted.
@rikatz I tested this with Airlock Microgateway 4.8, and I’m seeing the same behavior that you observed with the other implementations.
I think precedence is not the same as acceptance, because the routing precedence is indeed working correctly. It might be clearer to phrase it like this:
"If everything else is equivalent (including creation timestamp), precedence should be given to the resource appearing first in alphabetical order (namespace/name). The other resources should have the Accepted condition set to false with the reason Conflicted."
Routes and their rules are often only partially conflicted. However, if two routes are exactly identical and precedence is determined solely by creation time or alphabetical order, then I believe it’s worthwhile to set the status to false. In such a scenario, a request can never be routed to the lower-precedence route as long as the other one exists.
Yes, for partial validity the rules are supposed to be:
- If there's at least one valid thing (Rule in HTTPRoute, Listener in Gateway), then the object is partially valid, and can be accepted.
- If there are no valid things (the spec actually says "if the object would produce no config in the underlying dataplane", but this is a shorter way to write it), then the whole object must not be accepted.
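As a rough sketch of the two rules above (the function name and the list-of-booleans input are hypothetical, chosen only for illustration), acceptance reduces to "at least one valid item":

```python
def accepted_status(item_validity):
    """Return the Accepted status string for an object.

    item_validity: one boolean per Rule (HTTPRoute) or Listener (Gateway),
    True when that item is valid. Hypothetical input shape.
    """
    # At least one valid item: the object is partially valid and can be accepted.
    # No valid items: the object would produce no dataplane config, so reject it.
    return "True" if any(item_validity) else "False"


print(accepted_status([True, False]))   # True
print(accepted_status([False, False]))  # False
```

How a controller decides that an individual rule is "valid" when it conflicts with another route is exactly the open question in this thread; the sketch only covers the aggregation step.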
The tricky part with the HTTPRoute test is that they do conflict, but only in terms of the parent ref. In some readings of Accepted, that means they could be Accepted (because they are semantically and syntactically valid), but Conflicted. However, we've never really resolved the ambiguity around whether Accepted means "locally valid" or "has attached to a Gateway" for HTTPRoute. In some places we use it one way, in others the other. We do have Programmed to indicate that as well, though.
Regardless, the case that Ricardo calls out should definitely produce a Conflicted state somewhere on the HTTPRoute that doesn't make it in, and we should have a conformance test to validate that.
I think our two options are:
- Conflicted HTTPRoute gets Accepted true, but Programmed false, with a Conflicted reason. This indicates the current state pretty accurately, but requires folks to check Programmed correctly.
- Conflicted HTTPRoute gets Accepted false, and Programmed false, both with a Conflicted reason. This makes Accepted mean more than just "locally correct and accepted for processing", which is a bit of an expansion on what it currently is.
After writing that out, I think I favor the former, but I could do either.
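To make the two options concrete, here is a small sketch of the conditions each would put on the losing route. It only mirrors the proposal as worded in this comment: the `conflicted_conditions` helper and the option labels "a"/"b" are hypothetical, and a Programmed condition with a Conflicted reason on HTTPRoute is what is being proposed here, not something the sketch assumes implementations already emit.

```python
def conflicted_conditions(option):
    """Conditions for an HTTPRoute that lost a conflict, per the two options.

    "a": Accepted=True, Programmed=False with reason Conflicted.
    "b": Accepted=False and Programmed=False, both with reason Conflicted.
    """
    if option not in ("a", "b"):
        raise ValueError("option must be 'a' or 'b'")
    accepted = option == "a"
    return [
        {
            "type": "Accepted",
            "status": "True" if accepted else "False",
            "reason": "Accepted" if accepted else "Conflicted",
        },
        {"type": "Programmed", "status": "False", "reason": "Conflicted"},
    ]


for opt in ("a", "b"):
    print(opt, [(c["type"], c["status"], c["reason"]) for c in conflicted_conditions(opt)])
```

Under option "a" a user who only looks at Accepted still sees "True", which is the trade-off noted in the first bullet.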
Edit: Whatever we do is probably going to really suck to implement, because now we will have to do a deep comparison of the HTTPRoutes to determine config winners (I suspect that's why none of us have done this yet).
"Whatever we do is probably going to really suck to implement, because now we will have to do a deep comparison of the HTTPRoutes to determine config winners (I suspect that's why none of us have done this yet)."
Yea, this is my concern with the desire to surface this in status - right now I expect controllers are just accepting syntactically valid HTTPRoutes and shipping this config off to the dataplane to let it sort out conflicts/precedence.
I can confirm I've seen different behavior across controllers, so I cannot guarantee that all of them drop the conflicting route.
That said, I can imagine how painful a deep comparison would be. I am not on the implementation side, but I wonder whether at least adding an index field for each gateway/route/hostname/path would help (well, saying it out loud, it seems like a very bad idea...)
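For what it's worth, the index idea could look something like the sketch below. It is only an illustration under strong assumptions: routes are reduced to hypothetical plain dicts, the key is (gateway, hostname, path type, path value), and headers, query params, methods, and overlapping prefixes are ignored entirely, so it would only catch exact duplicates like the one in this issue.

```python
from collections import defaultdict


def build_match_index(routes):
    """Map (gateway, hostname, path_type, path_value) to the routes claiming it."""
    index = defaultdict(list)
    for route in routes:
        owner = f'{route["namespace"]}/{route["name"]}'
        for hostname in route["hostnames"]:
            for path_type, path_value in route["paths"]:
                key = (route["gateway"], hostname, path_type, path_value)
                index[key].append(owner)
    return index


def find_conflicts(index):
    """Keep only the keys claimed by more than one route."""
    return {key: owners for key, owners in index.items() if len(owners) > 1}


# The two routes from the manifest above, flattened into the hypothetical shape.
routes = [
    {"namespace": "user1", "name": "echo", "gateway": "gateway",
     "hostnames": ["some.example.tld"], "paths": [("Exact", "/")]},
    {"namespace": "user2", "name": "echo", "gateway": "gateway",
     "hostnames": ["some.example.tld"], "paths": [("Exact", "/")]},
]
conflicts = find_conflicts(build_match_index(routes))
print(conflicts)
```

Anything beyond exact duplicates (wildcard hostnames, prefix overlaps, header matches) would still need the deeper comparison discussed above.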