rollouts-plugin-trafficrouter-gatewayapi
rollouts-plugin-trafficrouter-gatewayapi copied to clipboard
[PROPOSAL] Do not modify weight for canaryBackendRefs in managed route rules
Related to thread in CNCF slack here: https://cloud-native.slack.com/archives/C01U781DW2E/p1741202571985169
I don't believe that we should be modifying the weight of a canaryBackendRef when that ref is inside of a rule that we've previously created from setHeaderRoute.
In the current state, doing a setHeaderRoute step while weight is 0 (i.e. after you've setCanaryScale and expect to expose a route to the canary service for a specific header ONLY, while still directing 100% of traffic to the stable backend) will cause any requests with that header to 504.
This is because setWeight is reconciled consistently, and will match the newly created rule for the header route as viable for updating the weight to 0. When that weight is is not a positive integer, requests will be dropped because there is no alternative backendRef to send those requests to.
In the istio configuration, you can see that when reconciling the weight, it only does so if httpRoute.Mirror != nil otherwise it will make no changes.
- https://github.com/argoproj/argo-rollouts/blob/master/rollout/trafficrouting/istio/istio.go#L142-L155
Since setMirrorRoute is not implemented in the gateway-api-plugin, yet, there is no risk in skipping all canaryBackendRef's that are part of managed rules.
However - when setMirrorRoute does get implemented, we will need to make sure that managed MIRROR routes have their backendRefs updated.
In the current structure of the configmap for recording the managed routes, there is not any distinction between these types of managed routes.
One solution here would be to use a different HTTPConfigMapKey for mirror routes than we do header routes, and this will continue to function as expected.
i.e.
const (
HTTPConfigMapKey = "httpManagedRoutes"
)
becomes
const (
HTTPHeaderRouteConfigMapKey = "httpManagedHeaderRoutes"
)
const (
HTTPMirrorRouteConfigMapKey = "httpManagedMirrorRoutes"
)
and so on and so forth for GRPC and TCP.
Remaining in this proposal would be the changes to the GRPC and TCP route code for those functions, which I can add if this approach is sound.
It would be great if there was a full example at https://github.com/argoproj-labs/rollouts-plugin-trafficrouter-gatewayapi/tree/main/examples that people can test and compare behavior with/without this PR implemented.
It would be great if there was a full example at main/examples that people can test and compare behavior with/without this PR implemented. @kostis-codefresh
sure thing, I've pushed that to this branch
Hmm - I'm not sure if this is unique to this change, but I just noticed that it can be possible for the plugin to trip over itself during setHeaderRoute if it is also being asked to reconcile the weight at the same time.
This causes both rpc calls to attempt to modify the HTTPRoute and throw an error that
Operation cannot be fulfilled on httproutes.gateway.networking.k8s.io "argo-rollouts-http-route": the object has been modified; please apply your changes to the latest version and try again
This causes an orphaned rule in the HTTPRoute that does not get recorded as managed in the configmap, which allows that to have a weight set to 0.
This means that the 503's return for any requests with that header until someone manually removes that orphaned rule and modifies the configmap to set the index as one less than it had recorded... hmm...
I wonder if that might also be related to the timeout on the E2E tests that just ran again from my push to the examples directory. Considering they passed on the previous commit for the actual code changes, this seems like it could be a pretty consistent flake/race condition.
I wonder how we could ensure that when we're operating on setWeight or setHeaderRoute there is no chance that the plugin could step over itself modifying the same resource at the same time.
Maybe we don't commit changes to the HTTPRoute on setWeight if the weight is already at the desired value?