gateway
gateway copied to clipboard
Status Updater Fails to Update Status
I see the following error in the EG logs when creating a gateway:
2022-09-23T09:35:03.806-0700 ERROR status/status.go:81 unable to update status {"runner": "provider", "name": "eg", "namespace": "default", "error": "Operation cannot be fulfilled on gateways.gateway.networking.k8s.io \"eg\": the object has been modified; please apply your changes to the latest version and try again"}
Note that a gatewayclass was already created.
@arkodg do you think https://github.com/envoyproxy/gateway/pull/395 may be updating gateway status before making a copy of it?
I see the same issue with httproutes:
2022-09-23T17:01:21.232Z ERROR status/status.go:81 unable to update status {"runner": "provider", "name": "httpbin", "namespace": "default", "error": "Operation cannot be fulfilled on httproutes.gateway.networking.k8s.io \"httpbin\": the object has been modified; please apply your changes to the latest version and try again"}
I think the status updater library needs to be updated to
- perform a GET before a PUT
- also made synchronous (currently asynchronous, runs in a go routine)
@arkodg do you think #395 may be updating gateway status before making a copy of it?
@danehans it passes a copy already https://github.com/envoyproxy/gateway/blob/25acee55e46f3cd022157037697b08844d96a55d/internal/provider/kubernetes/gateway.go#L351
shouldn't this https://github.com/envoyproxy/gateway/blob/f094c5bc02c8c00b1ca668586210b2c202e96b18/internal/status/status.go#L62 lead to retries where it GETs the object and tries applying status over it again? If the updates for the same object are being sent asynchronously, I think it would retry and eventually update the correct status after a few retries?
I think status updating can be async, as long as:
- it gets passed an operation to do on the status (I think EG does this already)
- it GETs the object from the controller-runtime cache, then applies the status operation, then checks if things have changed, and only applies if it has.
- There's only one status updater running
That is, the status updater can handle the queue of updates asynchronously, but each update attempt needs to get the latest version first, then perform any checks about the status, then update if things have changed. Contour's Gateway processing is a good example of the pattern: https://github.com/projectcontour/contour/blob/e8125fa21d0b35faa39606f577c0767ca863cb91/internal/controller/gateway.go#L272
Note that it calls the setGatewayNotScheduled
function during execution of the MutatorFunc, which means that the update operation will always be done on version when the MutatorFunc runs, rather that the version that's used when the MutatorFunc was created.
@youngnick currently the producers (such as gateway-api translator) update the entire object including status and pass it over to the status subscriber, so we would have to update logic in all the producers to collect the list of updated status conditions and pass those over to the status subscriber so that the conditions are updated in the mutator func and not before. I think that is something that is needed long term (because runners such as the xds-server do not have any reference to the gateway object), but it was simpler to convert the status updater into a lib to make the status updates synchronous within the status subscriber go routine so went ahead with that instead for the 0.2.0 release https://github.com/envoyproxy/gateway/pull/429
please ignore the previous comment, @youngnick took your advice and merged the gateway conditions within the mutator func https://github.com/envoyproxy/gateway/pull/435 thanks !