gateway icon indicating copy to clipboard operation
gateway copied to clipboard

HTTPRoute Status Not Updated After Adding Backend

Open danehans opened this issue 3 years ago • 3 comments

I create a gatewayclass, gateway, and httproute. EG surfaces an error:

2022-09-23T16:51:34.608Z	ERROR	controller/controller.go:326	Reconciler error	{"runner": "provider", "controller": "httproute", "object": {"name":"httpbin","namespace":"default"}, "namespace": "default", "name": "httpbin", "reconcileID": "1d30e643-0896-4171-9323-7fc6aa75e865", "error": "failed to get service default/httpbin"}

I then create the backend service but the route's status is never updated:

...
  status:
    parents:
    - conditions:
      - lastTransitionTime: "2022-09-23T16:51:30Z"
        message: Route is accepted
        observedGeneration: 1
        reason: Accepted
        status: "True"
        type: Accepted
      - lastTransitionTime: "2022-09-23T16:51:30Z"
        message: Service default/httpbin not found
        observedGeneration: 1
        reason: BackendNotFound
        status: "False"
        type: ResolvedRefs
      controllerName: gateway.envoyproxy.io/gatewayclass-controller
      parentRef:
        group: gateway.networking.k8s.io
        kind: Gateway
        name: eg

cc: @arkodg @skriss

danehans avatar Sep 23 '22 16:09 danehans

I'm seeing this issue even when the backend is created before the httproute.

danehans avatar Sep 23 '22 17:09 danehans

I was able to work around the issue by deleting the httproute, gateway, and gatewayclass. After recreating the gatewayclass, gateway, and httproute (after creating the backend) the httproute surfaced the proper status conditions:

  status:
    parents:
    - conditions:
      - lastTransitionTime: "2022-09-23T17:10:28Z"
        message: Route is accepted
        observedGeneration: 1
        reason: Accepted
        status: "True"
        type: Accepted
      controllerName: gateway.envoyproxy.io/gatewayclass-controller
      parentRef:
        group: gateway.networking.k8s.io
        kind: Gateway
        name: eg

danehans avatar Sep 23 '22 17:09 danehans

sounds like an issue in the provider reconciler, maybe another reason to implement https://github.com/envoyproxy/gateway/issues/413 sooner than later

arkodg avatar Sep 23 '22 17:09 arkodg

tried this out and can repro the issue, it is tied to the logic behind how status is computed in the Gateway API translator. Currently the translator only merges/appends new status conditions https://github.com/envoyproxy/gateway/blob/765e14ea3bcc949f4a98775429d33dbb792042f1/internal/gatewayapi/contexts.go#L267, but does not reset older ones previously created (in this case https://github.com/envoyproxy/gateway/blob/765e14ea3bcc949f4a98775429d33dbb792042f1/internal/gatewayapi/translator.go#L581) @skriss can we reset the Status.Parents field (which might cause some status churn due to the change in LastTransitionTime) or do we need to parse through the status field and clear any previously set status condition ?

arkodg avatar Oct 03 '22 22:10 arkodg

The other option would be to actually set the ResolvedRefs condition to true (rather than omitting it) when all refs can be resolved. This would be inline with upcoming Gateway API 0.6.0 changes (https://github.com/youngnick/gateway-api/blob/3b01d971e9a60de30e30a59b7aa2d5bf12e8018e/site-src/geps/gep-1364.md#proposed-changes-summary), where we'll be required to do that. I think that might be the simplest change, but would have to look in some more detail to verify.

skriss avatar Oct 04 '22 16:10 skriss

@skriss what about the other status conditions (within the Gateway and HTTPRoute) set by the translator in the first iteration but have changed in the second iteration (due to change in resources) ?

arkodg avatar Oct 04 '22 17:10 arkodg

Yeah you may have similar issues for those. I'd say clearing the existing conditions before translate is probably a good first step, but yeah ultimately we probably need more intelligent condition merge logic to handle all the cases of adding, updating, removing conditions properly.

skriss avatar Oct 04 '22 17:10 skriss

Yeah you may have similar issues for those. I'd say clearing the existing conditions before translate is probably a good first step, but yeah ultimately we probably need more intelligent condition merge logic to handle all the cases of adding, updating, removing conditions properly.

yeah I think if we reset the previous conditions in the translator and merge the conditions (which will ensure an existing condition remains unchanged ) in the status subscriber we can compute the correct conditions w/o performing excessive writes against the API Server

arkodg avatar Oct 04 '22 18:10 arkodg

The other option would be to actually set the ResolvedRefs condition to true (rather than omitting it) when all refs can be resolved. This would be inline with upcoming Gateway API 0.6.0 changes (https://github.com/youngnick/gateway-api/blob/3b01d971e9a60de30e30a59b7aa2d5bf12e8018e/site-src/geps/gep-1364.md#proposed-changes-summary), where we'll be required to do that. I think that might be the simplest change, but would have to look in some more detail to verify.

#507 does just this. I'm concerned about making major changes at this point in the release, so I prefer #507 gets merged and we iterate to improve status. @arkodg I am unclear what the bigger issue is that you describe in https://github.com/envoyproxy/gateway/issues/415#issuecomment-1266127024 and what is meant by "clearing" or "reset" status conditions. Typically, status conditions are set and then updated based on the current state and not cleared or reset. It would be helpful if you could provide a reproducer for the bigger issue you describe.

danehans avatar Oct 06 '22 03:10 danehans

@arkodg thanks for taking the time to help me understand your comment. I now see that the gatewayapi translator needs to handle updating the other condition types as well. I think it's best to have the gatewayapi translator build the conditions list from scratch every time, send the conditions to the provider in a StatusUpdate and have the provider only update the object if the list differs (excluding lastTransitionTime).

danehans avatar Oct 06 '22 17:10 danehans