Better understanding of webhook check responses
Describe the feature
What problem are you trying to solve? Currently, a webhook check response with a 2xx status code is treated as a success, and every other result is treated as a failure. A failure increments the failedChecks counter, and the webhook check is retried later.
Sometimes the external judge (the webhook server) knows that it is waiting for an action to finish, so Flagger should wait and retry later. In that case, rather than leaving the webhook check request unanswered (which causes a timeout on the Flagger side), the judge should reply with 408 (Request Timeout), explicitly telling Flagger to retry later without incrementing the failedChecks counter.
In another case, if the external judge (the webhook server) finds a serious fault in the canary, it can reply with 401 (or another code you consider appropriate). On receiving this status code, Flagger should roll back immediately rather than retrying the webhook or waiting for a later check of the rollback webhook. The sooner the rollback happens, the less user traffic fails.
Proposed solution
What do you want to happen? Add any considered drawbacks. Besides 2xx, Flagger should understand at least status codes 408 (timeout, retry later) and 401 (failure, roll back immediately). All other status codes can be handled as a normal failed check. The implementation effort is minor. The webhook documentation needs to be updated accordingly.
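A minimal Go sketch of the proposed mapping, for illustration only. CheckOutcome, classify and applyOutcome are hypothetical names, not Flagger's API, and the rollback rule (failedChecks reaching the analysis threshold) is a simplified assumption about how Flagger counts failed checks.

```go
package main

import (
	"fmt"
	"net/http"
)

// CheckOutcome is a hypothetical type describing how Flagger could
// interpret a webhook check response under this proposal.
type CheckOutcome int

const (
	Passed      CheckOutcome = iota // 2xx: check succeeded (current behavior)
	Retry                           // 408: retry later, do not count as failed (proposed)
	RollbackNow                     // 401: serious fault, roll back immediately (proposed)
	Failed                          // anything else: normal failed check (current behavior)
)

func (o CheckOutcome) String() string {
	switch o {
	case Passed:
		return "passed"
	case Retry:
		return "retry"
	case RollbackNow:
		return "rollback"
	default:
		return "failed"
	}
}

// classify maps an HTTP status code to a CheckOutcome.
func classify(status int) CheckOutcome {
	switch {
	case status >= 200 && status < 300:
		return Passed
	case status == http.StatusRequestTimeout: // 408
		return Retry
	case status == http.StatusUnauthorized: // 401
		return RollbackNow
	default:
		return Failed
	}
}

// applyOutcome returns the updated failedChecks value and whether the
// analysis should roll back (simplified: rollback once the threshold is reached).
func applyOutcome(o CheckOutcome, failedChecks, threshold int) (int, bool) {
	switch o {
	case Passed, Retry:
		return failedChecks, false // counter untouched
	case RollbackNow:
		return failedChecks, true // skip the counter, roll back now
	default:
		failedChecks++
		return failedChecks, failedChecks >= threshold
	}
}

func main() {
	failed := 0
	for _, code := range []int{200, 408, 500, 401} {
		o := classify(code)
		var rollback bool
		failed, rollback = applyOutcome(o, failed, 5)
		fmt.Printf("%d -> %s (failedChecks=%d, rollback=%t)\n", code, o, failed, rollback)
	}
}
```

With this split, a 408 never moves the counter toward the threshold, while a 401 short-circuits it entirely.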
Any alternatives you've considered?
Is there another way to solve this problem that isn't as good a solution? None. For the 401 case, the rollback webhook can be used, but it is not always given the highest priority. Flagger may not treat the upgrade as failed until failedChecks reaches its limit, which could take unexpectedly long.
A response with a 408 status code implies that the request timed out and the server wants to close the connection. It has nothing to do with retries. From https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/408:
The HyperText Transfer Protocol (HTTP) 408 Request Timeout response status code means that the server would like to shut down this unused connection. It is sent on an idle connection by some servers, even without any previous request by the client.
The 401 status code means that the request is unauthorized and has nothing to do with the canary payload. As you said, to trigger a rollback, the webhook server can simply return a 2xx status code from the rollback webhook.