[Feature][APIServer] add retry for http client
- Enable retry in
executeRequestinpkg/http/client.go - Add unit test to ensure the retry function works as expected
Why are these changes needed?
Currently, the apiserver http client does not support retry. This PR enables the retry feature in http client.
Related issue number
Part of #3344
Checks
- [ ] I've made sure the tests are passing.
- Testing Strategy
- [ ] Unit tests
- [ ] Manual tests
- [ ] This PR is not tested :(
Hi @dentiny , PTAL Thanks!
let me know if you want a discussion, or you feel the PR is ready to review :)
Hi @dentiny , I've modified to retry only on specific http status errors (list below) and added backoff.
- 408 - Request Timeout
- 429 - Too Many Requests
- 502 - Bad Gateway
- 503 - Service Unavailable
- 504 - Gateway Timeout
For the overall timeout, I think it is now set when specifying http.Client before sending to NewKuberayAPIServerClient, do we also need to take care of that in NewKuberayAPIServerClient?
https://github.com/ray-project/kuberay/blob/5f593bc20b8d7f58f2f7b92d38bcef4b568ec321/apiserver/test/e2e/types.go#L95
Thanks!
Hi @dentiny
I found that my original retry logic is incorrect. The max attempt times should be maxRetry + 1.
Originally, I use < in the following for loop, so the attempt time is maxRetry. I modified as follow:
https://github.com/ray-project/kuberay/blob/7e0d6ad77528843ae0921cc193fbd1781c18b219/apiserver/pkg/http/client.go#L664
Hi @dentiny
I found that my original retry logic is incorrect. The max attempt times should be maxRetry + 1. Originally, I use
<in the following for loop, so the attempt time ismaxRetry. I modified as follow:https://github.com/ray-project/kuberay/blob/7e0d6ad77528843ae0921cc193fbd1781c18b219/apiserver/pkg/http/client.go#L664
Thanks! Then can we add a unit test with max retry count 0, to make sure functor does execute?
@dentiny I modified the TestAPIServerClientRetry to let it validate the callCount is expectCallCount and add the test case for maxRetry = 0
Thanks!
Hi @machichima , I didn't see new tests added for "maxRetry = 0" case (no new commits either). Did I mis-understand?
Hi @machichima , I didn't see new tests added for "maxRetry = 0" case (no new commits either). Did I mis-understand?
I am so sorry... I forgot to push the commit. Just pushed
Thanks!
cc @kevin85421 for final review
The failing CI for autoscaler seems to be flaky test, while this PR only modify apiserver part
Hi @kevin85421 / @rueian, do you think this PR is good to go? I want to raise it because @kenmcheng 's working on a similar feature for v2: https://github.com/ray-project/kuberay/pull/3869/files, seems some of the util functions could be extracted and reused. Thanks!
@kevin85421 Thank you! Just opened a doc issue here: https://github.com/ray-project/kuberay/issues/3883