bors icon indicating copy to clipboard operation
bors copied to clipboard

Handle GitHub API timeouts and retries

Open Kobzol opened this issue 9 months ago • 7 comments

Currently, we send requests to the GitHub API without dealing with timeouts and retries in any way. We should make this more robust by adding some automatic retry/timeout mechanism:

  • Implement timeouts for the requests. Say, 10s at most, before the request is marked as timeouted.
  • Implement retry mechanism for GH API requests. If the request fails due to API rate limiting or a timeout, retry it again, ideally with some exponential backoff. See https://docs.github.com/en/rest/using-the-rest-api/rate-limits-for-the-rest-api?apiVersion=2022-11-28#checking-the-status-of-your-rate-limit and https://docs.github.com/en/rest/using-the-rest-api/rate-limits-for-the-rest-api?apiVersion=2022-11-28#exceeding-the-rate-limit.
  • I don't think that this will be needed at first, but in the future we might consider using some queue for requests (https://docs.github.com/en/rest/using-the-rest-api/best-practices-for-using-the-rest-api?apiVersion=2022-11-28) to avoid too many concurrent requests. But I would only implement this if we find some issues in production.

Kobzol avatar Mar 25 '25 11:03 Kobzol

Hi, I would like to work on this.

geetanshjuneja avatar Mar 30 '25 15:03 geetanshjuneja

Hi, for the timeout part, I just need to check if the api call takes more than 10 sec then it needs to cancelled which can be done using tokio::time::timeout.

geetanshjuneja avatar Apr 02 '25 08:04 geetanshjuneja

tokio::time::timeout is a bit low-level. We could use it if needed, but I suppose that reqwest or octocrab should have more fine-grained options for handling timeouts, e.g. read timeouts, connection timeouts, request timeouts, etc.

Kobzol avatar Apr 02 '25 08:04 Kobzol

octocrabbuilder::{set_read_timeout, set_write_timeout} would be fine for this.

geetanshjuneja avatar Apr 02 '25 08:04 geetanshjuneja

Yeah, something like that. Ideally, we should be able to find out from the error if it was actually a timeout though, so we could retry. I remember that octocrab had a bit opaque errors, so if it's not possible to figure it out from the error easily, then we might want to converge to the tokio timeout anyway.

Kobzol avatar Apr 02 '25 09:04 Kobzol

Hi I went through the octocrab's error enum and also hyper-timeout crate which octocrab uses for timeout. So I think we can figure out that the request timed out by checking service variant of the enum. Also check a test of hyper-timeout here.

geetanshjuneja avatar Apr 02 '25 19:04 geetanshjuneja

Uhh, downcast_ref is not exactly what I wanted to see in the codebase 😕 But I guess it's better than adding an additional timeout on top of an already existing timeout. So let's try it.

Kobzol avatar Apr 02 '25 20:04 Kobzol

I think that for now this is good enough with https://github.com/rust-lang/bors/pull/267.

Kobzol avatar Jul 19 '25 09:07 Kobzol