GitHub.jl icon indicating copy to clipboard operation
GitHub.jl copied to clipboard

POST operations need to handle intermittent network failure.

Open samoconnor opened this issue 6 years ago • 5 comments

The GitHub doc says that they have been careful to use HTTP methods with semantics appropriate to the API operations.

HTTP.jl retries most request automatically in the event of network failure. However, POST operations are not generally idempotent and cannot be automatically retried. GitHub.jl should implement system-state-aware retry loops around POST requests. i.e. when a POST operation throws an HTTP.IOError GitHub.jl should retry the operation in a way that avoids duplicate changes to system state.

In the case of the create_status operation that is the subject of https://github.com/JuliaWeb/HTTP.jl/issues/220, POST /repos/:owner/:repo/statuses/:sha, it seems that double-posting would create two statuses, https://developer.github.com/v3/repos/statuses/, so in the event of a failure, the GitHub.jl create_status function should do a GET to check if the status was created before retrying (or, if duplicate statuses are not considered harmful to overall system correctness, just set retry_non_idempotent=true).

https://github.com/JuliaWeb/GitHub.jl/blob/beb3659dc0d43b29354fcf909de1b453e248b37c/src/repositories/statuses.jl#L30-L33

See also https://github.com/JuliaWeb/HTTP.jl/issues/214.

This issue may apply in other places that use gh_post_json.

samoconnor avatar Mar 17 '18 23:03 samoconnor

Help with this would be very much appreciated. This is causing significant issues with Nanosoldier, which is critical community infrastructure.

ararslan avatar Mar 26 '18 20:03 ararslan

Do you know what postrequest that is failing? If it is an idempotent one then we could just run with retry_non_idempotent=true for that request?

KristofferC avatar Mar 26 '18 20:03 KristofferC

@KristofferC the request that is failing is create_status. As described above "it seems that double-posting would create two statuses", i.e. it is not idempotent.

I would like to help out, but I need to see more detailed logs of what is going on with Nanosoldier: https://github.com/JuliaWeb/HTTP.jl/issues/220#issuecomment-373145283 https://github.com/JuliaWeb/HTTP.jl/issues/220#issuecomment-373150139.

samoconnor avatar Mar 26 '18 21:03 samoconnor

See log analysis: https://github.com/JuliaWeb/HTTP.jl/issues/220#issuecomment-376710381

To deal with the immediate practical problem in the Nanosoldier's use case I recommend modifying GitHub.jl to:

  • set retry_non_idempotent=true, restoring the old behaviour of blindly retrying POST requests; or
  • set idle_timeout=30 or idle_timeout=20 to remove/reduce the incidence of attempting to reuse a connection that has in-fact become unusable; or
  • (as a last resort) set reuse_limit=0 to disable connection re-use completely.

(To be reliable, GitHub.jl will still need to implement state-aware retry of non-idepmontent operations, but the changes above should improve failure rates in the short term).

samoconnor avatar Mar 27 '18 23:03 samoconnor

I think this probably should stay open per https://github.com/JuliaWeb/GitHub.jl/issues/106#issuecomment-376711247: "still need to implement state-aware retry of non-idepmontent operations"

samoconnor avatar May 31 '18 03:05 samoconnor