cadence-client icon indicating copy to clipboard operation
cadence-client copied to clipboard

Retry service-busy errors after a delay

Open Groxx opened this issue 3 years ago • 2 comments

Builds on #1167, but adds delay before retrying service-busy errors.

For now, since our server-side RPS quotas are calculated per second, this delays at least 1 second per service busy error. This is in contrast to the previous behavior, which would have retried up to about a dozen times in the same period, which is the cause of service-busy-based retry storms that cause lots more service-busy errors.


This also gives us an easy way to make use of "retry after" information in errors we return to the caller, though currently our errors do not contain that.

Eventually this should probably come from the server, which has a global view of how many requests this service has sent, and can provide a more precise delay to individual callers. E.g. currently our server-side ratelimiter works in 1-second slices... but that isn't something that's guaranteed to stay true. The server could also detect truly large floods of requests, and return jittered values larger than 1 second to more powerfully stop the storm, or to allow prioritizing some requests (like activity responses) over others simply by returning a lower delay.

Groxx avatar Jun 22 '22 02:06 Groxx

Pull Request Test Coverage Report for Build 0183905f-6dc6-40b7-861c-6866b8ff66be

  • 46 of 58 (79.31%) changed or added relevant lines in 4 files are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage increased (+0.04%) to 64.183%

Changes Missing Coverage Covered Lines Changed/Added Lines %
internal/common/backoff/retry.go 30 32 93.75%
internal/internal_task_pollers.go 12 22 54.55%
<!-- Total: 46 58
Files with Coverage Reduction New Missed Lines %
internal/common/backoff/retry.go 1 96.59%
<!-- Total: 1
Totals Coverage Status
Change from base Build 01838582-1814-4eb0-a654-48c8844fff22: 0.04%
Covered Lines: 12648
Relevant Lines: 19706

💛 - Coveralls

coveralls avatar Jun 22 '22 02:06 coveralls

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Jul 11 '22 18:07 CLAassistant

Merging, will try to follow up this week with a cleanup (if feasible, given the custom behavior I remember... I suspect it won't be, but worth checking on anyway).

Groxx avatar Nov 07 '22 20:11 Groxx