atlassian-python-api
Add exponential backoff and retry functionality to rest_client
While working with the Atlassian Cloud API and attempting to optimize the time it takes to perform many operations, it's common to hit the Atlassian API request limits, which results in annoying negative engineering.
This PR adds a convenience option to the underlying REST client, turned off by default, to retry using exponential backoff.
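As a rough illustration of how the option is meant to be used, something like the following (the argument names here mirror those discussed later in this thread and are assumptions, not a confirmed final API):

from atlassian import Jira

# Hypothetical usage: turn on exponential backoff on the underlying REST client.
# Argument names follow this PR's discussion and may differ from the merged API.
jira = Jira(
    url="https://example.atlassian.net",
    username="user@example.com",
    password="api-token",
    backoff_and_retry=True,     # off by default
    max_backoff_retries=10,     # stop retrying after this many attempts
    max_backoff_seconds=60,     # cap the sleep between attempts
)
issue = jira.issue("PROJ-1")    # rate-limited calls are retried transparently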
@gonchik I've not yet tested this enough to remove the [WIP] label; I will update after I've done so.
Codecov Report
Merging #904 (c25ea52) into master (4d90a20) will increase coverage by 0.03%. The diff coverage is 47.82%.
@@            Coverage Diff             @@
##           master     #904      +/-   ##
==========================================
+ Coverage   36.19%   36.22%   +0.03%
==========================================
  Files          32       32
  Lines        6335     6357      +22
  Branches      978      983       +5
==========================================
+ Hits         2293     2303      +10
- Misses       3934     3945      +11
- Partials      108      109       +1
Impacted Files | Coverage Δ |
---|---|
atlassian/rest_client.py | 65.68% <47.82%> (-3.03%) :arrow_down: |
@jp-harvey I checked a few options; this is a great step for the rest client module :) Thank you for your initiative, I hope it will be done soon.
Another change I made was to add these new options to BitbucketBase._new_session_args. At least my assumption was that if you set backoff_and_retry=True when creating the Cloud object, you would expect all child objects to inherit the setting.
Actually the functionality looks good to me. I have a Jenkins job running into the rate limiting, and this patch together with my modifications seems to solve the problem.
Great suggestion @flichtenheld, I've been working with Jira and not with Confluence, so thanks for adding that in; glad it helped you out :-)
Another change I made was to add these new options to BitbucketBase._new_session_args. At least my assumption was that if you set backoff_and_retry=True when creating the Cloud object, you would expect all child objects to inherit the setting.

@flichtenheld do you want to add those as a review to this PR as well?
Hmm, not sure what would be the best way to do that given that they are in a completely different file that is not touched by the PR. But you can see them here: https://github.com/flichtenheld/atlassian-python-api/commit/f94cef60fd313bfb6721e921c0f4609b71114046
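In rough terms, the change being described propagates the new options through _new_session_args so they reach every child object. A simplified sketch (the attribute and key names below are assumptions based on this thread, not the exact code in the linked commit):

from atlassian.rest_client import AtlassianRestAPI

class BitbucketBase(AtlassianRestAPI):
    @property
    def _new_session_args(self):
        # Sketch: include the retry settings in the kwargs handed to child
        # objects, so backoff_and_retry=True set on the Cloud object is
        # inherited all the way down the object tree.
        return dict(
            session=self._session,
            cloud=self.cloud,
            api_root=self.api_root,
            api_version=self.api_version,
            backoff_and_retry=self.backoff_and_retry,
            max_backoff_seconds=self.max_backoff_seconds,
            max_backoff_retries=self.max_backoff_retries,
        )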
Hi everyone, what are the requirements for this pull request to be merged? Do you have an ETA?
@mykytoleg please check the review remarks.
@gonchik there has been no additional feedback from reviewers for some time following the changes and responses, and I think this one is probably good to go?
@jp-harvey @flichtenheld can we update the PR and merge it?
@gonchik I've made some of the changes, the others I am not comfortable making without doing testing and I don't have a good way to test at the moment. Functionally it works as it is and has been tested many times in real-life scenarios, it's just the code structure that is the concern now.
EDIT: looks like there's a linting issue as well which should be easy enough to sort out.
It works, but it's weird and not easy to understand. If the max backoff is reached but the response was a success, a warning is logged.
backoff = 1
retries = 0
responseloop = True
while responseloop:
    response = self._session.request(
        method=method,
        url=url,
        headers=headers,
        data=data,
        json=json,
        timeout=self.timeout,
        verify=self.verify_ssl,
        files=files,
        proxies=self.proxies,
    )
    if self.backoff_and_retry:
        for em in self.retry_error_matches:
            if response.status_code == em[0] and response.reason == em[1]:
                if retries > self.max_backoff_retries:
                    log.warning("Hit max backoff retry limit of {0}, no more retries.".format(self.max_backoff_retries))
                    responseloop = False
                else:
                    log.warning('Backing off due to error "{0}: {1}" for {2}s'.format(em[0], em[1], backoff))
                    time.sleep(backoff)
                    backoff = backoff * 2 if backoff * 2 < self.max_backoff_seconds else self.max_backoff_seconds
                    retries += 1
                break  # for loop
        else:
            responseloop = False
The for loop and the if can also be combined using any() together with comparing (status_code, reason) tuples.
if self.backoff_and_retry:
    if any([(response.status_code, response.reason) == em for em in self.retry_error_matches]):
        if retries > self.max_backoff_retries:
            log.warning("Hit max backoff retry limit of {0}, no more retries.".format(self.max_backoff_retries))
            responseloop = False
        else:
            current_backoff = backoff + (random.random() * backoff / 10)
            log.warning('Backing off due to error "{0}: {1}" for {2}s'.format(response.status_code, response.reason, current_backoff))
            time.sleep(current_backoff)
            backoff = min(backoff * 2, self.max_backoff_seconds)
            retries += 1
    else:
        responseloop = False
Ok, now I see that there are two nested loops...
@jp-harvey can we finalize it?
Hi @gonchik I'm at a point where it's no longer 100% clear to me what the issues are or what to change. I also don't have a good way to test at the moment, so making changes and testing is going to be high-friction.
I've also subsequently discovered that the requests library has backoff and retry capability built into sessions, so that's probably a better option all round.
I think we have the following options:
1. Abandon this PR
2. Merge it (mostly?) as-is
3. Someone other than me modifies the code, tests, and once approved it can be merged
4. I modify the existing code, test, and once approved it gets merged
5. I resubmit the PR using the retry capability of sessions
Unfortunately I won't have the bandwidth to do this for a while and don't have an easy way to test it at the moment, so options 4 and 5 won't be able to happen soon.
Sorry about this. I know the code works as it is now despite not being perfect, and I respect that it's considered not suitable for merging. By the time the PR was reviewed, the window in which I was able to keep working on it had passed, and I don't want to make blind changes now that may break it. I may be able to in the future, though I would probably do option 5 before option 4 because it looks like a more elegant approach. I'm also fine with options 1-3 if that's the best thing for this project. Let me know what your preference is.
I will take a stab at resurrecting this mechanism, since I'm currently working on related code anyway. While I have been using this code for years at this point and it works fine in production (the original code, not the current code in this PR, which I'm pretty sure has been patched wrongly), I will try to change the mechanism to use urllib3.util.Retry as suggested, to reduce unnecessary code.
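For reference, a minimal sketch of the urllib3.util.Retry approach mentioned above, applied to a plain requests.Session (the status codes and backoff settings are illustrative assumptions, not the eventual implementation in the library):

import requests
from requests.adapters import HTTPAdapter
from urllib3.util import Retry

# Sketch: let urllib3 handle exponential backoff for HTTP 429 responses
# instead of hand-rolling the retry loop in rest_client.
session = requests.Session()
retry = Retry(
    total=5,                 # give up after 5 retries
    backoff_factor=1,        # exponential sleep between attempts (roughly 1s, 2s, 4s, ...)
    status_forcelist=[429],  # retry only on "Too Many Requests"
    allowed_methods=None,    # retry all HTTP methods, including POST/PUT
)
session.mount("https://", HTTPAdapter(max_retries=retry))

# Any request made through this session is now retried transparently.
response = session.get("https://example.atlassian.net/rest/api/2/myself")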
Since #1339 was merged, I think this PR should be closed.
@flichtenheld thanks for taking this one up and your great work on https://github.com/atlassian-api/atlassian-python-api/pull/1339!