
Regression of 429 response handling in 3.6.0


Bug summary

While the handling of the retry-after header introduced in 3.6.0 via #1713 is an important improvement, it sadly also introduced two regressions in the behaviour when handling 429 responses.

  1. While the retry-after value is now actually evaluated and not just printed, it is sadly used as an upper bound for the backoff time. Thus, we are almost guaranteed to run into a second 429 response while handling the first one.
  2. The code assumes requests that have a retry-after value of 0 must not be retried. However, Jira seems to frequently provide a retry-after value of 0 on 429 responses.

With the previous plain backoff you had a good chance of backing off long enough to evade the rate limiting, as long as you weren't running parallel requests. The new behaviour is a lot more aggressive, and even with a single client we easily end up not retrying at all, as sketched below.
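For illustration only (this is not the library's actual code, and the names MAX_RETRY_DELAY, delay_3_6_0_style and delay_lower_bound are hypothetical), here is a minimal sketch of the difference between the two interpretations of retry-after:

import random

MAX_RETRY_DELAY = 60.0  # hypothetical cap on the backoff, in seconds

def delay_3_6_0_style(suggested_delay: float) -> float:
    # retry-after used as an *upper* bound: the actual sleep can land
    # anywhere between 0 and the server's suggestion, so we frequently
    # retry too early and hit a second 429.
    return min(suggested_delay, MAX_RETRY_DELAY) * random.random()

def delay_lower_bound(suggested_delay: float) -> float:
    # retry-after used as a *lower* bound: we always wait at least as
    # long as the server asked for, plus some jitter on top.
    return suggested_delay + MAX_RETRY_DELAY * random.random()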

Is there an existing issue for this?

  • [X] I have searched the existing issues

Jira Instance type

Jira Server or Data Center (Self-hosted)

Jira instance version

9.12.2

jira-python version

3.6.0

Python Interpreter version

3.12

Which operating systems have you used?

  • [X] Linux
  • [X] macOS
  • [ ] Windows

Reproduction steps

from jira import JIRA

# 1. Given a Jira client instance (construction elided)
jira: JIRA
# 2. Running a random cheap operation often enough
for _ in range(100):
    jira.issue('SOME-1')

Stack trace

E           jira.exceptions.JIRAError: JiraError HTTP 429 url: https://jira.example.org/rest/api/2/search?jql=issuekey+in+%28SOME-1&startAt=0&validateQuery=True&fields=issuetype&fields=resolution&fields=status&maxResults=100
E           	text: Rate limit exceeded.
E
E           	response headers = {'Content-Type': 'application/json;charset=UTF-8', …}
E           	response text = {"message":"Rate limit exceeded."}

Expected behaviour

I'd expect the value of the retry-after header to be interpreted as a lower bound for the backoff, i.e., doing something along the lines of the following:

delay = suggested_delay + self.max_retry_delay * random.random()

Furthermore, we should also retry requests that have a suggested_delay of 0. If a retry-after of 0 would indicate an error for 503 responses, we might need to make that decision depend on the status code, too.
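A minimal, self-contained sketch of that policy, written against plain requests rather than the jira-python internals (the function name and parameters are hypothetical):

import random
import time

import requests

def get_with_backoff(session: requests.Session, url: str,
                     max_retries: int = 5,
                     max_retry_delay: float = 60.0) -> requests.Response:
    # Treat Retry-After as a lower bound and retry even when it is 0,
    # since Jira frequently sends Retry-After: 0 on 429 responses.
    response = session.get(url)
    for _ in range(max_retries):
        if response.status_code != 429:
            return response
        suggested_delay = float(response.headers.get("Retry-After", 0))
        delay = suggested_delay + max_retry_delay * random.random()
        time.sleep(delay)
        response = session.get(url)
    return response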

Additional Context

No response

matthias-bach-by, Jan 26 '24