
Regression of 429 response handling in 3.6.0


Bug summary

While the handling of the retry-after header introduced in 3.6.0 via #1713 is an important improvement, it sadly also introduced two regressions in the behaviour when handling 429 responses.

  1. While the retry-after value is now actually evaluated and not just printed, it is sadly used as an upper bound for the backoff time. Thus, we are almost guaranteed to run into a second 429 response while handling the first one.
  2. The code assumes requests that have a retry-after value of 0 must not be retried. However, Jira seems to frequently provide a retry-after value of 0 on 429 responses.

With the previous plain backoff you had a good chance of backing off long enough to evade the rate limiting, as long as you weren't running parallel requests. The new behaviour is a lot more aggressive, and even with a single client we easily end up not retrying at all, as sketched below.
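For illustration only (this is not the library's actual code, and the names MAX_RETRY_DELAY, delay_3_6_0_style and delay_lower_bound are hypothetical), here is a minimal sketch of the difference between the two interpretations of retry-after:

import random

MAX_RETRY_DELAY = 60.0  # hypothetical cap on the backoff, in seconds

def delay_3_6_0_style(suggested_delay: float) -> float:
    # retry-after used as an *upper* bound: the actual sleep can land
    # anywhere between 0 and the server's suggestion, so we frequently
    # retry too early and hit a second 429.
    return min(suggested_delay, MAX_RETRY_DELAY) * random.random()

def delay_lower_bound(suggested_delay: float) -> float:
    # retry-after used as a *lower* bound: we always wait at least as
    # long as the server asked for, plus some jitter on top.
    return suggested_delay + MAX_RETRY_DELAY * random.random()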

Is there an existing issue for this?

  • [X] I have searched the existing issues

Jira Instance type

Jira Server or Data Center (Self-hosted)

Jira instance version

9.12.2

jira-python version

3.6.0

Python Interpreter version

3.12

Which operating systems have you used?

  • [X] Linux
  • [X] macOS
  • [ ] Windows

Reproduction steps

from jira import JIRA

# 1. Given a Jira client instance (construction elided)
jira: JIRA
# 2. Running a random cheap operation often enough
for _ in range(100):
    jira.issue('SOME-1')

Stack trace

E           jira.exceptions.JIRAError: JiraError HTTP 429 url: https://jira.example.org/rest/api/2/search?jql=issuekey+in+%28SOME-1&startAt=0&validateQuery=True&fields=issuetype&fields=resolution&fields=status&maxResults=100
E           	text: Rate limit exceeded.
E
E           	response headers = {'Content-Type': 'application/json;charset=UTF-8', …}
E           	response text = {"message":"Rate limit exceeded."}

Expected behaviour

I'd expect the value of the retry-after header to be interpreted as a lower bound for the backoff, i.e., doing something along the lines of the following:

delay = suggested_delay + self.max_retry_delay * random.random()

Furthermore, we should also retry requests that have a suggested_delay of 0. If a retry-after of 0 would indicate an error for 503 responses, we might need to make that decision depend on the status code, too.
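A minimal, self-contained sketch of that policy, written against plain requests rather than the jira-python internals (the function name and parameters are hypothetical):

import random
import time

import requests

def get_with_backoff(session: requests.Session, url: str,
                     max_retries: int = 5,
                     max_retry_delay: float = 60.0) -> requests.Response:
    # Treat Retry-After as a lower bound and retry even when it is 0,
    # since Jira frequently sends Retry-After: 0 on 429 responses.
    response = session.get(url)
    for _ in range(max_retries):
        if response.status_code != 429:
            return response
        suggested_delay = float(response.headers.get("Retry-After", 0))
        delay = suggested_delay + max_retry_delay * random.random()
        time.sleep(delay)
        response = session.get(url)
    return response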

Additional Context

No response

matthias-bach-by, Jan 26 '24