bitbucket-branch-source-plugin icon indicating copy to clipboard operation
bitbucket-branch-source-plugin copied to clipboard

JENKINS-64418: add simple exponential backoff to the sleep for bitbucket api rate limits

Open b-dean opened this issue 4 years ago • 5 comments

When we have hit the rate limits for Bitbucket API it slows everything down to a crawl. Also other API requests (such as getting the Jenkinsfile for multibranch pipelines) are rate limited and slower.

Adding exponential backoff with a minimum of 100 ms and a max of 1 minute, should alleviate the constant retry loop hitting the limit every 5s.

See JENKINS-64418.

Your checklist for this pull request

  • [x] Make sure you are requesting to pull a topic/feature/bugfix branch (right side) and not your master branch!
  • [x] Ensure that the pull request title represents the desired changelog entry
  • [x] Please describe what you did
  • [x] Link to relevant issues in GitHub or in Jenkins JIRA
  • [x] Link to relevant pull requests, esp. upstream and downstream changes
  • [ ] Did you provide a test-case? That demonstrates feature works or fixes the issue.

b-dean avatar Dec 10 '20 16:12 b-dean

Looking at Atlassians documentation for "Adjusting your code for rate limiting" I would argue that, of the three alternative ways proposed there, the 3rd option makes the most sense here.

See: https://confluence.atlassian.com/adminjiraserver/adjusting-your-code-for-rate-limiting-987143384.html

The 3rd option states that one can adjust the requests sent based on the request limit http headers.

  • x-ratelimit-interval-seconds: The time interval in seconds. You get a batch of new tokens every time interval.
  • x-ratelimit-fillrate: The number of tokens you get every time interval.
  • retry-after: The number of seconds you need to wait for new tokens. Make sure that your rate assumes waiting longer than this value.

I argue that this would be the way to go, due to the two reasons mentioned in the documentation about exponential backoff

:no_entry: High impact on a Jira instance because of concurrency. We’re assuming most active users will send requests whenever they’re available. This window will be similar for all users, making spikes in Jira performance. The same applies to threads — most will either be busy at the same time or idle.

:no_entry: Unpredictable. If you need to make a few critical requests, you can’t be sure all of them will be successful.

artheus avatar Dec 22 '20 09:12 artheus

@artheus Agreed, that is a better way. If you take a look at https://github.com/jenkinsci/github-branch-source-plugin/blob/master/src/main/java/org/jenkinsci/plugins/github_branch_source/ApiRateLimitChecker.java it uses similar header info from GitHub to reason over rate limits and choose a strategy. However, that will take considerably more work. Would you be interested in submitting a PR?

Also, take a look at https://github.com/jenkinsci/bitbucket-branch-source-plugin/issues/187#issuecomment-741929228. There might be a way to mitigate the behavior in some cases by using caching.

bitwiseman avatar Dec 22 '20 17:12 bitwiseman

@bitwiseman I will start working on a PR for this. I do not really agree to the use of caching as a part of the plugin though. I would much rather suggest/recommend users to use a caching reverse-proxy between Jenkins and Bitbucket, this way users would be in much better control of the caching layer, and the plugin would work as expected. Creating a caching layer here would make this a much larger job, as we would have to make a large amount of separate decisions, e.g:

  • In which scope to cache (per job, run, bitbucket project or globally)
  • In RAM or on disk?
    • if on disk, where would be the best place to store caches?
  • How to invalidate caches?
  • What is a good timeout for caching data from which changes are expected?
  • If a synchronization job is run, and cached data is used, that would be a complete waste of resources (RAM, CPU...)

I think that the scope for this issue should be just to add a way to properly comply with the Bitbucket rate limits. We could create a separate issue, PR or project for a caching layer.

artheus avatar Dec 31 '20 10:12 artheus

@artheus

I see your point about a caching reverse-proxy.

In the best of worlds, people would implement an additional system that gave them more control. However, most Jenkins systems do not exist in the best of worlds. Most don't have what you are describing. As such, we have to look to see if there are any simple changes that can be made in this plugin to mitigate this issue, such as turning on local caching provided by the client http library.

OkHttp makes adding local, on-disk, size constrained caching is relatively straightforward. It looks like Apache CachingHttpClients provides something similar, either in memory or on disk.

Take a look at the github-branch-source implementation. There is a cache for each credential (since different users are likely to see different results) and each cache is size-limited on-disk.

The model here is that queries are always sent to the server (with an ETag from any previous result), the server then either returns new data or informs the client that their cache is still valid. Cache invalidation is done automatically and there is no timeout: if there's a result that hasn't changed in a day or a week, as long as it is still valid it can be used.

If a synchronization job is run, and cached data is used, that would be a complete waste of resources (RAM, CPU...)

As synchronization job? I'm not sure what you mean, but the most common scenario overall is that some mixture query results are valid and caches are used and some are not and new data is retrieved.

In anycase... I completely support the plugin properly complying with rate limits. I look forward to your PR. That would certainly help this issue as well.

In the meanwhile, I've requested a few changes to this PR and will happily merge this a first step once they are made.

Thanks for contributing!

bitwiseman avatar Jan 05 '21 21:01 bitwiseman

@artheus, that article you linked to from Atlassian is for JIRA, not Bitbucket. I logged all the headers that come back from the request that has the 429 response and there's nothing that says retry-after

Server: nginx
Vary: Authorization, cookie, user-context
Cache-Control: no-cache, no-store
Content-Type: text/plain
X-B3-TraceId: 1a4e3ec6e8f2fe21
X-Dc-Location: ash2
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
Date: Thu, 04 Feb 2021 21:55:33 GMT
X-Served-By: app-3002
X-Static-Version: 6d8cc6ef88d5
X-Credential-Type: apikey
X-Render-Time: 0.0227448940277
X-Accepted-OAuth-Scopes: repository
Connection: Keep-Alive
X-Version: 6d8cc6ef88d5
X-Request-Count: 152
X-Frame-Options: SAMEORIGIN
Content-Length: 46

I thought about X-Request-Count but someone else asked Atlassian and they said it means nothing.

b-dean avatar Feb 04 '21 22:02 b-dean