pr-harmony icon indicating copy to clipboard operation
pr-harmony copied to clipboard

Plugin possible cause of RejectedExecutionException - unhandled events

Open carlmot opened this issue 8 years ago • 7 comments

Bitbucket Data Center v4.5.1 (2 nodes). Issue occurs during peak usage hours.

In logs a few entries for 'The event queue is full. Stacks for the processing threads follow' Then many for 'RejectedExecutionException' after the previous occurs.

Atlassian has documented this issue here: https://confluence.atlassian.com/bitbucketserverkb/rejectedexecutionexception-779171377.html It points to "The event queue is full. Stacks for the processing threads follow:" Looks like PR Harmony may be the cause judging by the log (snippet attached) and the code. *Attempting to confirm issue is due to this plugin (e.g. by disabling the plugin during error events) - unable to do so as of today.

Searched plugins on GitHub for these issues. Pull Request Notifier had one issue logged and had a fix (Processing on Bitbucket Server's event threads): https://github.com/tomasbjerre/pull-request-notifier-for-bitbucket/issues/78 Fix added (v 2.10/1.35) use of import java.util.concurrent.ExecutorService and handleEventAsync() instead of handleEvent().

Explanation and fix/proper way to handle events that can take longer than 'a second or two' here: https://bitbucket.org/izymesdev/workzone/issues/79/buildeventlistener-is-doing-its-processing

PRHarmony_Log.txt

carlmot avatar Sep 29 '16 23:09 carlmot

Interesting, I wasn't aware the Datacenter version worked that way. So the BuildEventListener can certainly be handled asynchronously but I don't see how the merge checks could. Hopefully that's never an issue, but if it is you can always increase the queue depth.

monitorjbl avatar Sep 30 '16 11:09 monitorjbl

@monitorjbl

I'm the principal developer for Bitbucket Server. I just wanted to note that there's nothing special about Data Center with regards to event processing; all versions of Bitbucket Server are the same. They all apply a limit to the number of events that can be queued before they start dropping them, to prevent OutOfMemoryErrors.

From looking over the source of your plugin, it looks like PullRequestListener.findPRByCommitId could be significantly improved to make it more performant. A couple of suggestions:

  • Use a larger page size than 10 when searching for open pull requests. 25 or 50 would produce a significant reduction in requests to the database without an undue impact on memory usage
  • Instead of using start += 10 to advance, use requests.getNextPageRequest()
    • If that returns null, it means there are no more pages
    • Based on that, I'd use a do/while loop instead, where the PageRequest is declared outside the loop instead of the Page<PullRequest>
    • You can also check requests.getIsLastPage() to know when you're at the last page
    • Regardless of exactly how you do it, these checks will prevent you from making a request beyond the last page to detect that there aren't any more results, saving another set of database queries
  • On the PullRequestSearchRequest.Builder, call withProperties(false) before build()ing
    • You're not using PullRequest.getProperties(), so loading them only makes your search slower
    • With this set to false, you could probably increase your page size even further; I might go as high as 100
  • Instead of using PullRequestService.getCommits, use PullRequestService.streamCommits and provide a CommitCallback
    • In your callback, you can check the commit ID against what you're looking for and, if you find it, you can return false; from onCommit. That allows the system to eagerly drop out of the git rev-list process once a match is found, rather than listing up to a million commits (and holding them all in memory to do so!) only to have the code find a match on the second one and discard all the others
    • Using a callback means you might need to use something like a MutableBoolean to allow you to set a flag that can be seen "outside" the callback to indicate that the pull request matched

You're welcome to make some changes and mention me, if you'd like a review.

Best regards, Bryan Turner Atlassian Bitbucket

bturner avatar Jan 03 '18 21:01 bturner

@bturner I've made the changes you suggested. Is there a way I can test this out on a local cluster of Bitbucket Data Center? My biggest problem with supporting that platform is the SDK doesn't seem to support it so I can't test the plugin on it.

monitorjbl avatar Jan 04 '18 02:01 monitorjbl

@monitorjbl,

In general you shouldn't need to do anything specific to support Data Center. The only time you need to tailor for it specifically is when the code needs to not execute in parallel when multiple nodes are present (scheduled jobs, for example, that need to only run on one node at most) or when you have cached data that needs to remain consistent across multiple nodes.

Our Building cluster-safe plugins document has a section (toward the bottom) about how to start a local cluster, but I'm not sure it's really clear enough. (Note that while it mentions AMPS 6.1.0, in order to test with Bitbucket Server 5+ you need to be using AMPS 6.3.0+, which includes all the cluster-relevant changes from AMPS 6.1.) Another alternative might be our full Data Center installation guide.

If it proves too difficult to run up a cluster using any of those resources, let me know and I'll see if I can find/write a better walkthrough.

Best regards, Bryan Turner Atlassian Bitbucket

bturner avatar Jan 05 '18 02:01 bturner

Hi, we are hit by this issue too. I can see that a fix is already committed but I'm wondering if there is a new add-on release available that has this fix? We are on version 2.4.0.

Thanks, Istvan

Pistahh avatar Feb 14 '18 12:02 Pistahh

v2.5.0 is available now

monitorjbl avatar Feb 16 '18 15:02 monitorjbl

We use v2.5.0 in production now since a couple of days now, it works correctly and we no longer see the "event queue if full" error anymore. Thanks. :)

Pistahh avatar Feb 23 '18 16:02 Pistahh