htsjdk
htsjdk copied to clipboard
Async BAM performance improvements
Description
The existing aysnc read code performs read-ahead and decompression on a seperate thread. Combining with AsyncBufferedIterator and we can get ~3x performance increase when doing light processing (eg. picard CollectInsertSizeMetrics). This PR increases the level of parallelism to close to linear with the number of cores (tested to ~16x) through the following changes:
- Separation of async BlockCompressedInputStream read-ahead and decompression into separate tasks to enable parallel decompression of multiple gzip blocks
- Parallel async decoding of BAM records.
- whilst this incurs the overhead of an additional buffer copy, it enables SAMRecord decoding to be performed in parallel
- By default, SAMRecords are decoded in 8KiB batches to prevent excessive thread synchronisation overhead when processing small SAM records.
The default 128k buffer size results in read-ahead of only 2 gzip block thus only 2 concurrent decompression tasks. Specifying a larger buffer size (e.g. -Dsamjdk.buffer_size=4194304
) enables scaling to additional cores.
Checklist
- [ X ] Code compiles correctly
- [ X ] New tests covering changes and new functionality
- [ X ] All tests passing (tested on windows so some unrelated test cases fail)
- [ NA ] Extended the README / documentation, if necessary
- [ ] Is not backward compatible (breaks binary or source compatibility)
Codecov Report
Merging #1249 into master will increase coverage by
0.260%
. The diff coverage is82.216%
.
@@ Coverage Diff @@
## master #1249 +/- ##
===============================================
+ Coverage 69.203% 69.462% +0.260%
+ Complexity 8701 8177 -524
===============================================
Files 588 549 -39
Lines 34581 32930 -1651
Branches 5779 5567 -212
===============================================
- Hits 23931 22874 -1057
+ Misses 8368 7817 -551
+ Partials 2282 2239 -43
Impacted Files | Coverage Δ | Complexity Δ | |
---|---|---|---|
src/main/java/htsjdk/samtools/BAMRecordCodec.java | 77.419% <20.000%> (-4.032%) |
23.000 <0.000> (-3.000) |
|
...java/htsjdk/samtools/util/AsyncReadTaskRunner.java | 79.144% <79.144%> (ø) |
23.000 <23.000> (?) |
|
src/main/java/htsjdk/samtools/SAMUtils.java | 59.500% <80.000%> (-3.302%) |
126.000 <1.000> (-5.000) |
|
...sjdk/samtools/util/BlockCompressedInputStream.java | 84.962% <83.582%> (-0.597%) |
87.000 <17.000> (-3.000) |
|
...samtools/util/AsyncBlockCompressedInputStream.java | 78.571% <85.000%> (+6.571%) |
8.000 <8.000> (-4.000) |
:arrow_up: |
src/main/java/htsjdk/samtools/BAMFileReader.java | 67.070% <90.476%> (-1.396%) |
43.000 <2.000> (-9.000) |
|
...coding/core/experimental/ExperimentalEncoding.java | 0.000% <0.000%> (-100.000%) |
0.000% <0.000%> (-1.000%) |
|
src/main/java/htsjdk/variant/vcf/VCFConstants.java | 0.000% <0.000%> (-87.500%) |
0.000% <0.000%> (-1.000%) |
|
src/main/java/htsjdk/tribble/FeatureCodec.java | 50.000% <0.000%> (-50.000%) |
1.000% <0.000%> (-1.000%) |
|
...ain/java/htsjdk/samtools/cram/structure/Slice.java | 44.545% <0.000%> (-36.946%) |
20.000% <0.000%> (-75.000%) |
|
... and 351 more |
@d-cameron Thank you for this PR. It sounds like a big speedup for bam reading.
It's fairly complicated multithreaded code in a mission critical section of the codebase so it might take some time to review.
Hey @d-cameron.
This seems like a really useful feature for htsjdk. A few of us have read through and were wondering why you didn't use CompletableFutures as the underlying synchronization pattern.
Using this pattern would simplify the review process, further concerns about possible locks and eliminate the need for explicit sleeps in the code. In addition, Expections will be propagated across threads helping in error messaging and debugging.
Would you be amenable to modifying your code to use CompletableFutures? We understand the uncertainty around the process of getting a PR merged, and are committed to working with you towards that goal.
Switching to CompleteableFutures will remove some of the sleeps, but not all. Note that even with CompleteableFutures, the code still needs a review for deadlocks and race conditions, although the complexity would mostly be limited to just the async read task.
The issue with the async read task is that the next read task can be scheduled either by the foreground thread, or the background thread and at most one of these tasks can be running at any point. We can't just have a single long-running task as that can deadlock when the thread pool is limited and results in unfair read-ahead scheduling when there are multiple files being read in parallel. Since the task can be scheduled
I've thought of an alternate design in which the read-ahead task are all scheduled from the calling thread so I'll rewrite using that approach with CompleteableFutures and see how much cleaner it ends up being.
much appreciated.
@yfarjoun @lbergelson refactored so that all background processing tasks are CompletableFutures scheduled from the foreground thread. This version should be much simpler to code review. I've split the refactoring to two separate commits - one with the CompletableFuture refactor, and a second adding the RecordOrException
complexity required for exceptions to be raised at the same time as they are raised when reading synchronously
@d-cameron Thank you! That's great. We will review this this week.
Hi @d-cameron It looks like this is a couple of months behind master - can you rebase on current master before we start reviewing ? Thanks.
@d-cameron I'm sorry I haven't gotten through reviewing this yet. I expected to get an htsjdk release out this week and then review it after that, but I've run into a number of blocking issues that ate my time and I haven't gotten to give this a proper look yet. It is one of my priorities to get to next week though. It will not be abandoned forever this time...
Back to you @lbergelson
Extended the README / documentation, if necessary
Is there a README / documentation with all the settings exposed through Defaults.java
? This PR now exposes the threadpool size through a new one.
@lbergelson @yfarjoun Any update on this PR?
@lbergelson please review this when you have time.
Oh no. The PR that I swore to review quickly and then never did. @d-cameron It looks like there are build failures due to a missing logger and a typo. Inflater/inflator strikes again.
I uploaded a fix here https://github.com/samtools/htsjdk/tree/lb_async_bam_performance_improvements
Fixes in; updated to latest master
I have taken the liberty of rebasing this onto current master, to resolve the (trivial) merge conflict and rerun the CI tests.
Codecov Report
Merging #1249 (925cd2c) into master (6a60de7) will increase coverage by
0.028%
. The diff coverage is85.246%
.
@@ Coverage Diff @@
## master #1249 +/- ##
===============================================
+ Coverage 69.814% 69.842% +0.028%
- Complexity 9627 9668 +41
===============================================
Files 702 704 +2
Lines 37607 37765 +158
Branches 6107 6119 +12
===============================================
+ Hits 26255 26376 +121
- Misses 8903 8916 +13
- Partials 2449 2473 +24
Impacted Files | Coverage Δ | |
---|---|---|
...ava/htsjdk/samtools/apps/TimeRandomAccessFile.java | 0.000% <0.000%> (ø) |
|
src/main/java/htsjdk/samtools/SAMUtils.java | 62.651% <80.000%> (-0.151%) |
:arrow_down: |
...java/htsjdk/samtools/util/AsyncReadTaskRunner.java | 81.944% <81.944%> (ø) |
|
...samtools/util/AsyncBlockCompressedInputStream.java | 79.661% <83.333%> (+0.994%) |
:arrow_up: |
...sjdk/samtools/util/BlockCompressedInputStream.java | 84.314% <84.286%> (-1.246%) |
:arrow_down: |
src/main/java/htsjdk/samtools/BAMFileReader.java | 70.992% <89.744%> (+1.491%) |
:arrow_up: |
src/main/java/htsjdk/samtools/BAMRecordCodec.java | 84.496% <100.000%> (+1.432%) |
:arrow_up: |
src/main/java/htsjdk/samtools/Defaults.java | 76.271% <100.000%> (+0.409%) |
:arrow_up: |
src/main/java/htsjdk/samtools/SamStreams.java | 59.649% <100.000%> (-4.966%) |
:arrow_down: |
src/main/java/htsjdk/samtools/util/IOUtil.java | 60.284% <100.000%> (+0.668%) |
:arrow_up: |
... and 26 more |