htsjdk icon indicating copy to clipboard operation
htsjdk copied to clipboard

Async BAM performance improvements

Open d-cameron opened this issue 5 years ago • 18 comments

Description

The existing aysnc read code performs read-ahead and decompression on a seperate thread. Combining with AsyncBufferedIterator and we can get ~3x performance increase when doing light processing (eg. picard CollectInsertSizeMetrics). This PR increases the level of parallelism to close to linear with the number of cores (tested to ~16x) through the following changes:

  • Separation of async BlockCompressedInputStream read-ahead and decompression into separate tasks to enable parallel decompression of multiple gzip blocks
  • Parallel async decoding of BAM records.
    • whilst this incurs the overhead of an additional buffer copy, it enables SAMRecord decoding to be performed in parallel
    • By default, SAMRecords are decoded in 8KiB batches to prevent excessive thread synchronisation overhead when processing small SAM records.

The default 128k buffer size results in read-ahead of only 2 gzip block thus only 2 concurrent decompression tasks. Specifying a larger buffer size (e.g. -Dsamjdk.buffer_size=4194304) enables scaling to additional cores.

Checklist

  • [ X ] Code compiles correctly
  • [ X ] New tests covering changes and new functionality
  • [ X ] All tests passing (tested on windows so some unrelated test cases fail)
  • [ NA ] Extended the README / documentation, if necessary
  • [ ] Is not backward compatible (breaks binary or source compatibility)

d-cameron avatar Jan 02 '19 12:01 d-cameron

Codecov Report

Merging #1249 into master will increase coverage by 0.260%. The diff coverage is 82.216%.

@@               Coverage Diff               @@
##              master     #1249       +/-   ##
===============================================
+ Coverage     69.203%   69.462%   +0.260%     
+ Complexity      8701      8177      -524     
===============================================
  Files            588       549       -39     
  Lines          34581     32930     -1651     
  Branches        5779      5567      -212     
===============================================
- Hits           23931     22874     -1057     
+ Misses          8368      7817      -551     
+ Partials        2282      2239       -43     
Impacted Files Coverage Δ Complexity Δ
src/main/java/htsjdk/samtools/BAMRecordCodec.java 77.419% <20.000%> (-4.032%) 23.000 <0.000> (-3.000)
...java/htsjdk/samtools/util/AsyncReadTaskRunner.java 79.144% <79.144%> (ø) 23.000 <23.000> (?)
src/main/java/htsjdk/samtools/SAMUtils.java 59.500% <80.000%> (-3.302%) 126.000 <1.000> (-5.000)
...sjdk/samtools/util/BlockCompressedInputStream.java 84.962% <83.582%> (-0.597%) 87.000 <17.000> (-3.000)
...samtools/util/AsyncBlockCompressedInputStream.java 78.571% <85.000%> (+6.571%) 8.000 <8.000> (-4.000) :arrow_up:
src/main/java/htsjdk/samtools/BAMFileReader.java 67.070% <90.476%> (-1.396%) 43.000 <2.000> (-9.000)
...coding/core/experimental/ExperimentalEncoding.java 0.000% <0.000%> (-100.000%) 0.000% <0.000%> (-1.000%)
src/main/java/htsjdk/variant/vcf/VCFConstants.java 0.000% <0.000%> (-87.500%) 0.000% <0.000%> (-1.000%)
src/main/java/htsjdk/tribble/FeatureCodec.java 50.000% <0.000%> (-50.000%) 1.000% <0.000%> (-1.000%)
...ain/java/htsjdk/samtools/cram/structure/Slice.java 44.545% <0.000%> (-36.946%) 20.000% <0.000%> (-75.000%)
... and 351 more

codecov-io avatar Jan 07 '19 00:01 codecov-io

@d-cameron Thank you for this PR. It sounds like a big speedup for bam reading.

It's fairly complicated multithreaded code in a mission critical section of the codebase so it might take some time to review.

lbergelson avatar Jan 08 '19 21:01 lbergelson

Hey @d-cameron.

This seems like a really useful feature for htsjdk. A few of us have read through and were wondering why you didn't use CompletableFutures as the underlying synchronization pattern.

Using this pattern would simplify the review process, further concerns about possible locks and eliminate the need for explicit sleeps in the code. In addition, Expections will be propagated across threads helping in error messaging and debugging.

Would you be amenable to modifying your code to use CompletableFutures? We understand the uncertainty around the process of getting a PR merged, and are committed to working with you towards that goal.

yfarjoun avatar Feb 11 '19 19:02 yfarjoun

Switching to CompleteableFutures will remove some of the sleeps, but not all. Note that even with CompleteableFutures, the code still needs a review for deadlocks and race conditions, although the complexity would mostly be limited to just the async read task.

The issue with the async read task is that the next read task can be scheduled either by the foreground thread, or the background thread and at most one of these tasks can be running at any point. We can't just have a single long-running task as that can deadlock when the thread pool is limited and results in unfair read-ahead scheduling when there are multiple files being read in parallel. Since the task can be scheduled

I've thought of an alternate design in which the read-ahead task are all scheduled from the calling thread so I'll rewrite using that approach with CompleteableFutures and see how much cleaner it ends up being.

d-cameron avatar Feb 12 '19 02:02 d-cameron

much appreciated.

yfarjoun avatar Feb 12 '19 14:02 yfarjoun

@yfarjoun @lbergelson refactored so that all background processing tasks are CompletableFutures scheduled from the foreground thread. This version should be much simpler to code review. I've split the refactoring to two separate commits - one with the CompletableFuture refactor, and a second adding the RecordOrException complexity required for exceptions to be raised at the same time as they are raised when reading synchronously

d-cameron avatar Mar 04 '19 11:03 d-cameron

@d-cameron Thank you! That's great. We will review this this week.

lbergelson avatar Mar 04 '19 16:03 lbergelson

Hi @d-cameron It looks like this is a couple of months behind master - can you rebase on current master before we start reviewing ? Thanks.

cmnbroad avatar Mar 04 '19 19:03 cmnbroad

@d-cameron I'm sorry I haven't gotten through reviewing this yet. I expected to get an htsjdk release out this week and then review it after that, but I've run into a number of blocking issues that ate my time and I haven't gotten to give this a proper look yet. It is one of my priorities to get to next week though. It will not be abandoned forever this time...

lbergelson avatar Mar 08 '19 22:03 lbergelson

Back to you @lbergelson

d-cameron avatar Mar 31 '19 14:03 d-cameron

Extended the README / documentation, if necessary

Is there a README / documentation with all the settings exposed through Defaults.java? This PR now exposes the threadpool size through a new one.

d-cameron avatar Mar 31 '19 14:03 d-cameron

@lbergelson @yfarjoun Any update on this PR?

d-cameron avatar Apr 26 '19 04:04 d-cameron

@lbergelson please review this when you have time.

yfarjoun avatar Jun 17 '19 18:06 yfarjoun

Oh no. The PR that I swore to review quickly and then never did. @d-cameron It looks like there are build failures due to a missing logger and a typo. Inflater/inflator strikes again.

I uploaded a fix here https://github.com/samtools/htsjdk/tree/lb_async_bam_performance_improvements

lbergelson avatar Feb 10 '20 22:02 lbergelson

Fixes in; updated to latest master

d-cameron avatar Mar 07 '20 06:03 d-cameron

I have taken the liberty of rebasing this onto current master, to resolve the (trivial) merge conflict and rerun the CI tests.

jmarshall avatar Aug 18 '21 09:08 jmarshall

Codecov Report

Merging #1249 (925cd2c) into master (6a60de7) will increase coverage by 0.028%. The diff coverage is 85.246%.

@@               Coverage Diff               @@
##              master     #1249       +/-   ##
===============================================
+ Coverage     69.814%   69.842%   +0.028%     
- Complexity      9627      9668       +41     
===============================================
  Files            702       704        +2     
  Lines          37607     37765      +158     
  Branches        6107      6119       +12     
===============================================
+ Hits           26255     26376      +121     
- Misses          8903      8916       +13     
- Partials        2449      2473       +24     
Impacted Files Coverage Δ
...ava/htsjdk/samtools/apps/TimeRandomAccessFile.java 0.000% <0.000%> (ø)
src/main/java/htsjdk/samtools/SAMUtils.java 62.651% <80.000%> (-0.151%) :arrow_down:
...java/htsjdk/samtools/util/AsyncReadTaskRunner.java 81.944% <81.944%> (ø)
...samtools/util/AsyncBlockCompressedInputStream.java 79.661% <83.333%> (+0.994%) :arrow_up:
...sjdk/samtools/util/BlockCompressedInputStream.java 84.314% <84.286%> (-1.246%) :arrow_down:
src/main/java/htsjdk/samtools/BAMFileReader.java 70.992% <89.744%> (+1.491%) :arrow_up:
src/main/java/htsjdk/samtools/BAMRecordCodec.java 84.496% <100.000%> (+1.432%) :arrow_up:
src/main/java/htsjdk/samtools/Defaults.java 76.271% <100.000%> (+0.409%) :arrow_up:
src/main/java/htsjdk/samtools/SamStreams.java 59.649% <100.000%> (-4.966%) :arrow_down:
src/main/java/htsjdk/samtools/util/IOUtil.java 60.284% <100.000%> (+0.668%) :arrow_up:
... and 26 more

codecov-commenter avatar Aug 18 '21 12:08 codecov-commenter

I have been unable to reproduce the testDisableAsyncProcessingLetsExistingTasksComplete failure locally with OpenJDK 8 (java-1.8.0-openjdk on CentOS 8). Looking at the difference between builds 6577 and 6578 it appears that the issue is nondeterministic… (the worst kind!)

jmarshall avatar Aug 18 '21 12:08 jmarshall