gatk
Make DepthOfCoverage multi-threaded
Feature request
Tool(s) or class(es) involved
Tool/class name(s), special parameters? DepthOfCoverage
Description
Are there plans to make DepthOfCoverage multi-threaded? If not, would it be possible to request such an improvement?
This is a feature we would have loved, but alas it isn't available. We also relied on -ct to get the percentage of bases at a given coverage (e.g. 20x), which has now been dropped in GATK 4+ versions.
We came across another tool called mosdepth. Compared to DepthOfCoverage:
- It uses multithreading (albeit only for BAM decompression, so there are no performance gains beyond 4 threads).
- It gives coverage for an exome within 5 minutes, and is even faster when the per-base coverage output is not needed.
- Per-base coverage output can be skipped using -x. The output matches that of DepthOfCoverage closely. Keep in mind that DepthOfCoverage also supports skipping this output via the parameter --omitDepthOutputAtEachBase, which saves massively on I/O and cuts processing time from 50 minutes per sample to 40 minutes per sample.
If you do decide to give it a try, we have some tips and suggestions:
- The tool generates multiple output files. If you are looking for total coverage, check the last line of the file output.mosdepth.summary.txt
- If you are looking for the percentage of bases covered at a target read depth, this information is in the file output.mosdepth.region.dist.txt. If your target read depth is 20x, you can search this file with grep -P "total\t20\t" and the third column should be the percentage (with only one decimal).
- With the -d4 switch, they claim the granularity of the above percentage increases to 4 decimal places.
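For anyone who wants to pull that number out in a script rather than with grep, here is a minimal Python sketch of the same lookup. It assumes the layout described above: tab-separated rows where the first column is a name such as total, the second is the read depth, and the third is the reported value. The function name value_at_depth and the sample rows are made up for illustration and are not real mosdepth output.

```python
import csv
import io


def value_at_depth(dist_lines, depth, name="total"):
    """Return the third-column value for the row matching (name, depth).

    dist_lines is any iterable of tab-separated text lines, for example an
    open file handle. Returns None when no matching row is found.
    """
    reader = csv.reader(dist_lines, delimiter="\t")
    for row in reader:
        if len(row) < 3:
            continue  # skip blank or malformed lines
        if row[0] == name and row[1] == str(depth):
            return float(row[2])
    return None


# Made-up rows in the assumed layout, NOT real mosdepth output.
sample = "chr1\t20\t0.50\ntotal\t21\t0.90\ntotal\t20\t0.93\ntotal\t0\t1.00\n"
value = value_at_depth(io.StringIO(sample), 20)
print(value)

# With a real file, the call would look like this:
# with open("output.mosdepth.region.dist.txt") as fh:
#     print(value_at_depth(fh, 20))
```

The value is returned exactly as written in the file, so whether it reads as a fraction or a percentage depends on what the tool emitted.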
@kvn95ss Thank you for your reply!
I was puzzled by your statement that -ct is not available in GATK4+. Isn't the parameter --summary-coverage-threshold in GATK4 supposed to be its equivalent?
Thanks!
@Z-Zen Whoa, it is indeed equivalent. I had come across a post which mentioned that -ct is not supported. While the reply did ask the OP to read the documentation, there was no indication that the parameter had been replaced.
I tried it with the latest gatk (4.2.6.1) with a single -ct 20 and --omit-depth-output-at-each-base to speed things up. It took 25 minutes, which is indeed considerably faster than the older gatk3 version.