Implement parallelization for BQSR

Open ihodes opened this issue 9 years ago • 6 comments

Cf. http://gatkforums.broadinstitute.org/wdl/discussion/1988/a-primer-on-parallelism-with-the-gatk

and http://gatkforums.broadinstitute.org/gatk/discussion/1919/parallelizing-base-quality-score-recalibration for specifics wrt BQSR

ihodes avatar Mar 21 '16 21:03 ihodes

This can be scatter-gathered according to the above links. Will be a huge win for us.

ihodes avatar Mar 21 '16 21:03 ihodes

Can also be done with Spark

hammer avatar Mar 22 '16 01:03 hammer

@smondet Assigning you since you mentioned being mostly there on this at some point. Unassign if you can't get to this, no worries.

ihodes avatar Jun 01 '16 21:06 ihodes

The link above implies something different from what is implemented in #286. While they gather the individual statistics in parallel, there is a reduce step that creates a single covariate table.

Second, the gathering will be more complicated than just concatenating text files. To combine the GATKReports, you need to fundamentally understand the GATKReport format. Reports have to be combined statistically, by adding the observations of each covariate and recalculating the Estimated Q value of the combined report.
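To make the reduce step concrete, here is a rough Python sketch of what "adding the observations of each covariate" could look like. Everything in it is an assumption for illustration: the dictionary layout, the names `merge_recal_tables` and `phred`, and the smoothed Phred formula are hypothetical and are not the GATKReport format or GATK's actual quality calculation.

```python
import math
from collections import defaultdict


def merge_recal_tables(shards):
    """Sum observations and errors per covariate key across shards.

    Each shard is a dict mapping a covariate key (here a hypothetical
    (read_group, reported_quality) tuple) to (observations, errors).
    """
    totals = defaultdict(lambda: [0, 0.0])
    for table in shards:
        for key, (observations, errors) in table.items():
            totals[key][0] += observations
            totals[key][1] += errors
    return {key: (obs, err) for key, (obs, err) in totals.items()}


def phred(observations, errors, smoothing=1.0):
    """Phred-scaled quality from pooled counts.

    Assumed formula for illustration only; GATK's own estimate may differ.
    """
    rate = (errors + smoothing) / (observations + 2.0 * smoothing)
    return -10.0 * math.log10(rate)


# Two hypothetical shards, e.g. one per genomic interval.
shard_a = {("rg1", 30): (1000, 2.0), ("rg1", 20): (500, 10.0)}
shard_b = {("rg1", 30): (3000, 1.0)}

merged = merge_recal_tables([shard_a, shard_b])
# Quality is recomputed from the pooled counts, not averaged across shards.
qualities = {key: phred(obs, err) for key, (obs, err) in merged.items()}
```

The point of the sketch is that the quality value is recomputed from the summed counts. Averaging the per-shard values would weight a small shard the same as a large one, which is why plain concatenation of the text reports is not enough.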

arahuja avatar Jun 08 '16 17:06 arahuja

Nice catch; seems pretty important. @smondet is that something we could support in #286?

ihodes avatar Jun 08 '16 17:06 ihodes

@ihodes I don't think I understand the data well enough to do that "statistical combination". If we have a tool that does it, then yes, it's doable (but harder than the current version).

smondet avatar Jun 08 '16 18:06 smondet