
Quantify drawbacks of limiting Abra to coding regions

Open ckandoth opened this issue 5 years ago • 0 comments

As of Roslin 2.4.2, we run Abra per pair, but we used to run it per group. On a project where Abra operated on 5 BAMs at a time from the same patient (group), it took ~130hrs when limited to a coverage BED file created by GATK's FindCoveredIntervals with min mapping qual 1, min base qual 5, and min-depth 3: http://pi.mskcc.org/roslin/#/?id=Proj_06058_CFO:c53297ec-1723-11e8-8853-645106efb11c Data: /ifs/res/pi/Proj_06058_CFO.c53297ec-1723-11e8-8853-645106efb11c

But it took only ~45hrs when limited to the list of targeted regions (Agilent Exons in this run): http://pi.mskcc.org/roslin/#/?id=Proj_06058_CFO:bb8a6294-1901-11e8-b36c-645106efb11c Data: /ifs/res/pi/Proj_06058_CFO.bb8a6294-1901-11e8-b36c-645106efb11c - unfortunately we cleaned this up, thinking it was a duplicate.

Compare the resulting mutation lists to quantify the drawbacks of limiting Abra to just the targeted regions. If the results are not significantly different, we can substantially speed up this step, which currently bottlenecks the roslin-variant workflow. In implementation, it would be better to limit Abra to the intersection of the coverage and target BED files, e.g. when working with WGS, or when using a superset assay for an older, smaller assay for which targets are unknown (http://plvpipetrack1.mskcc.org:8090/browse/RSL-829).
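To make the "intersection of coverage and target BED files" idea concrete, here is a minimal Python sketch. It is not how Roslin does it; the function names and the in-memory tuple representation are assumptions for illustration, and in practice a tool like `bedtools intersect` would likely do this on the actual files. Intervals use BED's 0-based, half-open coordinates.

```python
from collections import defaultdict


def merge_intervals(intervals):
    """Sort (start, end) pairs and merge any that overlap or touch."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def intersect_intervals(coverage, targets):
    """Return the regions present in both interval lists.

    Each input is a list of (chrom, start, end) tuples; inputs do not
    need to be sorted or merged beforehand.
    """
    by_chrom = defaultdict(lambda: ([], []))
    for chrom, start, end in coverage:
        by_chrom[chrom][0].append((start, end))
    for chrom, start, end in targets:
        by_chrom[chrom][1].append((start, end))

    result = []
    for chrom in sorted(by_chrom):
        cov = merge_intervals(by_chrom[chrom][0])
        tgt = merge_intervals(by_chrom[chrom][1])
        i = j = 0
        # Sweep both sorted, non-overlapping lists in parallel.
        while i < len(cov) and j < len(tgt):
            start = max(cov[i][0], tgt[j][0])
            end = min(cov[i][1], tgt[j][1])
            if start < end:
                result.append((chrom, start, end))
            # Advance whichever interval ends first.
            if cov[i][1] < tgt[j][1]:
                i += 1
            else:
                j += 1
    return result
```

For example, with hypothetical coverage intervals `[("1", 100, 200), ("1", 150, 300), ("2", 0, 50)]` and targets `[("1", 180, 250), ("2", 60, 70)]`, only the covered part of the chromosome 1 target survives, so Abra would skip both uncovered targets and covered off-target regions.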

ckandoth • May 31 '19 15:05