Customize Preprocessing based on each tool

Open berguner opened this issue 1 year ago • 4 comments

Description of feature

Hi, it seems like the CNVkit workflow uses cram_recalibrated files as input here: https://github.com/nf-core/sarek/blob/bcd7bf9cb98cddec27bb54fb47ee122c09388c02/subworkflows/nf-core/variantcalling/cnvkit/main.nf#L8-L12. As far as I remember, recalibrated files of WES or panel samples don't contain off-target reads because base recalibration is applied over the intervals only. It would be better to use CRAM files containing all the reads (cram_markduplicates?) for the CNVkit analysis so that off-target reads can be utilized. This is especially important for custom panels, which have fewer target regions than WES.

berguner avatar Nov 11 '22 09:11 berguner

Hi! You can always achieve this by setting the parameter --skip_tools baserecalibrator. I will add some docs on this.

FriederikeHanssen avatar Nov 11 '22 09:11 FriederikeHanssen

But wouldn't that make the pipeline skip recalibration for SNV/indel calling also? I usually run the pipeline with --tools "mutect2,vep,cnvkit".

berguner avatar Nov 11 '22 09:11 berguner

Yes, currently it is only possible to do one "type" of pre-processing.

I would turn this into a bigger feature request:

For scenarios such as the one above, it would be nice to allow different types of preprocessing. This would require tool-based preprocessing steps that ideally would still be customizable.

Such as:

md + bqsr + haplotypecaller
no md + bqsr + deepvariant
md + no bqsr + cnvkit

(examples are completely made up)

This would likely entail quite a massive change in how we manage data flow at the moment.
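To make the idea of tool-based preprocessing a bit more concrete, here is a rough Python sketch. It is not how sarek manages data flow; the `PREPROCESSING` table and the `group_tools_by_preprocessing` helper are hypothetical names, and the tool/step combinations simply mirror the made-up examples above ("md" for duplicate marking, "bqsr" for base recalibration). The point is only that tools needing the same preprocessing could share one branch.

```python
from collections import defaultdict

# Hypothetical per-tool preprocessing table (assumption, not sarek's config).
# It mirrors the made-up examples in this thread:
# "md" = mark duplicates, "bqsr" = base quality score recalibration.
PREPROCESSING = {
    "haplotypecaller": ("md", "bqsr"),
    "deepvariant": ("bqsr",),
    "cnvkit": ("md",),
}


def group_tools_by_preprocessing(tools, table=PREPROCESSING):
    """Group the requested tools by the preprocessing steps they need,
    so that each distinct preprocessing branch has to be run only once."""
    groups = defaultdict(list)
    for tool in tools:
        if tool not in table:
            raise KeyError(f"no preprocessing defined for tool: {tool}")
        groups[table[tool]].append(tool)
    return dict(groups)


if __name__ == "__main__":
    requested = ["haplotypecaller", "deepvariant", "cnvkit"]
    for steps, tools in group_tools_by_preprocessing(requested).items():
        print(" + ".join(steps), "->", ", ".join(tools))
```

In a real pipeline the grouping would decide which CRAM channel each variant caller receives; that routing is the part that would require the larger change in data flow.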

FriederikeHanssen avatar Nov 11 '22 14:11 FriederikeHanssen

Other current options as a workaround:

Use the --step parameter to run the one tool that needs different preprocessing on the respective CSV file available in results/csv. This avoids duplicate mapping, for example, and saves time and resources.
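As a sketch of this workaround, the small Python helper below only assembles the two command lines; it does not run anything. Several details are assumptions rather than something confirmed in this thread: the helper name `sarek_command`, the `samplesheet.csv` and `results_cnvkit` paths, the step value `variant_calling`, and the CSV file name `markduplicates_no_table.csv` (check results/csv for the file your run actually produced). A `-profile` option is left out for brevity.

```python
import shlex


def sarek_command(input_csv, outdir, tools, step=None):
    """Build (but do not execute) a sarek command line as a single string."""
    cmd = [
        "nextflow", "run", "nf-core/sarek",
        "--input", input_csv,
        "--outdir", outdir,
        "--tools", ",".join(tools),
    ]
    if step is not None:
        # Restart from an intermediate step instead of mapping again.
        cmd += ["--step", step]
    return shlex.join(cmd)


# First run: full preprocessing (including recalibration) for SNV/indel calling.
full_run = sarek_command("samplesheet.csv", "results", ["mutect2", "vep"])

# Second run: only CNVkit, started from the duplicate-marked CRAMs.
# The CSV file name and the step value are assumptions; adjust to your run.
cnvkit_run = sarek_command(
    "results/csv/markduplicates_no_table.csv",
    "results_cnvkit",
    ["cnvkit"],
    step="variant_calling",
)

if __name__ == "__main__":
    print(full_run)
    print(cnvkit_run)
```

Writing the second run to a separate output directory keeps the CNVkit results from mixing with those of the first run.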

FriederikeHanssen avatar Nov 11 '22 14:11 FriederikeHanssen