sarek

The pipeline should fail early when --no_intervals is used with joint germline calling

Open FriederikeHanssen opened this issue 2 years ago • 4 comments

Description of the bug

As the title states: the pipeline used to fail early in this case, but apparently it no longer does. See here: https://nfcore.slack.com/archives/CGFUX04HZ/p1697021363120009

Command used and terminal output

No response

Relevant files

No response

System information

No response

FriederikeHanssen • Oct 11 '23 11:10

@FriederikeHanssen : You mentioned that "GenomicsDB doesn’t work without intervals."

In connection with that, I'd like to point out that we now have two subworkflows for joint-germline variant-calling: one with GATK/haplotypecaller and one with Sentieon/haplotyper. The Sentieon/haplotyper subworkflow for joint-germline variant-calling doesn't use GenomicsDB, and as far as I can tell it works fine with the option --no_intervals, although we do not have a CI-test for that at the moment.

We do have this test, which gives:

[0e/b77807] process > NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_GERMLINE_ALL:BAM_VARIANT_CALLING_SENTIE... [100%] 4 of 4 ✔

with the sentieon-cmd being this

sentieon driver  -r genome.fasta -t 2 -i testT.converted.cram --interval chr22_2-15000.bed  --algo Haplotyper -d dbsnp_146.hg38.vcf.gz  --emit_mode gvcf testT.haplotyper.chr22_2-15000.g.vcf.gz

If I do something similar but with --no_intervals added, that is,

nextflow run main.nf -profile test_cache,software_license,docker --sentieon_extension --input ./tests/csv/3.0/mapped_joint_bam.csv --tools sentieon_haplotyper --step variant_calling --joint_germline --outdir results --sentieon_haplotyper_emit_mode gvcf --no_intervals --nucleotides_per_second 20 --wes true

then I get:

[24/648d40] process > NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_GERMLINE_ALL:BAM_VARIANT_CALLING_SENTIEON_HAPLOTYPER:SENTIEON_HAPLOTYPER (testN)             [100%] 2 of 2 ✔

and the sentieon-cmd looks like this:

sentieon driver  -r genome.fasta -t 2 -i testN.converted.cram   --algo Haplotyper -d dbsnp_146.hg38.vcf.gz  --emit_mode gvcf testN.haplotyper.g.vcf.gz

I guess we want to keep the possibility of running the joint-germline sentieon/haplotyper with --no_intervals, right?

Should I perhaps add a test for that?

asp8200 • Oct 11 '23 12:10

yes of course, just for the one that uses the genomicsdb route we want to have an early fail
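
The rule described here (fail early only on the GenomicsDB route, and leave the Sentieon route alone) can be sketched as a small parameter check. This is a language-neutral illustration in Python, not sarek's actual Nextflow code. The function name is hypothetical, and the tool name `haplotypecaller` is an assumption based on the GATK/haplotypecaller subworkflow mentioned above. The option names mirror the `--tools`, `--joint_germline` and `--no_intervals` flags from the command in this thread.

```python
def check_joint_germline_intervals(tools: str, joint_germline: bool, no_intervals: bool) -> None:
    """Raise ValueError if the GenomicsDB route is requested without intervals.

    Hypothetical helper: `tools` is the comma-separated value of --tools.
    """
    selected = {t.strip().lower() for t in tools.split(",") if t.strip()}

    # Assumption: only GATK HaplotypeCaller joint calling goes through GenomicsDB;
    # the Sentieon haplotyper route does not, so it is never rejected here.
    uses_genomicsdb = joint_germline and "haplotypecaller" in selected

    if uses_genomicsdb and no_intervals:
        raise ValueError(
            "--no_intervals cannot be combined with --joint_germline when "
            "haplotypecaller is selected: GenomicsDB does not work without intervals."
        )


# The Sentieon command from this thread passes the check.
check_joint_germline_intervals("sentieon_haplotyper", joint_germline=True, no_intervals=True)
```

Running such a check during parameter validation, before any process is launched, is what makes the failure "early".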

FriederikeHanssen • Oct 11 '23 12:10

"yes of course, just for the one that uses the genomicsdb route we want to have an early fail"

Should I add a pytest for that as mentioned above?

asp8200 • Oct 11 '23 12:10

repeat #1434

cmatKhan • May 22 '24 17:05

I commented this in the other issue, but I'm going to put it here because this one has some discussion: rather than failing, the GATK workflow could use CombineGVCFs

https://gatk.broadinstitute.org/hc/en-us/articles/360037053272-CombineGVCFs
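
To make the CombineGVCFs suggestion concrete, here is a minimal sketch of the command such a fallback would run when no intervals are given. It only builds the argument list; the helper name and file names are hypothetical, and how this would be wired into the sarek subworkflow is not specified in this thread. The `-R`, `--variant` and `-O` options are the ones documented for GATK4 CombineGVCFs.

```python
from typing import List


def combine_gvcfs_command(reference: str, gvcfs: List[str], output: str) -> List[str]:
    """Build a `gatk CombineGVCFs` argument list (hypothetical helper)."""
    if not gvcfs:
        raise ValueError("at least one gVCF is required")

    cmd = ["gatk", "CombineGVCFs", "-R", reference]
    for gvcf in gvcfs:
        # One --variant flag per per-sample gVCF.
        cmd += ["--variant", gvcf]
    cmd += ["-O", output]
    return cmd


# Illustrative file names, based on the test samples shown earlier in the thread.
cmd = combine_gvcfs_command(
    "genome.fasta",
    ["testN.haplotyper.g.vcf.gz", "testT.haplotyper.g.vcf.gz"],
    "joint.g.vcf.gz",
)
print(" ".join(cmd))
```

The combined gVCF would then still need to be genotyped in a separate step, as it is on the GenomicsDB route.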

cmatKhan • Aug 03 '24 12:08