sarek
The pipeline should fail early when --no_intervals is used with joint germline calling
Description of the bug
As the title states: we used to have this check, but apparently no longer do. See here: https://nfcore.slack.com/archives/CGFUX04HZ/p1697021363120009
Command used and terminal output
No response
Relevant files
No response
System information
No response
@FriederikeHanssen : You mentioned that "GenomicsDB doesn’t work without intervals."
In connection with that, I'd like to point out that we now have two subworkflows for joint-germline variant calling: one with GATK/haplotypecaller and one with Sentieon/haplotyper. The Sentieon/haplotyper subworkflow for joint-germline variant calling doesn't use GenomicsDB, and as far as I can tell it works fine with the option --no_intervals, although we do not have a CI test for that at the moment.
We do have this test, which gives:
[0e/b77807] process > NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_GERMLINE_ALL:BAM_VARIANT_CALLING_SENTIE... [100%] 4 of 4 ✔
with the sentieon-cmd being this
sentieon driver -r genome.fasta -t 2 -i testT.converted.cram --interval chr22_2-15000.bed --algo Haplotyper -d dbsnp_146.hg38.vcf.gz --emit_mode gvcf testT.haplotyper.chr22_2-15000.g.vcf.gz
If I do something similar but with --no_intervals added, that is,
nextflow run main.nf -profile test_cache,software_license,docker --sentieon_extension --input ./tests/csv/3.0/mapped_joint_bam.csv --tools sentieon_haplotyper --step variant_calling --joint_germline --outdir results --sentieon_haplotyper_emit_mode gvcf --no_intervals --nucleotides_per_second 20 --wes true
then I get:
[24/648d40] process > NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_GERMLINE_ALL:BAM_VARIANT_CALLING_SENTIEON_HAPLOTYPER:SENTIEON_HAPLOTYPER (testN) [100%] 2 of 2 ✔
and the sentieon-cmd looks like this:
sentieon driver -r genome.fasta -t 2 -i testN.converted.cram --algo Haplotyper -d dbsnp_146.hg38.vcf.gz --emit_mode gvcf testN.haplotyper.g.vcf.gz
I guess we want to keep the possibility of running the joint-germline sentieon/haplotyper with --no_intervals, right?
Should I perhaps add a test for that?
yes of course, just for the one that uses the genomicsdb route we want to have an early fail
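To make the intended behaviour concrete, here is a minimal sketch of the early-fail rule discussed above, written in Python only for illustration (the pipeline itself is Nextflow). The function name, the argument names, and the tool name `haplotypecaller` for the GATK route are assumptions; `sentieon_haplotyper` is taken from the command earlier in this thread.

```python
def check_joint_germline_params(tools, joint_germline, no_intervals):
    """Fail early if the GenomicsDB route is requested without intervals.

    `tools` is a comma-separated string such as "haplotypecaller,strelka".
    The tool name "haplotypecaller" for the GATK route is an assumption;
    "sentieon_haplotyper" does not go through GenomicsDB and is allowed.
    """
    selected = {t.strip() for t in tools.split(",") if t.strip()}
    uses_genomicsdb = joint_germline and "haplotypecaller" in selected
    if uses_genomicsdb and no_intervals:
        raise ValueError(
            "Joint germline calling with GATK haplotypecaller uses GenomicsDB, "
            "which does not work with --no_intervals."
        )


# The Sentieon route passes, the GATK route is rejected before anything runs.
check_joint_germline_params("sentieon_haplotyper", True, True)
try:
    check_joint_germline_params("haplotypecaller", True, True)
except ValueError as err:
    print(f"early fail: {err}")
```

The point of the sketch is only that the check depends on which caller is selected, not on --joint_germline alone, so the Sentieon route keeps working with --no_intervals.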
Should I add a pytest for that as mentioned above?
repeat #1434
I commented this in the other issue, but I'm going to put it here because this one has some discussion: rather than failing, the GATK workflow could use CombineGVCFs
https://gatk.broadinstitute.org/hc/en-us/articles/360037053272-CombineGVCFs
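As a rough sketch of what that fallback could look like, the helper below only assembles a `gatk CombineGVCFs` argument list from per-sample gVCFs. The helper name and all file names are hypothetical, and this is not how the pipeline currently wires things up; it just shows the shape of the command that would replace the GenomicsDB step when no intervals are given.

```python
def combine_gvcfs_command(reference, gvcfs, output):
    """Build a `gatk CombineGVCFs` argument list for a set of per-sample gVCFs.

    Hypothetical helper: it only assembles the command, it does not run GATK.
    """
    if not gvcfs:
        raise ValueError("at least one gVCF is required")
    cmd = ["gatk", "CombineGVCFs", "-R", reference]
    for gvcf in gvcfs:
        # One --variant argument per input gVCF.
        cmd += ["--variant", gvcf]
    cmd += ["-O", output]
    return cmd


# File names below are placeholders, not pipeline outputs.
print(" ".join(combine_gvcfs_command(
    "genome.fasta",
    ["testN.g.vcf.gz", "testT.g.vcf.gz"],
    "joint.g.vcf.gz",
)))
```

The combined gVCF would then still need a genotyping step afterwards, which is left out here.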