sarek icon indicating copy to clipboard operation
sarek copied to clipboard

Create a sentieon specific subworkflow for DSL2

Open maxulysse opened this issue 2 years ago • 13 comments

EDIT (from @cjfields): Modules checklist:

  • [x] alignment (accelerated bwa)
  • [ ] ~BQSR (optional?) (EDIT: Won't be implemented. It should be possible to use GATK-based BQSR.)~
  • [x] DNAseq (SNV/Indel) - including GVCF support
  • [ ] DNAscope (SV and SNV/Indel) - including GVCF support
  • [ ] TNseq (tumor/normal) - including GVCF support
  • [ ] TNscope (tumor/normal)

maxulysse avatar Jul 15 '21 09:07 maxulysse

Checklist (some are already in place)

Modules

  • [x] alignment (accelerated bwa)
  • [ ] BQSR (optional?)
  • [x] DNAseq (SNV/Indel) - including GVCF support
  • [x] DNAscope (SV and SNV/Indel)
  • [ ] TNseq (tumor/normal) - including GVCF support
  • [ ] TNscope (tumor/normal)

cjfields avatar Apr 12 '22 18:04 cjfields

BQSR would be done also with a Sentieon subtool or GATK?

FriederikeHanssen avatar Apr 13 '22 07:04 FriederikeHanssen

Sentieon has that if I remember well

maxulysse avatar Apr 13 '22 12:04 maxulysse

Sentieon has that if I remember well

Correct; it also supports VQSR though I'm not sure if that's on the roadmap here, I've been running it separately.

cjfields avatar Apr 13 '22 14:04 cjfields

carefully toying around with it, is the correct answer I would say.

FriederikeHanssen avatar Apr 13 '22 14:04 FriederikeHanssen

we have a few modules ready from the danish national genome center, I'll just a add them and make the missing ones also with a sub workflow.

nicorap avatar May 11 '22 08:05 nicorap

Small note: in the 2.7.2 workflow we found a bug which skips Sentieon DNAscope and DNAseq when using manually recalibrated BAMs, which seems to be due to a difference in how GATK and Sentieon steps are implemented. GATK HaplotypeCaller, Manta, etc) appear to use a recalibrated BAM file with index here.

However, the next line of code indicates Sentieon steps are using the deduped BAM + index + recalibration table, and the related processes require that table. So, when given a recalibrated BAM + index, this effectively skips Sentieon calling altogether. The input channel for those steps also isn't generated when given a table like this.

The reason I point this out: we can add a simple fix for this and test it, but it's worth noting that all others steps apart from Sentieon use an already recalibrated BAM file and the Sentieon BQSR generates this BAM file anyway, so I'm not sure of the benefit of having this oddly unique step just for Sentieon. Maybe something to consider for the DSL2 implementation?

cjfields avatar Jul 07 '22 04:07 cjfields

That makes sense, and also explain why I never got to have sentieon running using 2.7.2 from bam. 😐. I think it’s important to be able to skip BQSR, as for production setups the data should always be the same, so it’s not needed.

nicorap avatar Jul 08 '22 16:07 nicorap

@asp8200 I feel we're getting close to finalize this issue

maxulysse avatar Feb 15 '24 09:02 maxulysse

TNseq and TNscope still not implemented :-/

asp8200 avatar Feb 15 '24 12:02 asp8200

yeah I know, but we're getting closer and closer to completion

maxulysse avatar Feb 15 '24 13:02 maxulysse

@maxulysse you are awesome for taking this one, we're about to get this running for some of our projects.

EDIT: just to add, we're not using TNseq/TNscope for this one (it's all germline) but we can certainly try it out.

cjfields avatar Feb 15 '24 15:02 cjfields

@asp8200 did all the work, he should get the praise. You should be able to test it, it's in the latest release already

maxulysse avatar Feb 15 '24 15:02 maxulysse