
[User Story] Update Sentieon

Open mathiasbio opened this issue 5 months ago • 6 comments

Need

As a hospital geneticist I want:

  • As many true variants as possible with as few false-positive variants as possible
  • As fast as possible
  • And as cheaply as possible

The new version of Sentieon, 202308, contains changes that allow us to affect all of these aspects.

Our version in production: 202010.02

Some relevant changes since version 202010.02:

  • Modified TNscope® AF output format.
  • Solved issue in duplex UMI consensus where results depended on internal data order.
  • Improved speed of bwa mem.
  • Solved issue in TNscope where evidence from overlapping read pairs was not adequately accounted for.
  • Solved issue in duplex UMI consensus that could output zero-length reads.
  • Added support in Dedup algorithm to perform UMI barcode error correction.
  • Added support in LocusCollector and Dedup algorithms to perform consensus based deduplication as well as UMI barcode aware deduplication.
  • Improved machine learning model for TNscope.
  • Improved consensus of INDELs in Dedup algorithm.
  • Improved DNAscope pipeline speed and accuracy with a BWA model.

In summary:

  • Changes to variant callers could improve accuracy and sensitivity.
  • Changes to bwa mem could decrease the turnaround time (TAT).
  • Inclusion of UMIs in Dedup could save ~20% of the reads wrongly discarded as duplicates, and improve the quality of reads (https://github.com/Clinical-Genomics/BALSAMIC/issues/1361)
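
To illustrate the last point, here is a toy Python sketch of why UMI-aware deduplication keeps more reads than coordinate-only deduplication. It is not Sentieon's Dedup algorithm: the read tuples and the `count_kept` helper are made up for illustration, and UMI barcode error correction is ignored entirely.

```python
def count_kept(reads, use_umi):
    """Return how many reads survive a naive deduplication.

    reads: iterable of (chrom, start, end, umi) tuples (hypothetical layout).
    Reads sharing the same key are collapsed into one.
    """
    keys = set()
    for chrom, start, end, umi in reads:
        # Coordinate-only dedup ignores the UMI, so distinct molecules
        # that happen to share coordinates are collapsed together.
        key = (chrom, start, end, umi) if use_umi else (chrom, start, end)
        keys.add(key)
    return len(keys)


reads = [
    ("chr1", 100, 250, "ACGT"),
    ("chr1", 100, 250, "ACGT"),  # true PCR duplicate of the read above
    ("chr1", 100, 250, "TTAG"),  # same coordinates, different molecule
    ("chr2", 500, 650, "GGCA"),
]

print(count_kept(reads, use_umi=False))  # 2
print(count_kept(reads, use_umi=True))   # 3
```

In this toy input the third read is only kept when the UMI is part of the key, which is the kind of read the summary expects to recover.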

Suggested approach

In an "update Sentieon" PR, temporarily hard-code the path to the newest version of Sentieon, and then:

  • Ensure that all workflows are behaving as normal
  • Perform a mini-validation in the PR, checking that variant calling works as normal and that all files can be produced
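
A minimal sketch of what such a mini-validation could look like in Python, assuming uncompressed VCF text from a run with the old version and a run with the new version. The helper names (`load_variants`, `compare_calls`, `missing_outputs`) and the file names in the usage example are hypothetical and not part of BALSAMIC.

```python
from pathlib import Path


def load_variants(vcf_text):
    """Collect (CHROM, POS, REF, ALT) keys from uncompressed VCF text."""
    variants = set()
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos, _id, ref, alt = line.split("\t")[:5]
        for allele in alt.split(","):  # one key per ALT allele
            variants.add((chrom, int(pos), ref, allele))
    return variants


def compare_calls(old_vcf_text, new_vcf_text):
    """Split calls into shared, lost (old only) and gained (new only)."""
    old, new = load_variants(old_vcf_text), load_variants(new_vcf_text)
    return {"shared": old & new, "lost": old - new, "gained": new - old}


def missing_outputs(result_dir, expected_files):
    """Return the expected file names that are absent from result_dir."""
    root = Path(result_dir)
    return [name for name in expected_files if not (root / name).is_file()]


# Made-up example records, not real calls.
old_vcf = "##fileformat=VCFv4.2\nchr1\t100\t.\tA\tT\nchr1\t200\t.\tG\tC\n"
new_vcf = "##fileformat=VCFv4.2\nchr1\t100\t.\tA\tT\nchr2\t300\t.\tC\tA,G\n"

diff = compare_calls(old_vcf, new_vcf)
print(len(diff["shared"]), len(diff["lost"]), len(diff["gained"]))  # 1 1 2
```

Lost or gained calls are not necessarily errors, since the variant callers themselves changed between versions, but they give a short list to inspect by hand in the PR.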

Considered alternatives

No response

Deviation

No response

System requirements assessed

  • [ ] Yes, I have reviewed the system requirements

Requirements affected by this story

No response

Risk assessment needed

  • [ ] Needed
  • [ ] Not needed

Risk assessment

No response

SOUPs

No response

Can be closed when

  • [ ] Sentieon has been updated to most recent version, or desired new version.

Blockers

No response

Anything else?

Replaces feature issue: https://github.com/Clinical-Genomics/BALSAMIC/issues/1250

mathiasbio, Jan 31 '24 13:01