[User Story] Deduplication with consensus collapse

Open • mathiasbio opened this issue 5 months ago • 3 comments

Need

As a clinician, I want to be able to detect variants down to low allele frequencies, as cheaply as possible and with as few false positives as possible.

Sentieon Dedup in the new Sentieon release 202308 adds a --consensus flag. This flag deduplicates reads based on UMI tags as well as read position, and it builds consensus reads from each group of duplicates. This has two benefits:

  • It rescues reads that position-only deduplication falsely discards as duplicates. In some samples, mean coverage has been observed to improve by ~28% with UMI-aware dedup compared with simple dedup (see the table below).
    [Table image: mean coverage with simple dedup vs. with UMIs]

  • It produces higher-quality consensus reads when duplicates are found.
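To illustrate the two benefits, here is a minimal sketch that contrasts position-only grouping with UMI-aware grouping. It then collapses each UMI family into one consensus read by per-base majority vote. This is a hypothetical, simplified model and not Sentieon's implementation. The `Read` class and all function names are illustrative only. The sketch ignores base qualities, UMI sequencing errors and mate/fragment-end information, and it assumes all reads in a family have equal length.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    """Hypothetical, simplified aligned read (not a real BAM record)."""
    chrom: str
    start: int   # alignment start of the read
    strand: str  # "+" or "-"
    umi: str     # UMI sequence, e.g. parsed from a read tag
    seq: str     # read bases; reads in one family assumed equal length


def position_key(read):
    """Key used by simple, position-only deduplication."""
    return (read.chrom, read.start, read.strand)


def umi_key(read):
    """Key used by UMI-aware deduplication: position plus UMI."""
    return (read.chrom, read.start, read.strand, read.umi)


def group_reads(reads, key):
    """Group reads into duplicate families according to `key`."""
    groups = defaultdict(list)
    for read in reads:
        groups[key(read)].append(read)
    return dict(groups)


def consensus_sequence(seqs):
    """Per-base majority vote across a duplicate family; ties become 'N'."""
    bases = []
    for column in zip(*seqs):
        counts = Counter(column).most_common(2)
        if len(counts) > 1 and counts[0][1] == counts[1][1]:
            bases.append("N")  # no majority base at this position
        else:
            bases.append(counts[0][0])
    return "".join(bases)


def dedup_consensus(reads):
    """Collapse each UMI-aware duplicate family into one consensus sequence."""
    return {
        key: consensus_sequence([r.seq for r in family])
        for key, family in group_reads(reads, umi_key).items()
    }


if __name__ == "__main__":
    reads = [
        Read("chr1", 100, "+", "AAC", "ACGT"),
        Read("chr1", 100, "+", "AAC", "ACGA"),  # sequencing error at last base
        Read("chr1", 100, "+", "AAC", "ACGT"),
        Read("chr1", 100, "+", "GTT", "ACCT"),  # same position, different molecule
    ]
    print(len(group_reads(reads, position_key)))  # 1: position-only keeps one read
    print(len(group_reads(reads, umi_key)))       # 2: the UMI rescues a second molecule
    print(dedup_consensus(reads))                 # AAC family collapses to "ACGT"
```

Position-only dedup keeps a single read and loses the "GTT" molecule. UMI-aware grouping keeps it, and the majority vote corrects the single-read error in the "AAC" family. Emitting "N" on ties is an assumed design choice for the sketch. A production caller would typically weigh base qualities instead.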

Suggested approach

Sentieon must be updated first to gain access to this feature:

  • Update Sentieon https://github.com/Clinical-Genomics/BALSAMIC/issues/1384
  • Introduce a new consensus option in BALSAMIC (work has begun; see linked PR).

Considered alternatives

No response

Deviation

No response

System requirements assessed

  • [ ] Yes, I have reviewed the system requirements

Requirements affected by this story

No response

Risk assessment needed

  • [ ] Needed
  • [ ] Not needed

Risk assessment

No response

SOUPs

No response

Can be closed when

No response

Blockers

  • Update Sentieon (https://github.com/Clinical-Genomics/BALSAMIC/issues/1250)

  • This update is also blocked by bugs in the Dedup UMI consensus approach. Sentieon has been notified, has confirmed the issue, and will fix it in the next release.

Anything else?

No response

mathiasbio, Jan 19 '24 09:01