
feat: deduplicate with UMIs

Status: Open. mathiasbio opened this issue 5 months ago • 3 comments

This PR:

Blocked by:

  • Update Sentieon: https://github.com/Clinical-Genomics/BALSAMIC/issues/1384. This PR needs a newer Sentieon release because the version currently in production, sentieon-genomics-202010.02, does not support the consensus option for LocusCollector, the step that runs before Dedup:

Current production version: [screenshot not available]

Latest Sentieon version: [screenshot not available]
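Until the upgrade lands, a workflow could refuse to run UMI-aware dedup on a Sentieon release that is too old. Below is a minimal sketch of such a version gate. It assumes release strings follow the `sentieon-genomics-YYYYMM.NN` pattern of the production version named above. The function names and the minimum release used in the example are hypothetical, because this PR does not state which release first added LocusCollector consensus support.

```python
import re

# Assumed release-string pattern, modelled on "sentieon-genomics-202010.02".
_RELEASE = re.compile(r"sentieon-genomics-(\d{6})\.(\d{2})")


def parse_release(release: str) -> tuple[int, int]:
    """Return (YYYYMM, patch) so that releases compare chronologically."""
    match = _RELEASE.fullmatch(release)
    if match is None:
        raise ValueError(f"unrecognised Sentieon release string: {release!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(installed: str, minimum: str) -> bool:
    """True if `installed` is the same release as `minimum` or newer."""
    return parse_release(installed) >= parse_release(minimum)


if __name__ == "__main__":
    production = "sentieon-genomics-202010.02"
    # HYPOTHETICAL threshold: replace with the first release that actually
    # supports the LocusCollector consensus option.
    hypothetical_minimum = "sentieon-genomics-202112.01"
    print(meets_minimum(production, hypothetical_minimum))  # False
```

Comparing `(YYYYMM, patch)` tuples avoids the pitfalls of comparing the raw strings, and parsing fails loudly on an unexpected format rather than silently passing the check.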

Background: Updating Sentieon to this version would let us use UMIs directly in the Dedup step. This could rescue a significant number of reads that a purely position-based approach wrongly discards as duplicates.
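The difference between position-based and UMI-aware duplicate grouping can be shown with a toy model. This is not Sentieon's algorithm. The `Read` type and `count_unique` function are hypothetical, and UMIs are matched exactly. Real UMI-aware tools also tolerate UMI sequencing errors, which this sketch does not attempt.

```python
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    name: str
    chrom: str
    start: int   # alignment start of the fragment (simplified)
    strand: str  # "+" or "-"
    umi: str


def count_unique(reads: list[Read], use_umi: bool) -> int:
    """Count distinct fragments left after grouping duplicates.

    Position-only: reads sharing (chrom, start, strand) collapse into one.
    UMI-aware: the UMI is part of the key, so reads at the same position
    but with different UMIs are kept as distinct molecules.
    """
    groups: dict[tuple, list[str]] = defaultdict(list)
    for read in reads:
        key: tuple = (read.chrom, read.start, read.strand)
        if use_umi:
            key += (read.umi,)
        groups[key].append(read.name)
    return len(groups)


if __name__ == "__main__":
    same_position = [
        Read("r1", "chr1", 100, "+", "AAAT"),
        Read("r2", "chr1", 100, "+", "AAAT"),  # true PCR duplicate of r1
        Read("r3", "chr1", 100, "+", "CCGA"),  # different molecule
        Read("r4", "chr1", 100, "+", "GGTC"),  # different molecule
    ]
    print(count_unique(same_position, use_umi=False))  # 1
    print(count_unique(same_position, use_umi=True))   # 3
```

With position alone, r3 and r4 are discarded as duplicates even though they come from different molecules. Adding the UMI to the key rescues them, which is the effect described above.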

It might also serve as the basis for the MRD workflow, which needs to reach very low VAFs. That may not be possible with the "3,1,1" approach used in the UMI workflow, but the MRD workflow still requires some UMI error correction, which this solution provides.
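The following sketch shows why very low VAFs are hard to reach. Any filtering step that discards reads lowers the effective depth. At a fixed VAF, the chance of observing enough variant-supporting reads then falls quickly. The model is a plain binomial that ignores sequencing and consensus errors. The depths and the minimum of 3 supporting reads in the example are hypothetical, not BALSAMIC settings, and the example does not model what the "3,1,1" filter itself does.

```python
from math import comb


def prob_at_least(k: int, depth: int, vaf: float) -> float:
    """P(X >= k) for X ~ Binomial(depth, vaf).

    This is the chance of seeing at least k variant-supporting reads at a site
    with the given depth and true VAF, assuming error-free reads.
    """
    below_k = sum(
        comb(depth, i) * vaf**i * (1.0 - vaf) ** (depth - i) for i in range(k)
    )
    return 1.0 - below_k


if __name__ == "__main__":
    vaf = 0.001     # 0.1 % VAF (hypothetical)
    min_reads = 3   # hypothetical minimum supporting reads for a call
    for depth in (5000, 1000):  # hypothetical depth before / after filtering
        expected = depth * vaf
        p = prob_at_least(min_reads, depth, vaf)
        print(f"depth={depth}: expected alt reads={expected:.1f}, P(>= {min_reads})={p:.3f}")
```

At 1000x, a 0.1 % variant is expected to yield only about one supporting read. A filter that trades depth for accuracy therefore quickly makes such variants undetectable, while a lighter error-correction step preserves more depth.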

For more info see user story: https://github.com/Clinical-Genomics/BALSAMIC/issues/1361

Issues to consider:

  • [ ] Percent duplicate metrics:

After implementing this, we can no longer retrieve the % Duplicates and Optical Duplicates values from the Dedup ".metrics" file; they are all reported as "0". How can this be fixed? I have notified Sentieon about this issue, and they say they will add more useful statistics to the report in the next version.
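One way to diagnose the zero values is to check whether only the derived percentage is zero or whether the raw counts are zero too. The sketch below assumes the Dedup ".metrics" file follows the Picard DuplicationMetrics layout: a tab-separated header line starting with `LIBRARY`, followed by a row of values. It recomputes the duplicate fraction with Picard's PERCENT_DUPLICATION formula. Whether the upgraded Sentieon release keeps this layout is not confirmed here. The function names are hypothetical.

```python
def parse_dedup_metrics(text: str) -> dict[str, str]:
    """Return the first metrics row as {column: value}.

    Assumes a Picard-style metrics file: '#' comment lines, then a
    tab-separated header starting with 'LIBRARY', then one row of values.
    """
    lines = [ln for ln in text.splitlines() if ln and not ln.startswith("#")]
    for i, line in enumerate(lines):
        if line.startswith("LIBRARY\t") and i + 1 < len(lines):
            return dict(zip(line.split("\t"), lines[i + 1].split("\t")))
    raise ValueError("no LIBRARY header followed by a values row")


def duplication_fraction(metrics: dict[str, str]) -> float:
    """Picard's PERCENT_DUPLICATION (reported as a fraction, not a percent):
    (UNPAIRED_READ_DUPLICATES + 2 * READ_PAIR_DUPLICATES)
    / (UNPAIRED_READS_EXAMINED + 2 * READ_PAIRS_EXAMINED)
    """
    dups = int(metrics["UNPAIRED_READ_DUPLICATES"]) + 2 * int(metrics["READ_PAIR_DUPLICATES"])
    examined = int(metrics["UNPAIRED_READS_EXAMINED"]) + 2 * int(metrics["READ_PAIRS_EXAMINED"])
    return dups / examined if examined else 0.0


if __name__ == "__main__":
    # Synthetic example, not real Sentieon output.
    example = (
        "## METRICS CLASS\tpicard.sam.DuplicationMetrics\n"
        "LIBRARY\tUNPAIRED_READS_EXAMINED\tREAD_PAIRS_EXAMINED\t"
        "UNPAIRED_READ_DUPLICATES\tREAD_PAIR_DUPLICATES\tREAD_PAIR_OPTICAL_DUPLICATES\n"
        "lib1\t10\t45\t2\t9\t1\n"
    )
    m = parse_dedup_metrics(example)
    print(duplication_fraction(m), m["READ_PAIR_OPTICAL_DUPLICATES"])  # 0.2 1
```

If the raw duplicate counts are also zero, the problem lies in how duplicates are counted after UMI collapsing rather than in the summary column.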

  • [x] Collapsed singleton or pairs?

Are the final collapsed reads kept as pairs or as singletons (as with the other Sentieon UMI collapse tool)? They are singletons; Sentieon says this is intentional and beneficial.
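Whether collapsed reads come out as pairs or singletons can be checked directly in the SAM FLAG field. Per the SAM specification, bit 0x1 marks a read with multiple segments (paired), 0x100 marks a secondary alignment and 0x800 a supplementary one. The sketch below counts paired and unpaired primary records from SAM text, for example the output of `samtools view`. The function name is hypothetical.

```python
from typing import Iterable


def pairing_summary(sam_lines: Iterable[str]) -> dict[str, int]:
    """Count paired vs. unpaired primary records in SAM text lines.

    Header lines ('@') are skipped, as are secondary (0x100) and
    supplementary (0x800) records, so each read is counted once.
    """
    counts = {"paired": 0, "unpaired": 0}
    for line in sam_lines:
        if not line or line.startswith("@"):
            continue  # header or blank line
        flag = int(line.split("\t")[1])  # FLAG is the second column
        if flag & 0x900:
            continue  # secondary or supplementary alignment
        counts["paired" if flag & 0x1 else "unpaired"] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage (BAM name is a placeholder): samtools view deduped.bam | python this_script.py
    print(pairing_summary(line.rstrip("\n") for line in sys.stdin))
```

If the collapsed output consists of singletons, nearly all primary records should land in the "unpaired" bucket. `samtools flagstat` reports a comparable "paired in sequencing" count without any custom code.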

Listing changes

  • Added: for new features.
  • Changed: for changes in existing functionality.
  • Deprecated: for soon-to-be removed features.
  • Removed: for now removed features.
  • Fixed: for any bug fixes.
  • Security: in case of vulnerabilities.

Review and tests:

  • [ ] Tests pass
  • [ ] Code review
  • [ ] New code is executed and covered by tests, and tests are approved

mathiasbio, Jan 16 '24 16:01