Add local assembly support for breakpoint validation in Nanopore inputs

Open zhemingfan opened this issue 3 years ago • 1 comments

Overview

From internal testing, wtdbg2 (a long-read assembler) performs well at assembling breakpoint sequences, so MAVIS should support it as an option for local assembly.

The following changes must be made to integrate wtdbg2:

  • [x] Ensure that local assemblies can validate known breakpoints
  • [ ] @zhemingfan will add a test for collecting informative reads from a BAM file (add a function to the gather.py file and accompanying unit tests)
  • [ ] @creisle will add a function to the bam.read module to process/simplify long read assembly alignments by removing indels below a certain size threshold as they are likely to be artifacts. This will ensure we can still use the CIGAR string of the alignment to call events downstream
  • [ ] TODO: add option to config to support bam types (long read vs paired end short read etc)
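The indel-simplification step described above could look roughly like the sketch below. This is not the planned `bam.read` function: the name `simplify_small_indels`, its signature, and the choice to handle only match, insertion, and deletion operations are assumptions made for illustration. Small insertions are dropped from the read, small deletions are filled in with reference bases, and adjacent operations are merged so the CIGAR stays consistent with the sequence.

```python
from typing import List, Tuple

# CIGAR operation codes as defined in the SAM specification
MATCH, INS, DEL = 0, 1, 2


def simplify_small_indels(
    cigar: List[Tuple[int, int]], query: str, ref: str, min_indel_size: int
) -> Tuple[List[Tuple[int, int]], str]:
    """Hypothetical helper: remove indels shorter than min_indel_size.

    cigar: list of (operation, length) tuples (M/I/D only in this sketch)
    query: read sequence covered by the CIGAR
    ref:   reference sequence starting at the alignment start
    """
    new_cigar: List[Tuple[int, int]] = []
    new_query: List[str] = []
    qpos = rpos = 0
    for op, length in cigar:
        if op == MATCH:
            out_op, out_seq = MATCH, query[qpos:qpos + length]
            qpos += length
            rpos += length
        elif op == INS:
            if length < min_indel_size:
                qpos += length  # drop the inserted bases entirely
                continue
            out_op, out_seq = INS, query[qpos:qpos + length]
            qpos += length
        elif op == DEL:
            if length < min_indel_size:
                # fill the gap with reference bases and treat it as aligned
                out_op, out_seq = MATCH, ref[rpos:rpos + length]
            else:
                out_op, out_seq = DEL, ''
            rpos += length
        else:
            raise ValueError(f'unsupported CIGAR operation in this sketch: {op}')
        new_query.append(out_seq)
        # merge with the previous operation when the types are the same
        if new_cigar and new_cigar[-1][0] == out_op:
            new_cigar[-1] = (out_op, new_cigar[-1][1] + length)
        else:
            new_cigar.append((out_op, length))
    return new_cigar, ''.join(new_query)
```

With a threshold of 5, a 1 bp insertion and a 2 bp deletion are absorbed into the surrounding match, while a 10 bp deletion is kept as an event. Soft clips and other operations would need extra handling in a real implementation.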

zhemingfan avatar Feb 03 '22 19:02 zhemingfan

For the initial long read assembly integration tests

Changes to MAVIS

  • incorporate option for long read assembler
  • choose "weird" reads in the evidence gathering step
  • assemble with long read assembler
  • re-align assemblies to the reference genome with long-read aligner
  • continue usual downstream processing for now
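The assemble and re-align steps above might be wired together along the following lines. This is a sketch under assumptions: the helper name `build_assembly_commands`, the file naming, the default genome size, and the minimap2 preset are all hypothetical and not taken from MAVIS. The command-line flags follow the wtdbg2 and minimap2 READMEs, but whether these settings suit small local assemblies has not been tested here.

```python
from typing import List


def build_assembly_commands(
    reads_fasta: str,
    reference_fasta: str,
    prefix: str,
    genome_size: str = '10k',  # assumed size of the local region, not a tuned value
    threads: int = 1,
    preset: str = 'map-ont',  # assumed minimap2 preset for re-aligning contigs
) -> List[List[str]]:
    """Hypothetical helper: build the commands for assembly and re-alignment."""
    # 1. assemble the gathered reads with wtdbg2
    assemble = [
        'wtdbg2', '-x', 'ont', '-g', genome_size,
        '-t', str(threads), '-i', reads_fasta, '-fo', prefix,
    ]
    # 2. derive the consensus contigs from the wtdbg2 layout
    consensus = [
        'wtpoa-cns', '-t', str(threads),
        '-i', f'{prefix}.ctg.lay.gz', '-fo', f'{prefix}.ctg.fa',
    ]
    # 3. re-align the contigs to the reference genome with minimap2 (SAM output)
    realign = [
        'minimap2', '-a', '-x', preset, '-t', str(threads),
        '-o', f'{prefix}.sam', reference_fasta, f'{prefix}.ctg.fa',
    ]
    return [assemble, consensus, realign]
```

Each command list can be passed to `subprocess.run(cmd, check=True)` in order; keeping command construction separate from execution makes the step easy to unit test without the tools installed.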

To create our ground truth sequences

  • pick several events found and validated by short reads
  • create the breakpoint sequence from the reference genome using ~20 base pairs on either side of the breakpoint (or another flank length)

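A ground truth sequence of this kind could be built roughly as follows. The function names, the in-memory `dict` reference, and the breakpoint tuple layout are assumptions for illustration. Positions are treated as 1-based and inclusive, and the orientation letter (`L` or `R`) marks which side of the breakpoint is retained.

```python
from typing import Dict, Tuple

Breakpoint = Tuple[str, int, str]  # (chromosome, 1-based position, 'L' or 'R')

_COMPLEMENT = str.maketrans('ACGTacgt', 'TGCAtgca')


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def flank(ref: Dict[str, str], bp: Breakpoint, size: int) -> str:
    """Reference bases retained next to the breakpoint (including the position itself)."""
    chrom, pos, orient = bp
    seq = ref[chrom]
    if orient == 'L':
        return seq[max(0, pos - size):pos]  # bases to the left, ending at pos
    return seq[pos - 1:pos - 1 + size]  # bases to the right, starting at pos


def breakpoint_sequence(ref: Dict[str, str], bp1: Breakpoint, bp2: Breakpoint, size: int = 20) -> str:
    """Hypothetical helper: join the two retained flanks across the junction."""
    first = flank(ref, bp1, size)
    if bp1[2] == 'R':
        first = revcomp(first)  # flip so the segment ends at the breakpoint
    second = flank(ref, bp2, size)
    if bp2[2] == 'L':
        second = revcomp(second)  # flip so the segment starts at the breakpoint
    return first + second
```

For example, with `size=3` a left-retained breakpoint at position 5 joined to a right-retained breakpoint at position 6 yields the three bases ending at the first position followed by the three bases starting at the second. Untemplated inserted sequence at the junction is not modelled here.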
For each test

  • check that validate gathers all the reads you expect
  • check that the assembly it builds contains the ground truth sequence
    • align the ground truth to the assembly sequence using minimap2 and check that it looks alright
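As a quick first check before running minimap2, a test could look for the ground truth as an exact substring of the assembled contig on either strand. This is a simplified stand-in, not the alignment-based check described above: long-read assemblies often carry small errors, so an exact match may fail even when the assembly is essentially correct. The function name is hypothetical.

```python
_COMPLEMENT = str.maketrans('ACGT', 'TGCA')


def contains_ground_truth(assembly: str, truth: str) -> bool:
    """Hypothetical helper: exact-match check of the truth sequence on either strand."""
    assembly = assembly.upper()
    truth = truth.upper()
    reverse = truth.translate(_COMPLEMENT)[::-1]  # reverse complement
    return truth in assembly or reverse in assembly
```

A failed exact match is not conclusive on its own; in that case the minimap2 alignment of the ground truth to the contig remains the check to inspect.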

creisle avatar Jul 19 '22 18:07 creisle