mavis
Add local assembly support for breakpoint validation in Nanopore inputs
Overview
From internal testing, wtdbg2 (a long-read assembler) performs well for assembling breakpoints, a capability that MAVIS needs to support.
The following changes must be made to integrate wtdbg2:
- [x] Ensure that local assemblies can validate known breakpoints
- [ ] @zhemingfan will add a function to the gather.py file for collecting informative reads from a BAM file, with accompanying unit tests
- [ ] @creisle will add a function to the bam.read module to process/simplify long read assembly alignments by removing indels below a certain size threshold as they are likely to be artifacts. This will ensure we can still use the CIGAR string of the alignment to call events downstream
- [ ] TODO: add a config option to support BAM types (long read vs. paired-end short read, etc.)
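
As a rough illustration of the indel-simplification item above, here is a minimal sketch that works on a CIGAR given as `(operation, length)` tuples. The function name `simplify_small_indels` and the `min_indel_size` parameter are hypothetical and are not part of the existing bam.read module. It assumes matches are encoded as `M`; small deletions are folded into the neighbouring match so the reference span is unchanged, and small insertions are dropped. The query sequence would need the matching adjustment, which this sketch does not do.

```python
from typing import List, Tuple

# A CIGAR represented as (operation, length) pairs, e.g. [("M", 10), ("D", 3)].
Cigar = List[Tuple[str, int]]


def simplify_small_indels(cigar: Cigar, min_indel_size: int) -> Cigar:
    """Remove indels shorter than min_indel_size from a CIGAR (hypothetical helper).

    Small insertions are dropped and small deletions are converted to
    matches, so the number of reference bases covered stays the same.
    Adjacent operations of the same type are merged afterwards.
    """
    simplified: Cigar = []
    for op, length in cigar:
        if op == "I" and length < min_indel_size:
            # Likely an artifact: ignore the inserted bases entirely.
            continue
        if op == "D" and length < min_indel_size:
            # Likely an artifact: treat the deleted bases as aligned.
            op = "M"
        if simplified and simplified[-1][0] == op:
            # Merge with the previous block of the same operation.
            simplified[-1] = (op, simplified[-1][1] + length)
        else:
            simplified.append((op, length))
    return simplified
```

With a threshold of 10, an alignment such as `10M2I5M3D7M50D4M` would keep only the 50 bp deletion as a candidate event.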
For the initial long read assembly integration tests
Changes to MAVIS
- incorporate option for long read assembler
- choose "weird" reads in the evidence gathering step
- assemble with long read assembler
- re-align assemblies to the reference genome with long-read aligner
- continue usual downstream processing for now
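
One possible reading of choosing "weird" reads is sketched below. The `Read` class, the `is_informative` and `gather_informative` names, and both thresholds are assumptions made for illustration only; in MAVIS the reads would come from the BAM file rather than a stand-in dataclass. The sketch treats a read as informative if it has a supplementary alignment, a long soft clip, or a large indel.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Read:
    """Stand-in for an aligned read (hypothetical, for illustration only)."""

    name: str
    cigar: List[Tuple[str, int]]  # (operation, length) pairs
    has_supplementary: bool = False  # True if the read is split-aligned


def is_informative(read: Read, min_clip: int = 20, min_indel: int = 30) -> bool:
    """Return True if the read shows possible evidence of a breakpoint."""
    if read.has_supplementary:
        return True
    for op, length in read.cigar:
        if op == "S" and length >= min_clip:
            return True  # long soft clip
        if op in ("I", "D") and length >= min_indel:
            return True  # large insertion or deletion
    return False


def gather_informative(
    reads: Iterable[Read], min_clip: int = 20, min_indel: int = 30
) -> List[Read]:
    """Keep only the reads that pass is_informative, preserving input order."""
    return [r for r in reads if is_informative(r, min_clip, min_indel)]
```

The thresholds are placeholders and would need tuning against real Nanopore data.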
To create our ground truth sequences
- pick several events found and validated by short reads
- create the breakpoint sequence using the reference genome for ~20 base pairs either side (or another flank size)
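
A minimal sketch of building such a ground truth sequence follows. It assumes the reference is available as a dict of chromosome name to sequence, that positions are 1-based and inclusive, and that both flanks are taken from the forward strand (so it covers simple cases such as deletions, not inversions). The function name `breakpoint_sequence` is hypothetical.

```python
from typing import Dict


def breakpoint_sequence(
    reference: Dict[str, str],
    chrom1: str,
    pos1: int,
    chrom2: str,
    pos2: int,
    flank: int = 20,
) -> str:
    """Join the flank ending at pos1 with the flank starting at pos2.

    Positions are 1-based and inclusive. Only forward-strand joins are
    handled in this sketch.
    """
    # Last `flank` bases up to and including pos1.
    left = reference[chrom1][max(0, pos1 - flank):pos1]
    # First `flank` bases starting at pos2.
    right = reference[chrom2][pos2 - 1:pos2 - 1 + flank]
    return left + right
```

For example, with `flank=3` and a deletion joining position 5 to position 11 of `AAAAACCCCCGGGGGTTTTT`, the result is `AAAGGG`.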
For each test
- check that validate gathers all the reads you expect
- check that the assembly it builds contains the ground truth sequence
- align the ground truth to the assembly sequence using minimap2 and check that it looks alright
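
The containment check could be sketched as below, assuming an exact substring match on either strand is an acceptable first pass. The helper names are hypothetical. Since long-read assemblies can carry small errors, an exact match may be too strict, which is where the minimap2 alignment step comes in.

```python
# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def contig_contains(contig: str, truth: str) -> bool:
    """Check whether the contig contains the ground truth on either strand.

    The comparison is case-insensitive and requires an exact match.
    """
    contig = contig.upper()
    truth = truth.upper()
    return truth in contig or reverse_complement(truth) in contig
```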