
Tombo to work with supplementary alignments

Open yaeba opened this issue 5 years ago • 2 comments

Hi Marcus,

I am looking for ways to make Tombo resquiggle supplementary alignments as well. This could be a useful enhancement, since it would enrich the data available to downstream analyses, especially when the reads have many insertions and deletions relative to the reference genome.

I am aware that to get supplementary alignments from mappy, one would have to change best_n to > 1 and add if read.is_primary to filter out secondary alignments. However, I also realised that changing the code to account for supplementary alignments is not trivial, as the multiple normalised signals would overwrite one another in the fast5 files. Do you have any thoughts or other workarounds for this?
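As a minimal sketch of the mappy side of this, assuming the `map-ont` preset and a `best_n` of 5 (both illustrative choices, not Tombo's settings): `Hit`, `keep_non_secondary` and `map_read` are hypothetical names, and `Hit` is only a stand-in for a mappy alignment record so the filter can be tried without an index.

```python
from collections import namedtuple

# Hypothetical stand-in for a mappy alignment record (only the fields used here).
Hit = namedtuple("Hit", ["ctg", "r_st", "r_en", "strand", "is_primary"])


def keep_non_secondary(hits):
    """Keep hits flagged is_primary, which drops secondary alignments."""
    return [h for h in hits if h.is_primary]


def map_read(ref_path, seq, best_n=5):
    """Map one read with mappy and return its non-secondary hits."""
    import mappy  # imported lazily so the filter above works without mappy

    # best_n > 1 lets more than one alignment per read be reported.
    aligner = mappy.Aligner(ref_path, preset="map-ont", best_n=best_n)
    if not aligner:
        raise RuntimeError("failed to load or build the index")
    return keep_non_secondary(aligner.map(seq))
```

This only covers retrieving the alignments; it does not address the harder problem of where each resquiggled signal would be stored in the fast5 file.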

Thanks

yaeba, Mar 08 '19 02:03

This would also be useful for RNA samples: when the transcriptome contains multiple similar isoforms of the same gene, a read can have several potential alignments (particularly when the read does not cover the full length of the mRNA).

mparker2, Mar 10 '19 10:03

The system in place to handle 2D reads could be used for this purpose to store multiple alignments in a single fast5 file, but the work to actually make this happen is unfortunately quite significant. The main hurdle is that alignment (via mappy) and fast5 IO are handled in a threaded interface, so that the reference can be stored in a shared memory space, while the re-squiggling is performed in a separate process (in order to avoid the GIL). Each re-squiggle process is linked to a single mapping thread, and the two communicate via a pipe. This pipe expects a single re-squiggling result to be returned, which is then written back to the read file.
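The one-result-per-read contract described above can be illustrated with a toy pipe. This is not Tombo's code: `resquiggle_worker`, `map_and_write` and `run_demo` are hypothetical names, the payloads are plain strings, and the worker runs in a thread rather than a separate process purely to keep the sketch short.

```python
import threading
from multiprocessing import Pipe


def resquiggle_worker(conn):
    """Stand-in for a re-squiggle worker: one request in, one result out."""
    while True:
        request = conn.recv()
        if request is None:  # sentinel: no more reads
            break
        read_id, alignment = request
        conn.send((read_id, "resquiggled:" + alignment))
    conn.close()


def map_and_write(read_ids, conn, written):
    """Stand-in for a mapping thread: sends one alignment, expects one result."""
    for read_id in read_ids:
        conn.send((read_id, "primary"))
        result_id, result = conn.recv()
        # One slot per read: a second result for the same read would overwrite it.
        written[result_id] = result
    conn.send(None)


def run_demo(read_ids):
    map_end, resquiggle_end = Pipe()
    written = {}
    worker = threading.Thread(target=resquiggle_worker, args=(resquiggle_end,))
    worker.start()
    map_and_write(read_ids, map_end, written)
    worker.join()
    return written
```

Because each send is paired with exactly one receive, returning several alignments per read would mean changing both ends of the pipe as well as the storage step.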

In order to work around this, the mapping function would likely have to be asked for the primary alignment in one call, then for any secondary alignments in subsequent calls (re-mapping the read each time). This all feels a bit hacky, but is possible in theory. The right way to do this would likely take even more refactoring. For this reason, this feature is unlikely to be added soon, but I will leave the issue open for future consideration in larger milestones for modified base detection.
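A rough sketch of that workaround, under the assumption that the mapper can be wrapped as a callable returning alignments in a stable order: `nth_alignment`, `resquiggle_all`, `map_fn`, `resquiggle_fn` and `max_alignments` are all hypothetical names, not part of Tombo or mappy.

```python
def nth_alignment(map_fn, seq, n):
    """Re-map seq and return only its n-th alignment, or None if there is none."""
    hits = list(map_fn(seq))
    return hits[n] if n < len(hits) else None


def resquiggle_all(map_fn, resquiggle_fn, seq, max_alignments=10):
    """Request one alignment per call so each round trip yields a single result."""
    results = []
    for n in range(max_alignments):
        hit = nth_alignment(map_fn, seq, n)  # the read is re-mapped on every call
        if hit is None:
            break
        results.append(resquiggle_fn(seq, hit))
    return results
```

The cost is visible in the loop: a read with k alignments is mapped k + 1 times (the final call only discovers that nothing is left), which is part of why the approach feels hacky.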

marcus1487, Mar 11 '19 22:03