megalodon
megalodon copied to clipboard
Instruction about mod_mapping format
Hi @marcus1487 ,
It seems that the bam file from mod_mapping output does not contain the actual sequence of the read, instead, the actual sequence of the read is replaced by the part of the reference genome that the read mapped into it?
Is it correct?
Thanks, Vahid
Yes. This is correct.
The mod_mappings
format is also referred to as "reference-anchored modified base calling". The alternative is the "basecall-anchored modified basecalling" and is available with the mod_basecalls
output type. Reference-anchored modified basecalling assumes that the reference mapping sequence is correct for a read and uses this sequence as input to the modified base calling algorithms instead of the basecalled sequence.
For recent data and models (10.4 pore+sup model / Q20+) this makes very little difference since the sequences are generally largely the same, but for fast models and older data providing the reference sequence could greatly improve the accuracy of the modified base calls.
The mod_basecalls
output matches the files currently produced by Bonito and Guppy.
Thanks @marcus1487 ,
So, is it possible to get a bam output from megalodon with the base modification tag and the actual basecalled sequence of the read? I mean sth like mod_mapping but having the actual sequence of the read and not the ref.
Yes. This is the mod_basecalls
format, but note that this is a un-mapped BAM file.
Thanks @marcus1487
Would it be possible to get a version of the mod_basecalls
BAM file that has been mapped?
Ultimately what we want is to have a single BAM containing haplotypes (to be added after megalodon runs), methylation calls and sequence alignments. This would also be a single file that could be loaded into IGV for viewing of these data types. Right now we're having to keep multiple copies of BAM files and methylation call information, which is cumbersome and quite inefficient in terms of space usage.
Right now minimap2 won't accept a BAM file as input, so transferring over the methylation tags to a mapped BAM would be somewhat cumbersome to do. It definitely seems like something that would be better supported directly in Megalodon.
This output is supported in Bonito and Guppy (Remora Guppy support is under active development and will be included in a future Guppy release). In terms of including this in megalodon, I'm a bit worried that this output would confuse the megalodon outputs further (having unmapped_mod_basecalls, mapped_mod_basecalls, and mod_mappings). I think the native support for mapping this format would be better suited within minimap2, but look forward to the minimap2 developers on this point. I've just opened a ticket for this output in minimap2 (https://github.com/lh3/minimap2/issues/870).