result from "modkit pileup" command
Hello, thank you so much for such a helpful tool. I would like to ask whether the output of the "modkit pileup" command can tell us which CpG sites come from the same molecule.
Hello @Flower9618,
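You'll have to use modkit extract calls (documentation) to determine the co-occurrence of base modifications at the individual read level. The pileup output is aggregated over all reads at each position, so it does not retain which read a call came from.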
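Once you have a per-read table, grouping the calls by read shows which sites were observed on the same molecule. Here is a minimal Python sketch, assuming a tab-separated table with columns named `read_id`, `chrom`, `ref_position`, and `call_code`. Those column names and the sample rows are assumptions for illustration, so check them against the header of your own output file.

```python
import csv
import io
from collections import defaultdict

# Hypothetical excerpt of a per-read calls table (tab-separated).
# Column names and values are assumed for illustration only.
SAMPLE = (
    "read_id\tchrom\tref_position\tcall_code\n"
    "read_a\tchr1\t100\tm\n"
    "read_a\tchr1\t150\t-\n"
    "read_b\tchr1\t100\t-\n"
    "read_a\tchr1\t210\tm\n"
)


def calls_by_read(handle):
    """Group (chrom, ref_position, call_code) tuples by read_id."""
    grouped = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        grouped[row["read_id"]].append(
            (row["chrom"], int(row["ref_position"]), row["call_code"])
        )
    # Sort each read's sites by chromosome and position for readability.
    for sites in grouped.values():
        sites.sort()
    return dict(grouped)


per_read = calls_by_read(io.StringIO(SAMPLE))
for read_id, sites in per_read.items():
    print(read_id, sites)
```

For a real file, pass an open file handle instead of the `io.StringIO` wrapper. Each value in the resulting dictionary lists the sites seen on one read, which is the per-molecule view that the pileup table cannot provide.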
I see. Thank you so much. I will try this command.
In addition, for the ‘modkit repair’ command, both the input and output files must be in BAM format. Is there an easy way to handle data processing when different tools require different input formats? For example, after sequencing, the FASTQ file is used to trim adapters, which also outputs a FASTQ file. To repair the MM and ML tags, I need to convert the FASTQ file to a BAM file. Then, for mapping to the reference genome with Minimap2, I have to convert the BAM file back to a FASTQ file.
Hello @Flower9618,
The easiest way is to minimize the number of conversions. I recommend staying in (mod)BAM as much as possible. dorado will perform adapter trimming and mapping (find the docs here). The team on that project has made a special effort to maintain the modified base tags, so you shouldn't have to use modkit repair except in special cases.
Hello @ArtRand,
Thank you so much for your reply. I have reads that were basecalled with modified bases by Dorado and saved in a BAM file (with MM and ML tags). If I now use the 'dorado trim' command to trim the adapters from these reads and choose BAM as the output format, will the MM and ML tags in the trimmed.bam file be updated to reflect the trimming?
Hello @Flower9618,
Yes, they should be. I'm going to add a "check" command to a future version of Modkit that will help make sure the tags are correct. In the meantime, running modkit summary with --log-filepath will perform a quick version of the same check: any reads with incorrect tags will be reported in the log.
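To give a rough idea of what "incorrect tags" can mean after trimming, here is a minimal Python sketch of two consistency rules that follow from the SAM tag definitions: the ML array must have one value per position listed in MM, and the skip counts in MM must not run past the occurrences of the canonical base in the read. This is not how Modkit performs its check. The function name `check_mm_ml` is hypothetical, and the sketch assumes the sequence is given in the original sequencing orientation.

```python
def check_mm_ml(seq, mm, ml):
    """Return a list of problems; an empty list means the tags look consistent.

    seq: read sequence in the original sequencing orientation (assumed)
    mm:  value of the MM tag, for example "C+m,1,0;"
    ml:  list of integers from the ML tag
    """
    problems = []
    total_calls = 0
    for entry in filter(None, mm.split(";")):
        fields = entry.split(",")
        header = fields[0]
        skips = [int(x) for x in fields[1:]]
        base = header[0]
        total_calls += len(skips)
        # Each skip value counts occurrences of `base` to pass over before
        # the next call, so the calls consume sum(skips) + len(skips) bases.
        needed = sum(skips) + len(skips)
        available = len(seq) if base == "N" else seq.count(base)
        if needed > available:
            problems.append(
                f"{header}: needs {needed} '{base}' bases, read has {available}"
            )
    if total_calls != len(ml):
        problems.append(
            f"MM lists {total_calls} calls but ML has {len(ml)} values"
        )
    return problems


# Tags that match the read: no problems reported.
print(check_mm_ml("ACGTCGCC", "C+m,1,0;", [200, 30]))
# Same tags on a read that was trimmed without updating them.
print(check_mm_ml("ACGT", "C+m,1,0;", [200, 30]))
```

The second call reports a problem because the stale MM tag refers to more cytosines than the trimmed read contains, which is the kind of mismatch a tag check is meant to catch.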