modkit filter site
I used modkit to obtain a pileup.bed file and want to count the 6mA modifications in the genome. Should I filter the data manually, for example by requiring coverage > 5 and fraction > 0.5, or does modkit have recommended parameters or functions for this?
Hello @YouxinZhao,
It is difficult to recommend a specific way to count methylated positions since it depends on your biological system. But with a little more information it might be possible to get a handle on the quantity you're looking for.
If you're working with something like bacteria, where a methyltransferase enzyme is present, you might expect certain motifs to be methylated at a high rate. In this case a filter like yours is probably OK; you could even increase the fraction to 0.6 or greater. I would also recommend trying modkit motif search (docs here).
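As a rough illustration of the manual filter discussed above, here is a minimal Python sketch that counts 6mA sites passing a coverage and fraction threshold. It is not a modkit feature. It assumes the bedMethyl layout where column 4 holds the modification code (`a` for 6mA, possibly followed by a comma and extra information), column 10 is the valid coverage, and column 12 is the modified-call count; please check these against the output of your modkit version. The function name `count_modified_sites` and the example rows are made up for illustration.

```python
def count_modified_sites(lines, mod_code="a", min_cov=5, min_frac=0.5):
    """Count bedMethyl rows with coverage > min_cov and fraction > min_frac.

    Assumed columns (1-based): 4 = mod code, 10 = valid coverage, 12 = N modified.
    """
    count = 0
    for line in lines:
        # Split on any whitespace: some versions mix tabs and spaces.
        fields = line.split()
        if len(fields) < 12:
            continue
        try:
            n_valid = int(fields[9])
            n_mod = int(fields[11])
        except ValueError:
            continue  # skip a header line or a malformed row
        code = fields[3].split(",")[0]
        if code != mod_code or n_valid == 0:
            continue
        # Recompute the fraction from counts to avoid percent/fraction ambiguity.
        if n_valid > min_cov and n_mod / n_valid > min_frac:
            count += 1
    return count


def make_row(code, n_valid, n_mod):
    """Build a made-up bedMethyl-like row for the example below."""
    head = ["chr1", "10", "11", code, str(n_valid), "+", "10", "11", "255,0,0"]
    pct = 100.0 * n_mod / n_valid
    tail = [str(n_valid), f"{pct:.2f}", str(n_mod), str(n_valid - n_mod),
            "0", "0", "0", "0", "0"]
    return "\t".join(head) + "\t" + " ".join(tail)


example_rows = [
    make_row("a", 12, 10),  # passes both thresholds
    make_row("a", 4, 4),    # coverage too low
    make_row("m", 20, 18),  # different modification code
    make_row("a", 10, 5),   # fraction is exactly 0.5, not above it
]
n_sites = count_modified_sites(example_rows)
```

For a real file you could pass an open file handle, e.g. `count_modified_sites(open("pileup.bed"))`, and raise `min_frac` to 0.6 or higher as suggested above.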
If you're working in a system where the modification rate is more stochastic, it becomes more difficult. One way to think about it is that each molecule of DNA has a certain probability of being methylated at a given position. With sufficient coverage (also considering the ploidy of the organism), the fraction modified value should be a good estimate of this probability. In that case, asking for the number of positions that are modified might not be the best question; a better one might be "how many positions are modified > 80% of the time?".
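To get a feel for what "sufficient coverage" means here, one option is to put a confidence interval around the fraction modified at each position. The sketch below uses a Wilson score interval for a binomial proportion, which is a generic statistical tool and not something modkit computes. It assumes the reads at a position are independent draws with the same methylation probability, which ignores ploidy and allele-specific effects. The function name `wilson_interval` is made up for illustration.

```python
import math


def wilson_interval(n_mod, n_valid, z=1.96):
    """Wilson score interval for the per-position methylation probability.

    n_mod:   reads called modified at the position
    n_valid: reads with a valid call at the position
    z:       normal quantile (1.96 gives an approximate 95% interval)
    """
    if n_valid <= 0:
        raise ValueError("n_valid must be positive")
    p = n_mod / n_valid
    z2 = z * z
    denom = 1 + z2 / n_valid
    centre = (p + z2 / (2 * n_valid)) / denom
    half = z * math.sqrt(p * (1 - p) / n_valid + z2 / (4 * n_valid ** 2)) / denom
    return centre - half, centre + half


# Same observed fraction (100% modified), very different certainty.
low_5, high_5 = wilson_interval(5, 5)
low_50, high_50 = wilson_interval(50, 50)
```

Under these assumptions, 5 of 5 modified reads only supports a lower bound of roughly 0.57, while 50 of 50 supports roughly 0.93. So a question like "modified > 80% of the time" could be asked of the lower bound rather than the raw fraction.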
Happy to help more if you can provide some details on what you're looking at.