
modkit filter site

Open YouxinZhao opened this issue 6 months ago • 1 comments

I used modkit to obtain a pileup.bed file, and I want to count the 6mA modifications in the genome. Should I filter the data manually, for example by requiring coverage > 5 and fraction > 0.5, or does modkit have recommended parameters or functions for this?

YouxinZhao, Jun 16 '25 14:06

Hello @YouxinZhao,

It is difficult to recommend a specific way to count methylated positions since it depends on your biological system. But with a little more information it might be possible to get a handle on the quantity you're looking for.

If you're working with something like bacteria, where there is a methyltransferase enzyme, you might expect certain motifs to be methylated at a high rate. In this case a filter like the one you have is probably OK, and you could even increase the fraction to 0.6 or greater. I would also recommend trying modkit motif search (docs here).
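For reference, a threshold filter of this kind could be sketched in Python as follows. This is a minimal sketch, not a modkit feature: it assumes the bedMethyl column layout described in the modkit documentation (column 4 is the modification code, column 10 is N_valid_cov, column 12 is N_mod), assumes 6mA is reported with the code `a`, and computes the fraction as N_mod / N_valid_cov rather than reading the reported value. The function name `count_modified_sites` and the sample rows are hypothetical.

```python
def count_modified_sites(lines, mod_code="a", min_cov=5, min_frac=0.5):
    """Count bedMethyl rows with coverage > min_cov and fraction > min_frac.

    Column positions are an assumption based on the modkit bedMethyl layout:
    index 3 = modification code, index 9 = N_valid_cov, index 11 = N_mod.
    """
    count = 0
    for line in lines:
        # Split on any whitespace, since some versions mix tabs and spaces.
        fields = line.split()
        if len(fields) < 12 or fields[3] != mod_code:
            continue
        # Skip header rows or anything else without integer counts.
        if not (fields[9].isdigit() and fields[11].isdigit()):
            continue
        n_valid = int(fields[9])
        n_mod = int(fields[11])
        if n_valid == 0:
            continue
        if n_valid > min_cov and n_mod / n_valid > min_frac:
            count += 1
    return count


# Made-up rows for illustration only (not real data).
sample_rows = [
    "chr1\t10\t11\ta\t10\t+\t10\t11\t255,0,0\t10\t80.00\t8\t2\t0\t0\t0\t0\t0",
    "chr1\t20\t21\ta\t4\t+\t20\t21\t255,0,0\t4\t100.00\t4\t0\t0\t0\t0\t0\t0",
    "chr1\t30\t31\ta\t10\t+\t30\t31\t255,0,0\t10\t30.00\t3\t7\t0\t0\t0\t0\t0",
    "chr1\t40\t41\tm\t10\t+\t40\t41\t255,0,0\t10\t90.00\t9\t1\t0\t0\t0\t0\t0",
]

n_sites = count_modified_sites(sample_rows, min_cov=5, min_frac=0.5)
```

To run this on a real file, pass an open file handle instead of the list, for example `count_modified_sites(open("pileup.bed"))`, and raise `min_frac` to 0.6 or higher to apply the stricter cutoff.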

If you're working in a system where the modification rate is more stochastic, it becomes more difficult. One way to think about it is that each molecule of DNA has a certain probability of being methylated at a given position. With sufficient coverage (also considering the ploidy of the organism), the fraction modified value should be a good estimate of this probability. But then asking for the number of positions that are modified might not be the best question; a better one might be "how many positions are modified > 80% of the time?".
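To make "sufficient coverage" concrete, one option is to ask whether a lower confidence bound on the per-position methylation probability clears the 80% mark, rather than the raw fraction. The sketch below uses the Wilson score interval for a binomial proportion. This is a standard statistical technique swapped in for illustration, not something modkit computes, and treating reads as independent binomial draws is itself an assumption. The function name `wilson_lower_bound` is hypothetical.

```python
import math


def wilson_lower_bound(n_mod, n_valid, z=1.96):
    """Lower end of the Wilson score interval for n_mod successes in n_valid trials.

    z = 1.96 corresponds to an approximate 95% two-sided interval.
    """
    if n_valid == 0:
        return 0.0
    p = n_mod / n_valid
    z2 = z * z
    denom = 1 + z2 / n_valid
    centre = p + z2 / (2 * n_valid)
    margin = z * math.sqrt(p * (1 - p) / n_valid + z2 / (4 * n_valid * n_valid))
    return (centre - margin) / denom


# Same observed fraction (0.9) at two different coverages.
low_cov = wilson_lower_bound(9, 10)
high_cov = wilson_lower_bound(90, 100)
```

With the same observed fraction of 0.9, the bound is about 0.60 at 10x coverage and about 0.83 at 100x, so only the higher-coverage position would count as "modified > 80% of the time" under this criterion.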

Happy to help more if you can provide some details on what you're looking at.

ArtRand, Jun 16 '25 19:06