How to filter the 6mA `pileup.bed` file?
For 6mA site identification, is it necessary to filter the pileup.bed file to obtain reliable 6mA sites? If filtering is done based on Nvalid_cov, percent.modified, and Nmod, what would be reasonable threshold values for these parameters? Thanks!
Hello @yyzhou4535,
Very sorry about the delay getting back to you - a big Modkit pileup update is coming!
I need a little more information about your biological question to answer this properly. If you're looking for constitutively methylated positions, such as the targets of a methyltransferase enzyme (these are often found in bacterial systems), I'd recommend trying the modkit motif search algorithm (docs here). Other kinds of DNA methylation may be more subtle than "always methylated" vs "never methylated". That being said, low valid coverage introduces sampling bias no matter which system you're looking at. I'd recommend requiring that valid_coverage (the 'score' column, column 5 counting from 1) is $\geq$ 10, and even then, given how many A bases there are in a genome, you may still get sites that reach 10 6mA calls simply by chance.
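As a rough illustration of the coverage filter, here is a minimal Python sketch. It assumes the bedMethyl layout described in the modkit docs, where the fourth field holds the modification code (`a` for 6mA) and the fifth field (score) holds the valid coverage. The function name `filter_pileup` and its parameters are made up for this example, and the cutoff of 10 is only the starting point suggested above, not a validated threshold.

```python
MIN_VALID_COV = 10  # starting point suggested above; tune for your data


def filter_pileup(lines, min_valid_cov=MIN_VALID_COV, mod_code="a"):
    """Yield bedMethyl rows for `mod_code` with enough valid coverage.

    Assumed layout (check against your modkit version):
      fields[3] = modification code, e.g. "a" for 6mA
      fields[4] = score, equal to the valid coverage
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and any header/comment lines
        # Split on any whitespace so both tab- and space-separated
        # columns are handled.
        fields = line.split()
        if len(fields) < 5:
            continue  # not a complete record
        # The code field may carry extra comma-separated parts;
        # only the first part is compared.
        if fields[3].split(",")[0] != mod_code:
            continue
        if int(fields[4]) >= min_valid_cov:
            yield line
```

To use it, open the pileup file and write the surviving rows out, for example `with open("pileup.bed") as fh: kept = list(filter_pileup(fh))`. This only removes low-coverage sites; it does nothing about sites that pass the cutoff by chance.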
If you give me a little more information, maybe I can help you more.