Inconsistent m6A counts with --ignore; how to compare m5C differences?
modkit pileup Y1_5_1.bam Y1_5_1.bed --ref transcripts.fa --log-filepath Y1_5_1.log -t 20 --with-header
modkit pileup Y1_5_1.bam Y1_5_1_m6A.bed --ref transcripts.fa --log-filepath Y1_5_1_m6A.log -t 30 --with-header --ignore 17596
https://github.com/nanoporetech/modkit/issues/347
@ArtRand
1.Why is the number of a inconsistent?
2. In the previous question, I used --ignore 17596 because I only wanted to compare m6A. So if I want to compare differences in m5C on RNA, should I ignore 17802?
Hello @Tang-pro,
Why is the number of a inconsistent?
I'm assuming these counts are total passing modification calls, correct?
My guess is that when you use --ignore 17596 that some of the m6A calls fail the pass threshold. The threshold is estimated on the adjusted probabilities and is likely going to be different. The estimated pass threshold values should be printed in the log for each run.
In the previous question, I used --ignore 17596 because I only wanted to compare m6A. So if I want to compare differences in m5C on RNA, should I ignore 17802?
No, you shouldn't have to. 17802 (pseudouridine) is a U-modification (T in the BAM spec), so it modifies different reference positions.
One thing, I'm not sure if this applies to your analysis however, is that the MAP-based p-value metric in modkit dmr is a "modified/un-modified" metric so you may be able to skip the --ignore 17596 step.
@ArtRand
Thanks for your reply!
One thing, I'm not sure if this applies to your analysis however, is that the MAP-based p-value metric in modkit dmr is a "modified/un-modified" metric so you may be able to skip the --ignore 17596 step.
When performing modkit dmr analysis, if I only want to compare m6A, shouldn't I use --ignore 17596 to exclude other modifications?
When performing modkit dmr analysis, if I only want to compare m6A, shouldn't I use --ignore 17596 to exclude other modifications?
Yes, you can do that.
@ArtRand
My guess is that when you use --ignore 17596 that some of the m6A calls fail the pass threshold. The threshold is estimated on the adjusted probabilities and is likely going to be different. The estimated pass threshold values should be printed in the log for each run.
Actually, I still don’t quite understand 17596 represents inosine modification. Why would ignoring 17596 affect the analysis of m6A?