
Unexpected 5mC calls on A bases – is this normal?

Open hannan666666 opened this issue 7 months ago • 2 comments

Hi,

As far as I understand, 5mC modifications should not occur on adenine (A) bases. However, I noticed entries like the following in my pileup file:

1 3234432 3234433 m,AVV,0 23 + 3234432 3234433 255,0,0 23 69.57 16 7 0 1 0 130

(R4.4.1) hannan@RS720A-E12-RS12:/data1st2/hannan_25/data/Nanopore_process/nanopore_06_distribution/motif16code$ awk '$1 == "1" && $4 == "m,AVV,0"' /data1st2/hannan_25/data/Nanopore_process/nanopore_04_modkit/FC23A-AMY/FC23A-AMY.6mA_5mC5hmC.pileup.bed | head
1 3000821 3000822 m,AVV,0 1 - 3000821 3000822 255,0,0 1 0.00 0 1 0 0 0 2 0
1 3004120 3004121 m,AVV,0 1 + 3004120 3004121 255,0,0 1 0.00 0 1 0 0 1 130
1 3004762 3004763 m,AVV,0 1 - 3004762 3004763 255,0,0 1 100.00 1 0 0 0 1 150
1 3005117 3005118 m,AVV,0 1 - 3005117 3005118 255,0,0 1 0.00 0 1 0 0 0 160
1 3006032 3006033 m,AVV,0 1 - 3006032 3006033 255,0,0 1 0.00 0 1 0 0 4 100
1 3006182 3006183 m,AVV,0 1 + 3006182 3006183 255,0,0 1 0.00 0 1 0 0 0 140
1 3007568 3007569 m,AVV,0 1 - 3007568 3007569 255,0,0 1 0.00 0 1 0 1 1 130
1 3008262 3008263 m,AVV,0 1 - 3008262 3008263 255,0,0 1 0.00 0 1 0 0 1 130
1 3009628 3009629 m,AVV,0 1 - 3009628 3009629 255,0,0 1 0.00 0 1 0 0 3 130
1 3012066 3012067 m,AVV,0 1 + 3012066 3012067 255,0,0 1 0.00 0 1 0 0 2 130

and many others with the motif m,AVV,0. I'm unsure whether this is due to the basecalling (done with Dorado) or the modkit pileup step.

Is this kind of signal expected under certain conditions, or should it be considered an artifact? Have you encountered this before? Would you recommend filtering these out?

Thanks for your time!

hannan666666 • May 07 '25 08:05

Hello @hannan666666,

These entries are most likely caused by a few reads with A>C mismatches aligned to AVV motifs: the read has a C where the reference has an A, and that C carries an m5C call. If you're interested in m6A calls at AVV motifs, I would filter these out. You can usually recognize these events because their valid coverage is quite a bit lower than that of the entries at the same position whose modification corresponds to the reference base.
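As a rough illustration of the filtering and the coverage check described above, here is a minimal awk sketch. It assumes column 4 has the form `code,motif,offset` (as in `m,AVV,0`) and column 10 is the valid coverage, as in the rows posted in this thread. The helper names `drop_m_on_a` and `flag_low_cov` and the 0.5 threshold are hypothetical choices for this example, not part of modkit.

```shell
# Hypothetical helper: drop rows where a 5mC call ("m") is reported on a
# reference adenine. Column 4 is assumed to look like "m,AVV,0".
drop_m_on_a() {
  awk '{
    split($4, f, ",")                     # f[1] = mod code, f[2] = motif, f[3] = offset
    ref_base = substr(f[2], f[3] + 1, 1)  # motif base the call is reported on
    if (f[1] == "m" && ref_base == "A") next
    print
  }' "$@"
}

# Hypothetical helper: print rows whose valid coverage (column 10) is below
# min_frac times the highest valid coverage at the same chrom/start/strand.
flag_low_cov() {
  bed=$1
  min_frac=${2:-0.5}   # illustrative threshold, not a modkit default
  awk -v frac="$min_frac" '
    NR == FNR {                           # first pass: max coverage per site
      key = $1 SUBSEP $2 SUBSEP $6
      if ($10 + 0 > max[key]) max[key] = $10 + 0
      next
    }
    {                                     # second pass: report under-covered rows
      key = $1 SUBSEP $2 SUBSEP $6
      if ($10 + 0 < frac * max[key]) print
    }' "$bed" "$bed"
}
```

For example, `flag_low_cov sample.pileup.bed 0.5` lists candidate mismatch-driven rows for inspection, while `drop_m_on_a sample.pileup.bed > filtered.bed` removes the `m` rows at A positions outright (`sample.pileup.bed` is a placeholder file name). Checking the flagged rows before dropping anything is probably the safer order.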

I need to make sure that the N_diff and N_nocall counts are correct in this case, however, so I'm going to label this issue as a bug to remind myself.

ArtRand • May 07 '25 18:05

Well received, thank you for your advice!

hannan666666 • May 08 '25 02:05