Clarification on Interpretation of 5hmC and 5mC
Hello @ArtRand
I have a query regarding the interpretation of the attached screenshot. Specifically, I would like clarification on the following points:
In the b_counts column, for the following positions:
Start: 813563 | End: 813564: Since the counts show h:1, m:2, does this indicate that one strand of DNA carries 5hmC and the other strand carries 5mC?
Start: 753496 | End: 753497: Since the counts show h:0, m:3, does this mean that only 5mC is present on one strand of DNA and there is no 5hmC on the complementary strand?
I want to ensure that I am interpreting this correctly. Kindly let me know if my understanding is accurate or if there is another explanation I should consider.
Thanks & Regards Priyanka Roy
Hello @Proy321,
Start: 813563 | End: 813564 – Since the counts show h:1, m:2, does this indicate that one strand of DNA carries 5hmC and the other strand carries 5mC?
This means there was one read with a 5hmC call and two reads with 5mC calls. If by "strand" you mean "read" or DNA molecule, then yes. But people often use "strand" to mean the positive or negative strand with respect to the reference, and that is not what these counts describe.
Start: 753496 | End: 753497 – Since the counts show h:0, m:3, does this mean that only 5mC is present on one strand of DNA and there is no 5hmC on the complementary strand?
Same explanation as above: three reads had 5mC calls and zero reads had 5hmC calls (although the model did output 5hmC probabilities for those reads).
I think you've got the right idea.
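To make the reading of the counts concrete, here is a minimal Python sketch that parses a counts string and reports how many reads carried each call. It assumes the field is a comma-separated list of `code:count` pairs, as quoted in this thread (`h:1,m:2`); `parse_counts` and `call_fractions` are illustrative helper names, not part of Modkit.

```python
def parse_counts(field):
    """Parse a counts string such as 'h:1,m:2' into {'h': 1, 'm': 2}."""
    counts = {}
    for item in field.split(","):
        code, n = item.split(":")
        counts[code.strip()] = int(n)
    return counts


def call_fractions(counts):
    """Fraction of the listed modified-base calls that belong to each code."""
    total = sum(counts.values())
    return {code: (n / total if total else 0.0) for code, n in counts.items()}


for field in ("h:1,m:2", "h:0,m:3"):
    counts = parse_counts(field)
    # Each count is a number of reads (molecules), not a reference strand.
    print(field, "->", counts, call_fractions(counts))
```

Each number is a count of reads with that call, so `h:1,m:2` describes three separate molecules rather than the two strands of one molecule.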
Hello @ArtRand Thank you for your response. I have a couple of follow-up questions for further clarification.
When both h:1 and m:2 are present at a given position, does this indicate that 5mC is being converted into 5hmC? For instance, at the position Start: 813563 | End: 813564, since it represents a single position, should I interpret this as 5mC undergoing conversion to 5hmC at the same site?
In another case, at Start: 753496 | End: 753497, where h:0 and m:3, only 5mC is present. Given that there is no call for 5hmC, why is this position included in the h,m context? Should it not be reflected solely in the m context? A possible explanation for this would be helpful.
Looking forward to your insights on this.
Thanks & Regards Priyanka Roy
Hello @ArtRand
It would be nice to have your inputs on the above queries.
Thanks & Regards Priyanka Roy
Hello @Proy321
I apologize for the delay.
When both h:1 and m:2 are present at a given position, does this indicate that 5mC is being converted into 5hmC?
I can't really say. This function in Modkit is just a statistical test on counts, so it's up to you to use these data to inform your biological question. What I would say, however, is that drawing a strong conclusion from ~5 reads is not advisable.
Given that there is no call for 5hmC, why is this position included in the h,m context? Should it not be reflected solely in the m context? A possible explanation for this would be helpful.
The output will report on all of the modifications encountered. So this record indicates that the base modification model output 5hmC probabilities, but that none of the passing calls were for 5hmC.
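A simplified sketch of why a record can show `h:0`: every modification code the model emits gets a slot, but a read only adds to a slot when its most likely call passes a confidence filter. The per-read probabilities, the `0.8` threshold, and the helper names below are all made up for illustration, and this is not Modkit's actual filtering code.

```python
from collections import Counter

# Hypothetical per-read probabilities at one position.
# "-" stands for the canonical (unmodified) base.
reads = [
    {"-": 0.02, "h": 0.08, "m": 0.90},
    {"-": 0.05, "h": 0.10, "m": 0.85},
    {"-": 0.01, "h": 0.04, "m": 0.95},
    {"-": 0.30, "h": 0.45, "m": 0.25},  # best call is h, but low confidence
]


def count_passing_calls(reads, threshold):
    """Count modified-base calls whose top probability passes the threshold."""
    # Every code the model emits gets a slot, even if it ends up at zero.
    counts = Counter({"h": 0, "m": 0})
    for probs in reads:
        code, prob = max(probs.items(), key=lambda kv: kv[1])
        if prob >= threshold and code in counts:
            counts[code] += 1
    return counts


def format_counts(counts):
    """Render counts in the 'h:0,m:3' style seen in the table."""
    return ",".join(f"{code}:{counts[code]}" for code in sorted(counts))


print(format_counts(count_passing_calls(reads, threshold=0.8)))
```

In this toy example the only read whose best call is 5hmC fails the filter, so 5hmC is still listed but with a count of zero.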
Hello @ArtRand Thank you so much for your response. I have a follow-up question regarding the same, and it would be nice to have your inputs on it. Specifically, I am trying to understand how both modifications can be present at a single position rather than being assigned to distinct positions. For example, I observe both 5mC and 5hmC at positions 813563–813564. Could you please clarify how this is possible?
Additionally, I would appreciate your insights on the minimum read count threshold that should be considered for making a robust conclusion regarding DMR.
Thanks & Regards
Hello @ArtRand
It would be nice to have your inputs on the above queries.
Thanks & Regards Priyanka Roy
Hello @Proy321,
Specifically, I am trying to understand how both modifications can be present at a single position rather than being assigned to distinct positions. For example, I observe both 5mC and 5hmC at positions 813563–813564. Could you please clarify how this is possible.
What this table is showing you is that you have two reads reporting 5mC at position 813563 and one read reporting 5hmC. Generally speaking, the modification state at a given genomic position can vary from molecule to molecule, so individual reads/molecules will report different base modifications. What dmr tries to do is determine whether the latent generative process that describes the observations differs between the two conditions.
Additionally, I would appreciate your insights on the minimum read count threshold that should be considered for making a robust conclusion regarding DMR.
For larger effect sizes (>= 60%), 10 reads is probably sufficient. The MAP-based p-value will be higher (less significant) when the coverage is low. You can find the details of the MAP-based p-value and log-likelihood ratio score in the documentation.
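As a rough illustration of that rule of thumb, the sketch below keeps only records with enough reads and a large enough effect size. The dictionary keys (`a_total`, `b_total`, `effect_size`), the example values, and the choice to require 10 reads in each sample are assumptions made for this example; check them against the column names in your own dmr output.

```python
def is_well_supported(record, min_reads=10, min_effect=0.6):
    """Return True if both samples have enough reads and the effect is large."""
    enough_reads = (
        record["a_total"] >= min_reads and record["b_total"] >= min_reads
    )
    return enough_reads and abs(record["effect_size"]) >= min_effect


# Hypothetical records, not taken from the screenshot in this thread.
records = [
    {"start": 100, "a_total": 12, "b_total": 15, "effect_size": 0.72},
    {"start": 200, "a_total": 3, "b_total": 2, "effect_size": 0.90},
    {"start": 300, "a_total": 40, "b_total": 38, "effect_size": 0.10},
]

kept = [r["start"] for r in records if is_well_supported(r)]
print(kept)
```

A filter like this only removes weakly supported positions; the p-value and score should still be checked for the positions that remain.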