Modkit gives balanced map p-value in absence of replicates

Open Ge0rges opened this issue 5 months ago • 4 comments

Hi @ArtRand,

I've noticed that dmr position will give a balanced MAP p-value in the absence of replicates. Is this expected? The values look "real". For example:

[Image: screenshot of the DMR output opened in Excel]

Thanks!

In case I'm parsing something wrong in excel, here's the first row in that excel as the raw BED:

NC_003910.7 532650 532651 . 0.6736021290769827 - a:472 558 a:124 139 a:84.59 a:89.21 0.8458781 0.8920863 1 -0.040000000000000036 -0.13744347555152325 -0.048354073403230835 0.32324102450627734

Ge0rges avatar Jul 21 '25 22:07 Ge0rges

Hello @Ge0rges,

Nice to hear from you.

Seeing balanced_map_pvalue without replicates is expected. You probably already know this, but for anyone else reading, the balanced_map_pvalue works by first calculating the total number of calls across all replicates (even without them) and both conditions. Then it evenly divides this number between the two conditions, so the condition with lower coverage gains coverage and the condition with higher coverage loses coverage. Then it uses the empirical fraction modified (from the bedMethyl) and multiplies it by the new coverage value to get a number of modified calls and unmodified calls. Finally it uses the MAP-based p-value calculation on these new counts.
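For anyone following along, the balancing step described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not modkit's actual code: the function name `balance_counts` is hypothetical, the counts are kept as floats because the thread does not say whether modkit rounds them, and the final MAP-based p-value step is left out. The inputs are the counts from the BED row posted above (472 of 558 and 124 of 139).

```python
def balance_counts(cov_a, frac_a, cov_b, frac_b):
    """Split the pooled coverage evenly and rescale each condition's counts.

    Hypothetical helper: illustrates the described arithmetic only.
    """
    # Total calls across both conditions, divided evenly between them.
    balanced_cov = (cov_a + cov_b) / 2

    def rescale(frac_modified):
        # Keep the empirical fraction modified, apply it to the new coverage.
        modified = frac_modified * balanced_cov
        unmodified = balanced_cov - modified
        return modified, unmodified

    return balanced_cov, rescale(frac_a), rescale(frac_b)


# Counts taken from the BED row in this thread.
cov, (mod_a, unmod_a), (mod_b, unmod_b) = balance_counts(
    558, 472 / 558, 139, 124 / 139
)
print(f"balanced coverage per condition: {cov}")
print(f"condition a: {mod_a:.2f} modified, {unmod_a:.2f} unmodified")
print(f"condition b: {mod_b:.2f} modified, {unmod_b:.2f} unmodified")
```

Note that the fraction modified in each condition is unchanged by this step; only the coverage the p-value calculation sees is different.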

Hope this helps, I'm looking forward to getting some items off my desk so I can deep-dive into some DMR enhancements.

ArtRand avatar Jul 23 '25 17:07 ArtRand

Got it, that was my understanding as well, but I got a little lost. So is it always correct to use the balanced p-value and balanced effect size then?

Ge0rges avatar Jul 23 '25 18:07 Ge0rges

> So is it always correct to use the balanced p-value and balanced effect size then?

I'm not sure I'd say it's "correct". If the balanced and "raw" MAP-based p-values are quite different (e.g. raw = 1.0 and balanced <= 0.01), this might indicate that one of the samples has few sample-biased calls. For example, by chance you get a few reads that all call "modified" when the true p is closer to 0.5. It looks like both of your samples have pretty high coverage (e.g. 139 and 558 in the record you posted), but the effect sizes are pretty small. The MAP-based p-value will be 1 because the default --delta is 0.05, so effect sizes <= 0.05 are treated as equivalent. You could reduce this number to get a better MAP-based p-value measurement.
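To put a number on the "by chance" example above, here is a small sketch assuming each read is an independent call with the same true modification probability (a simplification, and not something modkit computes). The helper `prob_all_modified` is hypothetical and only evaluates the binomial probability that every read at a site calls "modified".

```python
def prob_all_modified(n_reads, true_p=0.5):
    """Probability that all n_reads independent calls come out 'modified'."""
    return true_p ** n_reads


# Low coverage makes an all-"modified" site fairly likely by chance alone;
# at coverages like those in this thread (139) it is negligible.
for n in (3, 5, 10, 139):
    print(f"{n:>3} reads: P(all modified) = {prob_all_modified(n):.3g}")
```

Under this assumption, a site covered by only a handful of reads can show a fraction modified far from the true value, and balancing then carries that fraction over to a larger coverage, which is one way the balanced and raw p-values can end up far apart.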

You get a non-1 balanced MAP-based p-value for your first record with a balanced effect size of -0.048354073403230835 because there is a tiny fudge factor of 0.005 away from zero to make the computation more stable - this record happens to be right in that range. I'll add this to the doc-string for --delta.

ArtRand avatar Jul 24 '25 01:07 ArtRand

I think I need a little more guidance on when to use balanced. In the case where I have no replicates and a minimum coverage filter, the way I understand it, I have two scenarios:
A) The coverage is similar, so balanced and unbalanced p-values are ~equal.
B) The coverage is dissimilar, so balanced should be used?

> this might indicate that one of the samples has few sample-biased calls. For example, by chance you get a few reads that all call "modified" when the true p is closer to 0.5.

I guess I don't understand this well.

Should I also have a stricter minimum effect size perhaps?

Thanks!

Ge0rges avatar Jul 24 '25 21:07 Ge0rges