SeqMonk
Issue when lowering threshold difference with EdgeR on methylation data
Hi guys,
Apologies again for the second issue in a day (hopefully this is as easy to solve as the previous one).
I've got 1379358 probes that are common to 12 samples (filtered for values between 0 and 100, so there are no unquantitated probes). I'm running Filter by Statistical Test > Proportion Based > Replicated > EdgeR for/rev between two sets of conditions/replicates (all probes should be in common between the two conditions/replicates compared). I've noticed that I get the following numbers of probes at these absolute difference thresholds (p-value threshold 0.05):
35% = 189 probes
25% = 1162 probes
20% = 2102 probes
18% = 1613 probes
15% = 872 probes
5% = 277 probes
I would expect the number of probes to increase as I decrease the absolute difference threshold. Instead, the number of probes appears to peak around 20% and then drops in either direction. Is there something I'm misunderstanding here, or am I running EdgeR incorrectly?
Thanks,
Daniel
The issue here is that the filter within the test is a pre-filter, not a post-filter. It selects only probes with a certain degree of difference and then tests those. For the tests you're therefore balancing the loss of probes to test due to the pre-filter, against the gain in power you get by reducing multiple testing by performing fewer tests.
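The multiple-testing side of that trade-off can be sketched in a few lines of Python. This is only an illustration: it assumes a Benjamini-Hochberg style correction, and the p-values, the `bh_significant` helper and the `unchanged` helper are all made up for the example rather than taken from SeqMonk or EdgeR.

```python
def bh_significant(pvalues, alpha=0.05):
    """Count p-values passing a Benjamini-Hochberg correction at level alpha."""
    ranked = sorted(pvalues)
    m = len(ranked)
    passing = 0
    for rank, p in enumerate(ranked, start=1):
        # BH keeps everything up to the largest rank with p <= rank / m * alpha.
        if p <= rank / m * alpha:
            passing = rank
    return passing


# Five hypothetical probes with real changes (made-up raw p-values).
changed = [0.0001, 0.0004, 0.0009, 0.002, 0.004]


def unchanged(n):
    """n evenly spaced p-values standing in for probes with no real change."""
    return [(i + 1) / (n + 1) for i in range(n)]


# A strict pre-filter leaves few probes to test, a loose one leaves many.
strict = bh_significant(changed + unchanged(45))   # 50 tests in total
loose = bh_significant(changed + unchanged(995))   # 1000 tests in total
print(strict, loose)
```

With these made-up inputs the same five probes all pass when they are among 50 tests and none pass when they are among 1000. A stricter pre-filter also throws away real changes before they are tested, which is the other half of the balance, so the number of hits can peak at an intermediate threshold.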
The filter is also not ideal since it doesn't use your current quantitation but just uses a simple percentage calculation based on the sum of all methylation calls over that probe, so you can filter for 20% diff but appear to have probes with a smaller difference when you plot out the results.
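A small Python sketch of how the two numbers can disagree. It assumes the pre-filter pools methylated and unmethylated calls across the replicates in a group, and that the plotted quantitation is the mean of per-sample percentages. Both are assumptions made for illustration, and the counts and function names are invented.

```python
def pooled_percent(replicates):
    """Percentage methylation from the summed calls of all replicates."""
    meth = sum(m for m, _ in replicates)
    total = sum(m + u for m, u in replicates)
    return 100 * meth / total


def mean_percent(replicates):
    """Mean of the per-replicate methylation percentages."""
    per_sample = [100 * m / (m + u) for m, u in replicates]
    return sum(per_sample) / len(per_sample)


# Each replicate is (methylated calls, unmethylated calls) for one probe.
group_a = [(90, 10), (1, 9)]   # the deep replicate dominates the pooled value
group_b = [(50, 50), (5, 5)]

pooled_diff = pooled_percent(group_a) - pooled_percent(group_b)
mean_diff = mean_percent(group_a) - mean_percent(group_b)
print(round(pooled_diff, 1), round(mean_diff, 1))
```

Here the pooled difference is about 33 percentage points, so the probe would pass a 20% pre-filter, while the difference between the mean per-sample percentages is zero.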
In all honesty, this is probably a feature I should remove. A better approach is to pre-filter your probes based on coverage rather than methylation level, so you're only testing probes which have sufficient data to justify testing them. If you want to ensure a certain degree of difference, you should then post-filter your stats result with a differences filter to get only changes of a certain magnitude.
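The order of operations suggested above (coverage filter, then test, then difference filter) might look like this in Python. It is a rough sketch, not SeqMonk's implementation: the probe records, field names, thresholds and adjusted p-values are all hypothetical, and the statistical test itself is not run here.

```python
def coverage_filter(probes, min_calls=5):
    """Keep probes where every sample has at least min_calls calls."""
    return [p for p in probes if min(p["calls"]) >= min_calls]


def post_filter(results, alpha=0.05, min_abs_diff=20.0):
    """Keep significant results whose absolute difference is large enough."""
    return [
        r for r in results
        if r["adj_p"] < alpha and abs(r["diff"]) >= min_abs_diff
    ]


# Hypothetical probes: calls per sample, difference in percentage points,
# and a made-up adjusted p-value standing in for the test output.
probes = [
    {"name": "p1", "calls": [12, 9, 15, 11], "diff": 31.0, "adj_p": 0.01},
    {"name": "p2", "calls": [2, 30, 25, 40], "diff": 45.0, "adj_p": 0.001},
    {"name": "p3", "calls": [20, 18, 22, 19], "diff": 8.0, "adj_p": 0.02},
    {"name": "p4", "calls": [14, 16, 13, 17], "diff": -28.0, "adj_p": 0.4},
]

tested = coverage_filter(probes)   # the test would be run on these probes only
hits = post_filter(tested)         # the magnitude filter is applied afterwards
print([h["name"] for h in hits])
```

In this toy data p2 is dropped for thin coverage in one sample, p3 is significant but too small a change, and p4 is large but not significant, leaving only p1.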
Hi Simon,
Sorry for my late response, and thanks for answering. I believe I applied a coverage filter of sorts when I calculated Bisulphite Methylation Over Features: I have both the min count to include position and the min observations to include feature set to 5. If I want to run the test and apply the percentage filter after testing, what should I set the min percentage difference in the EdgeR test to, so that I'm getting all probes? I would think setting it to zero would normally do it, but of course the current issue might play a role.
Cheers,
Dan