
remove the G test from group_significance.py

Open: wdwvt1 opened this issue 9 years ago • 1 comment

I have come across several situations now where the g_test in the script group_significance.py gives implausibly small p-values. For instance, in a recent forum post, a user comparing two treatment groups of three samples each (group sizes (3,3)) got g_test statistics as large as 680. The associated p-values were recorded as 0 because they underflowed.

This seems implausibly small to me given the small sample sizes, and I have now seen this behavior from the g_test repeatedly. I think it would be wise to remove this test from the tests available in group_significance.py until someone can examine it carefully.
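To see how a G statistic can become enormous with only three samples per group, here is a minimal sketch of a G test of independence on read counts pooled within each group. It is an assumption that this resembles what group_significance.py does; the per-sample counts and the helper names (`pooled_row`, `g_statistic`) are hypothetical. Because every read is treated as an independent observation, the number of samples never enters the statistic; only the read totals do.

```python
import math

# Hypothetical per-sample data for one OTU: (OTU reads, total reads in sample).
GROUP_A = [(500, 10000), (600, 12000), (550, 9000)]
GROUP_B = [(100, 10000), (120, 11000), (90, 9000)]


def pooled_row(group):
    """Pool a group's samples into one row: [OTU reads, all other reads]."""
    otu = sum(count for count, _ in group)
    depth = sum(total for _, total in group)
    return [otu, depth - otu]


def g_statistic(table):
    """G = 2 * sum(O * ln(O / E)), with E built from row and column totals."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for row, row_total in zip(table, row_totals):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / n
            if observed > 0:  # a zero cell contributes 0 (0 * ln 0 -> 0)
                total += observed * math.log(observed / expected)
    return 2.0 * total


table = [pooled_row(GROUP_A), pooled_row(GROUP_B)]
g = g_statistic(table)
# A 2x2 table has 1 degree of freedom, and P(chi2_1 > G) = erfc(sqrt(G / 2)).
p = math.erfc(math.sqrt(g / 2.0))
print(f"pooled table = {table}, G = {g:.1f}, p = {p:.3g}")

# Same proportions, one tenth of the reads: G shrinks by the same factor.
scaled = [[count // 10 for count in row] for row in table]
g_scaled = g_statistic(scaled)
print(f"scaled table = {scaled}, G = {g_scaled:.1f}")
```

Dividing every count by 10 leaves the proportions unchanged but divides G by 10, which is the sense in which a pooled G statistic tracks sequencing depth rather than the number of samples per group.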

I implemented it based on Sokal and Rohlf, but I modified the procedure to take advantage of having multiple samples for each feature being tested (rather than a single measurement). Although I can't find anything wrong with it, these values are not reasonable and I am uncomfortable with it.

The downside to removal is small: there are 7 other tests available, and if the OTUs are truly differentially abundant, those tests will detect it. I can't see any situation in which only the g_test is required. Note that this would not eliminate the G test of independence used in make_otu_network.py; group_significance.py is the only script affected.

I am under the impression that the only remaining release in the QIIME 1.9.X series will be another bug-fix release, and I think this should receive high priority in it. You can assign this to me (or someone else can take it).

wdwvt1, Dec 16 '15 21:12

Remember that it assumes the samples are homogeneous. The samples are likely not homogeneous, but the heterogeneity comes from one of the many technical biases that can cause samples to differ (even differences between individual PCRs). I suggest adding a warning about this to the documentation (i.e., that the test statistic may not mean what you think it does) rather than removing the test.
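The homogeneity assumption can be probed directly. The sketch below is an illustration only: it uses a G test of independence on a samples × {OTU reads, other reads} table to ask whether three replicate samples from a single group share the same OTU proportion. The counts are hypothetical, and the choice of test is an assumption for illustration, not code taken from QIIME.

```python
import math

# Hypothetical replicate samples from ONE treatment group: (OTU reads, total reads).
SAMPLES = [(500, 10000), (600, 12000), (550, 9000)]


def g_statistic(table):
    """G = 2 * sum(O * ln(O / E)), with E built from row and column totals."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for row, row_total in zip(table, row_totals):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / n
            if observed > 0:  # a zero cell contributes 0 (0 * ln 0 -> 0)
                total += observed * math.log(observed / expected)
    return 2.0 * total


# One row per sample: [OTU reads, all other reads].
table = [[otu, depth - otu] for otu, depth in SAMPLES]
g = g_statistic(table)
df = (len(table) - 1) * (len(table[0]) - 1)  # (3 - 1) * (2 - 1) = 2
# With exactly 2 degrees of freedom, P(chi2_2 > G) = exp(-G / 2).
p = math.exp(-g / 2.0)
proportions = [round(otu / depth, 3) for otu, depth in SAMPLES]
print(f"OTU proportions = {proportions}, G = {g:.1f}, df = {df}, p = {p:.2g}")
```

Here the OTU makes up 5.0%, 5.0% and about 6.1% of the reads, yet with tens of thousands of reads per sample the test rejects homogeneity (p well below 0.01). Small per-sample differences of the kind technical biases can produce are enough to violate the assumption, which is one way a pooled G statistic could end up far larger than the sample sizes would suggest.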

On Dec 16, 2015, at 1:23 PM, Will Van Treuren wrote:


— Reply to this email directly or view it on GitHub https://github.com/biocore/qiime/issues/2118.

rob-knight, Dec 16 '15 21:12