
Deviation from scipy and R for high OR or low p-value

Open · agartland opened this issue 9 months ago · 1 comment

Hi, I'm wondering whether you've ever run into this issue: your implementation deviates from R and scipy for extreme ORs with low p-values. An example is below. I did notice that scipy seems to use int64, and even states in a comment that int32 is not sufficient (though that applies to their implementation, and I honestly haven't looked hard enough yet to see how yours differs; see the scipy note about int32).

We need these low p-values because we have to apply a multiplicity adjustment across a large number of p-values (hence the need for a fast implementation!). Let us know if you have any ideas. Thanks, Andrew

scipy and brentp

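import numpy as np
from scipy import stats
import fisher  # brentp's fisher package

# Ten identical 2x2 tables [[50, 1], [257, 375]], one per array element
a = 50 * np.ones(10, dtype=np.uint)
b = 1 * np.ones(10, dtype=np.uint)
c = 257 * np.ones(10, dtype=np.uint)
d = 375 * np.ones(10, dtype=np.uint)

# pvalue_npy returns (left, right, two-sided) arrays; take the two-sided value
res_brentp = fisher.pvalue_npy(a, b, c, d)[2][0]
res_scipy = stats.fisher_exact([[a[0], b[0]], [c[0], d[0]]])[1]
print(f'brentp: {res_brentp}\nscipy: {res_scipy}')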

brentp: 1.368624798014407e-06
scipy: 1.2220470363522046e-17

R

fisher.test( matrix(c(50, 1, 257, 375), nrow = 2))

        Fisher's Exact Test for Count Data

data:  matrix(c(50, 1, 257, 375), nrow = 2)
p-value < 2.2e-16
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
   12.28923 2901.05129
sample estimates:
odds ratio
   72.6098
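
As an independent cross-check on which figure is closer, here is a minimal sketch of the two-sided test computed in log space with only the standard library. It is not the code used by scipy, R, or the fisher package. The names `log_choose` and `fisher_two_sided` are hypothetical, and the `rel_tol` tie-handling value is an assumption chosen to mirror the common "sum all tables at most as probable as the observed one" convention.

```python
import math


def log_choose(n, k):
    """Natural log of the binomial coefficient C(n, k), via lgamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def fisher_two_sided(a, b, c, d, rel_tol=1e-7):
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table whose probability
    does not exceed that of the observed table (up to rel_tol, an assumed
    tolerance for floating-point ties).
    """
    row1 = a + b            # size of the first row
    col1 = a + c            # size of the first column
    n = a + b + c + d       # grand total
    lo = max(0, row1 + col1 - n)   # smallest feasible top-left cell
    hi = min(row1, col1)           # largest feasible top-left cell
    log_denom = log_choose(n, row1)

    def log_pmf(x):
        # log P(top-left cell == x) with all margins fixed
        return log_choose(col1, x) + log_choose(n - col1, row1 - x) - log_denom

    threshold = log_pmf(a) + math.log1p(rel_tol)
    terms = [lp for lp in map(log_pmf, range(lo, hi + 1)) if lp <= threshold]

    # Factor out the largest term before exponentiating so the sum is stable.
    m = max(terms)
    return min(1.0, math.exp(m) * math.fsum(math.exp(t - m) for t in terms))


print(fisher_two_sided(50, 1, 257, 375))
```

If the sketch is right, the printed value should land near the scipy figure (about 1.2e-17) rather than near 1.37e-06. Note that the final `math.exp(m)` would underflow to 0.0 for p-values below roughly 1e-308; returning the log p-value instead would avoid that.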

agartland, Sep 28 '23 22:09

I think this is a limitation of the implementation here. If you need p-values that low then you must use scipy. I thought recent versions of scipy were quite fast.

brentp, Sep 29 '23 07:09