HypothesisTests.jl
CI for FisherExactTest is extremely slow for large numbers
When trying to calculate a confidence interval for FisherExactTest with large numbers, e.g. confint(FisherExactTest(2216,9338, 3172,1335)), Julia gets stuck at 100% CPU for 10-15 minutes. As a comparison, in R fisher.test(matrix(c(2216, 9338, 3712, 1335), ncol=2, byrow=TRUE)) returns instantly and shows a confidence interval. confint(FisherExactTest(226,938, 312,135)) is also slow and crashes with:
ERROR: ArgumentError: The interval [a,b] is not a bracketing interval.
You need f(a) and f(b) to have different signs (f(a) * f(b) < 0).
Consider a different bracket or try fzero(f, c) with an initial guess c.
(perhaps related to #122 ?)
I have also experienced this - FisherExactTest appears unusable when the sum of the arguments exceeds ~5000. The function hangs even without calling confint. I also occasionally see the error "ArgumentError: The interval [a,b] is not a bracketing interval".
I noticed that when I run your example with the time macro:
using HypothesisTests; @time begin; FisherExactTest(2216,9338, 3172,1335); end
I get the following result:
2.333884 seconds (4.72 M allocations: 236.939 MiB, 8.12% gc time)
So the time macro prints the elapsed time after two seconds but there is no result from the FisherExactTest function. I am not familiar enough with the source code to understand the cause.
That's probably because the p-value and confidence intervals are only computed when the object is printed.
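To see where the time actually goes, one option is to time construction, the p-value, and the confidence interval as separate steps. This is only a rough sketch: it assumes a HypothesisTests.jl version where pvalue and confint accept a FisherExactTest, and the try/catch is there because confint may hit the bracketing error reported above.

```julia
using HypothesisTests

# Construct the test object only; the trailing semicolon suppresses REPL display,
# so nothing is printed (and nothing print-related is computed) here.
t = @time FisherExactTest(2216, 9338, 3172, 1335);

# Time the p-value on its own.
p = @time pvalue(t)

# Time the confidence interval on its own. On affected versions this step may be
# very slow or throw the bracketing ArgumentError, so catch that case explicitly.
ci = try
    @time confint(t)
catch err
    err isa ArgumentError || rethrow()
    @warn "confint failed" exception = err
    nothing
end
```

If the first two lines return quickly and only the last step stalls, the confidence interval is the likely bottleneck rather than the constructor.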
Same here: I cannot get correct p-values from Fisher's exact test when the arguments are too large. FYI:
Fisher's exact test
-------------------
Population details:
parameter of interest: Odds ratio
value under h_0: 1.0
point estimate: 0.742925
Error showing value of type FisherExactTest:
ERROR: ArgumentError: The interval [a,b] is not a bracketing interval.
You need f(a) and f(b) to have different signs (f(a) * f(b) < 0).
Consider a different bracket or try fzero(f, c) with an initial guess c.
While I can get the correct answer using a C++ version of Fisher's Exact Test:
{0.0008557, 1.825331208626037*10^(-31)}
where the left value is the elapsed time and the right value is the p-value.
I guess the reason is that the Julia implementation differs from the one used in bedtools?
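For comparison, here is a minimal sketch of one way to get the p-values without any root finding: sum the hypergeometric probabilities directly, working in log space. This is not the bedtools code and not what HypothesisTests.jl does internally; the function names are made up for illustration, and the two-sided value uses the "central" convention of twice the smaller tail, which is an assumption on my part.

```julia
# log(k!) for k = 0..n via a cumulative sum of logs (Base only, no packages).
logfactorials(n::Integer) = [0.0; cumsum(log.(1:n))]

# Log-probability of the table [a b; c d] under the null (odds ratio 1),
# i.e. the hypergeometric probability with all margins fixed.
function table_logprob(lf, a, b, c, d)
    n = a + b + c + d
    return lf[a+b+1] + lf[c+d+1] + lf[a+c+1] + lf[b+d+1] -
           lf[n+1] - lf[a+1] - lf[b+1] - lf[c+1] - lf[d+1]
end

# One-sided tails plus a two-sided value taken as min(1, 2 * min(left, right)).
function fisher_pvalues(a, b, c, d)
    lf = logfactorials(a + b + c + d)
    r1, c1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, c1 - (n - r1)), min(r1, c1)   # support of the top-left cell
    prob(x) = exp(table_logprob(lf, x, r1 - x, c1 - x, n - r1 - c1 + x))
    left  = sum(prob, lo:a)    # P(X <= a)
    right = sum(prob, a:hi)    # P(X >= a)
    return (left = min(left, 1.0), right = min(right, 1.0),
            both = min(1.0, 2 * min(left, right)))
end

fisher_pvalues(2216, 9338, 3172, 1335)
```

The loop is linear in the size of the support, so it should finish quickly even for tables of this size. For very extreme tables the tail probability may underflow to 0.0 in Float64; returning the log of the sum would avoid that.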
Digging deeper into the Julia implementation, I think the root finding might be the cause.
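To illustrate what the error message is complaining about, here is a toy bisection that, like any bracketing root finder, requires the function to change sign between the two endpoints. It is a generic sketch with made-up names, not the solver HypothesisTests.jl calls, and it says nothing about why the bracket fails inside confint.

```julia
# A minimal bisection that insists on a sign change over [a, b].
function bisect(f, a, b; tol = 1e-12, maxiter = 200)
    fa, fb = f(a), f(b)
    fa * fb < 0 || throw(ArgumentError("[$a, $b] is not a bracketing interval"))
    for _ in 1:maxiter
        m = (a + b) / 2
        fm = f(m)
        (fm == 0 || (b - a) / 2 < tol) && return m
        if fa * fm < 0
            b, fb = m, fm      # root lies in the left half
        else
            a, fa = m, fm      # root lies in the right half
        end
    end
    return (a + b) / 2
end

# Hypothetical target function with a single root at -log(0.025).
g(x) = exp(-x) - 0.025

bisect(g, 0.0, 10.0)      # works: g(0) > 0 and g(10) < 0

try
    bisect(g, 20.0, 30.0) # fails: g is negative at both endpoints
catch err
    println(err)
end
```

So the error means the search interval handed to the solver did not straddle a sign change, either because the root lies outside it or because the function evaluates to the same sign (or to zero) at both ends.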
(perhaps related to #122 ?)
Yes, the bracketing error seems to be a duplicate of #122.
BTW the general performance issues should have been fixed by https://github.com/JuliaStats/Distributions.jl/pull/1277.