CI for FisherExactTest is extremely slow for large numbers

Open ilia-kats opened this issue 6 years ago • 5 comments

When trying to calculate a confidence interval for FisherExactTest with large numbers, e.g. confint(FisherExactTest(2216,9338, 3172,1335)), Julia gets stuck at 100% CPU for 10-15 minutes. As a comparison, in R fisher.test(matrix(c(2216, 9338, 3712, 1335), ncol=2, byrow=TRUE)) returns instantly and shows a confidence interval. confint(FisherExactTest(226,938, 312,135)) is also slow and crashes with

ERROR: ArgumentError: The interval [a,b] is not a bracketing interval.
You need f(a) and f(b) to have different signs (f(a) * f(b) < 0).
Consider a different bracket or try fzero(f, c) with an initial guess c.

(perhaps related to #122 ?)

ilia-kats avatar Feb 05 '19 07:02 ilia-kats

I have also experienced this - FisherExactTest appears unusable when the sum of the arguments exceeds ~5000. The function hangs even without calling confint. I also occasionally see the error "ArgumentError: The interval [a,b] is not a bracketing interval".

I noticed that when I run your example with the time macro:

using HypothesisTests; @time begin; FisherExactTest(2216,9338, 3172,1335); end

I get the following result: 2.333884 seconds (4.72 M allocations: 236.939 MiB, 8.12% gc time)

So the time macro prints the elapsed time after two seconds but there is no result from the FisherExactTest function. I am not familiar enough with the source code to understand the cause.

josephmarturano avatar Jun 19 '19 19:06 josephmarturano

So the time macro prints the elapsed time after two seconds but there is no result from the FisherExactTest function. I am not familiar enough with the source code to understand the cause.

That's probably because the p-value and confidence interval are only computed when the object is printed.

nalimilan avatar Jun 20 '19 07:06 nalimilan
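
To illustrate the point above, here is a minimal sketch that separates constructing the test object from computing its p-value and confidence interval, so each step can be timed on its own. It assumes a recent HypothesisTests.jl in which `pvalue` and `confint` accept a `FisherExactTest`. The numbers are the ones from the original report, and the `try`/`catch` around `confint` is only there because affected versions may throw the bracketing error.

```julia
using HypothesisTests

# Build the test object. The trailing semicolon suppresses display at the REPL,
# so nothing that printing would trigger is run at this point.
t = @time FisherExactTest(2216, 9338, 3172, 1335);

# The p-value is computed on demand, independently of printing.
p = @time pvalue(t)

# The confidence interval is the step reported as slow or failing in this issue.
ci = try
    @time confint(t)
catch err
    err isa ArgumentError || rethrow()
    nothing  # affected versions may fail with the bracketing error
end
```

If only the p-value is needed, calling `pvalue(t)` directly avoids the confidence interval computation that printing the object would trigger.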

Same here: I cannot get correct p-values from Fisher's exact test when the arguments are too large. FYI:

Fisher's exact test
-------------------
Population details:
    parameter of interest:   Odds ratio
    value under h_0:         1.0
    point estimate:          0.742925
Error showing value of type FisherExactTest:
ERROR: ArgumentError: The interval [a,b] is not a bracketing interval.
You need f(a) and f(b) to have different signs (f(a) * f(b) < 0).
Consider a different bracket or try fzero(f, c) with an initial guess c.

While I can get the correct answer using a C++ version of Fisher's Exact Test:

{0.0008557, 1.825331208626037*10^(-31)}

where the first value is the elapsed time and the second is the p-value.

I guess the reason is that the Julia implementation differs from the one used in bedtools?

Digging deeper into the Julia implementation, I found that the root finding might be the cause.

MitsuhaMiyamizu avatar Apr 09 '22 02:04 MitsuhaMiyamizu
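
For reference, a two-sided p-value can be obtained without any root finding by summing hypergeometric probabilities in log space. The following is a minimal sketch of that idea, not the implementation used by HypothesisTests.jl or by bedtools. The helper names `logfactorials` and `fisher_pvalue_minlike` are hypothetical, and the two-sided definition used here (sum over all tables no more likely than the observed one) is an assumption that may differ from other two-sided definitions.

```julia
# Table of log-factorials: lf[k + 1] == log(k!)
function logfactorials(n::Integer)
    lf = zeros(Float64, n + 1)
    for k in 1:n
        lf[k + 1] = lf[k] + log(k)
    end
    return lf
end

# Two-sided Fisher exact p-value for the table [a b; c d]: sum the probabilities
# of all tables with the same margins that are no more likely than the observed one.
function fisher_pvalue_minlike(a::Integer, b::Integer, c::Integer, d::Integer)
    n = a + b + c + d
    lf = logfactorials(n)
    r1, r2, c1 = a + b, c + d, a + c
    # Log-probability of the table whose top-left cell is x, given fixed margins
    logp(x) = lf[r1 + 1] + lf[r2 + 1] + lf[c1 + 1] + lf[n - c1 + 1] - lf[n + 1] -
              lf[x + 1] - lf[r1 - x + 1] - lf[c1 - x + 1] - lf[r2 - c1 + x + 1]
    lo, hi = max(0, c1 - r2), min(r1, c1)  # feasible range of the top-left cell
    lobs = logp(a)
    tol = 1e-7  # small tolerance so that ties with the observed table are included
    p = 0.0
    for x in lo:hi
        lx = logp(x)
        if lx <= lobs + tol
            p += exp(lx)
        end
    end
    return min(p, 1.0)
end

# Classic 2x2 example: the two-sided p-value is 34/70 ≈ 0.4857
fisher_pvalue_minlike(3, 1, 1, 3)
```

The cost is linear in the table total, since it needs one pass to build the log-factorial table and one pass over the feasible values of the top-left cell. For extremely small p-values the sum can underflow to `0.0` in double precision.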

(perhaps related to #122 ?)

Yes, the bracketing error seems to be a duplicate of #122.

devmotion avatar Apr 09 '22 08:04 devmotion

BTW the general performance issues should have been fixed by https://github.com/JuliaStats/Distributions.jl/pull/1277.

devmotion avatar Apr 09 '22 08:04 devmotion