Associations.jl Surrogate test suite needs an overhaul

There is a pretty long list of files in the test folder involved in the surrogate hypothesis tests for independence. However, almost all of them are like this:

# Analytical tests (in the limit of a lot of samples)
# ------------------------------------------------------------
using Random
rng = MersenneTwister(1234)
x, y = rand(rng, 500), rand(rng, 500)
z = x .+ y
test = SurrogateTest(HMeasure(); rng)
α = 0.04 # Some arbitrary significance level.

# We shouldn't be able to reject the null when the variables are independent
@test pvalue(independence(test, x, y)) > 0.04

# We should reject the null when variables are dependent.
@test pvalue(independence(test, x, z)) < 0.04

and the files repeat with a different measure each.

Changing the measure does not create any change of tests in the surrogate hypothesis test. After all, the whole purpose of the surrogate tests is that they are agnostic to the measure. So, in short, there should not be more than one file that tests the surrogates. Additionally, we should be testing the output of independece (i.e., the test result), not only the p-value.

Oct 03 '23 11:10 Datseris

Changing the measure does not create any change of tests in the surrogate hypothesis test.

They kind of are, though, because the different measures have different detection rates with different surrogate types. But we can probably work around this by just using a low threshold value for the p-value to reject/not reject the null.

Oct 03 '23 17:10 kahaaga

They kind of are, though, because the different measures have different detection rates with different surrogate types.

This is a comment about the measure, not the test though. The test is still agnostic to the measure input. And so is the source code. And in the end of the day we are testing the source code. Hence if we care about testing the surrogatetest, we don't need to test many measures. For the measure tests you could be using these things. But then we are talking about a test suite re-organization nevertheless.

at least, that's how I would view the test suite for the surrogate significance test. We test one case with 2 and one with 3 input arguments, which is the only thing that changes in the source code when the functionality is applied.

Oct 04 '23 10:10 Datseris

We test one case with 2 and one with 3 input arguments, which is the only thing that changes in the source code when the functionality is applied.

Also, we need to test that estimator-free measures are accepted without the second argument, and that estimator-dependent measures accepts the second argument and fails without specifying the second argument.

test = SurrogateTest(PartialCorrelation()) # second estimator argument not needed
test = SurrogateTest(TEShannon(), FPVP()) # second estimator argument required

Oct 05 '23 07:10 kahaaga