aeon
[MNT] Cut down number of parameter combinations for testing
We are currently running the same tests with different parameter values that do not add much value in terms of coverage.
Consider this test from forecasting:
@pytest.mark.parametrize("y", TEST_YS)
@pytest.mark.parametrize("fh", [*TEST_FHS, *TEST_FHS_TIMEDELTA])
@pytest.mark.parametrize("window_length", TEST_WINDOW_LENGTHS)
@pytest.mark.parametrize("step_length", TEST_STEP_LENGTHS)
@pytest.mark.parametrize("initial_window", TEST_INITIAL_WINDOW)
def test_sliding_window_splitter_with_initial_window(
    y, fh, window_length, step_length, initial_window
):
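As background on why the count explodes: stacked `@pytest.mark.parametrize` decorators make pytest collect the full cross product of their value lists. A minimal sketch of that multiplication, using hypothetical parameter lists (not the actual aeon ones):

```python
import itertools

# Two hypothetical parameter lists, as pytest would see them from two
# stacked @pytest.mark.parametrize decorators.
window_lengths = [1, 5]
step_lengths = [1, 5, 10]

# pytest collects one test case per element of the cross product.
cases = list(itertools.product(window_lengths, step_lengths))
print(len(cases))  # 2 * 3 = 6 collected test cases
```

Each additional decorator multiplies the total again, which is how five stacked decorators reach thousands of cases.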
and the corresponding parameter sets:
TEST_OOS_FHS = [1, np.array([2, 5], dtype="int64")]  # out-of-sample
TEST_INS_FHS = [
    -3,  # single in-sample
    np.array([-2, -5], dtype="int64"),  # multiple in-sample
    0,  # last training point
    np.array([-3, 2], dtype="int64"),  # mixed in-sample and out-of-sample
]
TEST_FHS = [*TEST_OOS_FHS, *TEST_INS_FHS]
TEST_OOS_FHS_TIMEDELTA = [
    [pd.Timedelta(1, unit="D")],
    [pd.Timedelta(2, unit="D"), pd.Timedelta(5, unit="D")],
]  # out-of-sample
TEST_INS_FHS_TIMEDELTA = [
    pd.Timedelta(-3, unit="D"),  # single in-sample
    [pd.Timedelta(-2, unit="D"), pd.Timedelta(-5, unit="D")],  # multiple in-sample
    pd.Timedelta(0, unit="D"),  # last training point
    [
        pd.Timedelta(-3, unit="D"),
        pd.Timedelta(2, unit="D"),
    ],  # mixed in-sample and out-of-sample
]
TEST_FHS_TIMEDELTA = [*TEST_OOS_FHS_TIMEDELTA, *TEST_INS_FHS_TIMEDELTA]
TEST_WINDOW_LENGTHS_INT = [1, 5]
TEST_WINDOW_LENGTHS_TIMEDELTA = [pd.Timedelta(1, unit="D"), pd.Timedelta(5, unit="D")]
TEST_WINDOW_LENGTHS_DATEOFFSET = [pd.offsets.Day(1), pd.offsets.Day(5)]
TEST_WINDOW_LENGTHS = [
    *TEST_WINDOW_LENGTHS_INT,
    *TEST_WINDOW_LENGTHS_TIMEDELTA,
    *TEST_WINDOW_LENGTHS_DATEOFFSET,
]
TEST_INITIAL_WINDOW_INT = [7, 10]
TEST_INITIAL_WINDOW_TIMEDELTA = [pd.Timedelta(7, unit="D"), pd.Timedelta(10, unit="D")]
TEST_INITIAL_WINDOW_DATEOFFSET = [pd.offsets.Day(7), pd.offsets.Day(10)]
TEST_INITIAL_WINDOW = [
    *TEST_INITIAL_WINDOW_INT,
    *TEST_INITIAL_WINDOW_TIMEDELTA,
    *TEST_INITIAL_WINDOW_DATEOFFSET,
]
TEST_STEP_LENGTHS_INT = [1, 5]
TEST_STEP_LENGTHS_TIMEDELTA = [pd.Timedelta(1, unit="D"), pd.Timedelta(5, unit="D")]
TEST_STEP_LENGTHS_DATEOFFSET = [pd.offsets.Day(1), pd.offsets.Day(5)]
TEST_STEP_LENGTHS = [
    *TEST_STEP_LENGTHS_INT,
    *TEST_STEP_LENGTHS_TIMEDELTA,
    *TEST_STEP_LENGTHS_DATEOFFSET,
]
which generates 2592 combinations for running the test. Looking closer, most of the parameter sets contain two values of the same type that are redundant and can be dropped without losing any coverage.
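To make the arithmetic explicit, here is a sketch of the count (assuming a single series in `TEST_YS`; the list sizes come from the snippets above, and the "reduced" figure is only an illustration of keeping one value per type, not a concrete proposal):

```python
# Sizes of the parameter lists quoted above.
n_fh = 6 + 6     # TEST_FHS (2 OOS + 4 INS) + TEST_FHS_TIMEDELTA (2 OOS + 4 INS)
n_window = 6     # two ints + two Timedeltas + two DateOffsets
n_step = 6
n_initial = 6

full_grid = n_fh * n_window * n_step * n_initial
print(full_grid)  # 2592 combinations per entry in TEST_YS

# Keeping a single value per type on each axis roughly halves every list:
reduced = (n_fh // 2) * (n_window // 2) * (n_step // 2) * (n_initial // 2)
print(reduced)    # 162 combinations, same coverage of types
```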
It's madness.
I think we are in a better place now. @lmmentel, can we close this?
Commented in #154 as well, I think we have accomplished this somewhat with the PR_TESTING setup.
Agreed, this is fixed.