aeon
[MNT] Cut down number of parameter combinations for testing
We are currently running the same tests with different parameter values that do not add much value in terms of coverage.
Consider this test from forecasting:
@pytest.mark.parametrize("y", TEST_YS)
@pytest.mark.parametrize("fh", [*TEST_FHS, *TEST_FHS_TIMEDELTA])
@pytest.mark.parametrize("window_length", TEST_WINDOW_LENGTHS)
@pytest.mark.parametrize("step_length", TEST_STEP_LENGTHS)
@pytest.mark.parametrize("initial_window", TEST_INITIAL_WINDOW)
def test_sliding_window_splitter_with_initial_window(
    y, fh, window_length, step_length, initial_window
):
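As background on why the count explodes: stacked `@pytest.mark.parametrize` decorators make pytest collect the full cross product of their value lists. A minimal sketch of that multiplication, using hypothetical parameter lists (not the actual aeon ones):

```python
import itertools

# Two hypothetical parameter lists, as pytest would see them from two
# stacked @pytest.mark.parametrize decorators.
window_lengths = [1, 5]
step_lengths = [1, 5, 10]

# pytest collects one test case per element of the cross product.
cases = list(itertools.product(window_lengths, step_lengths))
print(len(cases))  # 2 * 3 = 6 collected test cases
```

Each additional decorator multiplies the total again, which is how five stacked decorators reach thousands of cases.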
and the corresponding parameter sets:
TEST_OOS_FHS = [1, np.array([2, 5], dtype="int64")]  # out-of-sample
TEST_INS_FHS = [
    -3,  # single in-sample
    np.array([-2, -5], dtype="int64"),  # multiple in-sample
    0,  # last training point
    np.array([-3, 2], dtype="int64"),  # mixed in-sample and out-of-sample
]
TEST_FHS = [*TEST_OOS_FHS, *TEST_INS_FHS]
TEST_OOS_FHS_TIMEDELTA = [
    [pd.Timedelta(1, unit="D")],
    [pd.Timedelta(2, unit="D"), pd.Timedelta(5, unit="D")],
]  # out-of-sample
TEST_INS_FHS_TIMEDELTA = [
    pd.Timedelta(-3, unit="D"),  # single in-sample
    [pd.Timedelta(-2, unit="D"), pd.Timedelta(-5, unit="D")],  # multiple in-sample
    pd.Timedelta(0, unit="D"),  # last training point
    [
        pd.Timedelta(-3, unit="D"),
        pd.Timedelta(2, unit="D"),
    ],  # mixed in-sample and out-of-sample
]
TEST_FHS_TIMEDELTA = [*TEST_OOS_FHS_TIMEDELTA, *TEST_INS_FHS_TIMEDELTA]
TEST_WINDOW_LENGTHS_INT = [1, 5]
TEST_WINDOW_LENGTHS_TIMEDELTA = [pd.Timedelta(1, unit="D"), pd.Timedelta(5, unit="D")]
TEST_WINDOW_LENGTHS_DATEOFFSET = [pd.offsets.Day(1), pd.offsets.Day(5)]
TEST_WINDOW_LENGTHS = [
    *TEST_WINDOW_LENGTHS_INT,
    *TEST_WINDOW_LENGTHS_TIMEDELTA,
    *TEST_WINDOW_LENGTHS_DATEOFFSET,
]
TEST_INITIAL_WINDOW_INT = [7, 10]
TEST_INITIAL_WINDOW_TIMEDELTA = [pd.Timedelta(7, unit="D"), pd.Timedelta(10, unit="D")]
TEST_INITIAL_WINDOW_DATEOFFSET = [pd.offsets.Day(7), pd.offsets.Day(10)]
TEST_INITIAL_WINDOW = [
    *TEST_INITIAL_WINDOW_INT,
    *TEST_INITIAL_WINDOW_TIMEDELTA,
    *TEST_INITIAL_WINDOW_DATEOFFSET,
]
TEST_STEP_LENGTHS_INT = [1, 5]
TEST_STEP_LENGTHS_TIMEDELTA = [pd.Timedelta(1, unit="D"), pd.Timedelta(5, unit="D")]
TEST_STEP_LENGTHS_DATEOFFSET = [pd.offsets.Day(1), pd.offsets.Day(5)]
TEST_STEP_LENGTHS = [
    *TEST_STEP_LENGTHS_INT,
    *TEST_STEP_LENGTHS_TIMEDELTA,
    *TEST_STEP_LENGTHS_DATEOFFSET,
]
which generates 2592 combinations for running the test. Looking closer, most of the parameter sets contain two values of the same type that are redundant and can be dropped without losing any coverage.
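To make the arithmetic explicit, here is a sketch of the count (assuming a single series in `TEST_YS`; the list sizes come from the snippets above, and the "reduced" figure is only an illustration of keeping one value per type, not a concrete proposal):

```python
# Sizes of the parameter lists quoted above.
n_fh = 6 + 6     # TEST_FHS (2 OOS + 4 INS) + TEST_FHS_TIMEDELTA (2 OOS + 4 INS)
n_window = 6     # two ints + two Timedeltas + two DateOffsets
n_step = 6
n_initial = 6

full_grid = n_fh * n_window * n_step * n_initial
print(full_grid)  # 2592 combinations per entry in TEST_YS

# Keeping a single value per type on each axis roughly halves every list:
reduced = (n_fh // 2) * (n_window // 2) * (n_step // 2) * (n_initial // 2)
print(reduced)    # 162 combinations, same coverage of types
```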
It's madness.
I think we are in a better place now. @lmmentel, can we close this?
Commented in #154 as well, I think we have accomplished this somewhat with the PR_TESTING setup.
Agreed, this is fixed.