
Flaky threaded parallelism unit test

freddyaboulton opened this issue 3 years ago

We saw test_score_pipelines_passes_X_train_y_train[multiclass-cf_threaded] flake on the linux nightlies on 2021-10-27.

Logs are here

We've seen this flake a couple of times (@angela97lin saw it fail on her PR once), so we're recording it here to gauge how widespread the problem is.

Although this is similar to #2669, that issue tracks figuring out why the process-based parallelism tests (now removed) used to flake.

Closing out this issue involves figuring out why this test can flake and proposing a fix if possible.

freddyaboulton avatar Oct 27 '21 14:10 freddyaboulton

Saw this again: https://github.com/alteryx/evalml/runs/4074051436?check_suite_focus=true 😬

angela97lin avatar Nov 01 '21 23:11 angela97lin

Found a similar one with binary + dask: https://github.com/alteryx/evalml/runs/4432800925?check_suite_focus=true (test_score_pipelines_passes_X_train_y_train[binary-dask_threaded])

Logs here: 9_Run unit tests.txt

angela97lin avatar Dec 06 '21 16:12 angela97lin

This flake has reared its head again here on the v0.42.0 release PR.

It looks like the number of mocked score calls doesn't match the number of allowed pipelines (10 vs. 8). Additionally, I don't like seeing this:

  Port 8787 is already in use.
  Perhaps you already have a cluster running?
  Hosting the HTTP server on port 42943 instead

evalml/tests/automl_tests/parallel_tests/test_automl_dask.py::test_score_pipelines_passes_X_train_y_train[regression-dask_threaded]
  Port 8787 is already in use.
  Perhaps you already have a cluster running?
  Hosting the HTTP server on port 32973 instead

This suggests the cluster isn't getting shut down as expected. I thought I had remedied this by giving the engine a method to cleanly shut down the cluster.
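
For reference, here's a minimal sketch of what I'd expect the test setup to guarantee, assuming the threaded engine sits on top of a plain dask.distributed LocalCluster/Client pair (the fixture name and arguments below are hypothetical, not what's in our conftest): the client and cluster get closed even when a test fails, and the dashboard is disabled so nothing ever contends for port 8787.

# Hypothetical fixture sketch, not our actual conftest: tear down the threaded
# cluster and its client even when a test fails, so scheduler/dashboard ports
# are released before the next test starts.
import pytest
from dask.distributed import Client, LocalCluster


@pytest.fixture
def threaded_dask_client():
    # processes=False gives a threads-only cluster (the "dask_threaded" flavor);
    # dashboard_address=None disables the diagnostics server entirely.
    cluster = LocalCluster(n_workers=2, processes=False, dashboard_address=None)
    client = Client(cluster)
    try:
        yield client
    finally:
        client.close()
        cluster.close()

If every parallel test went through something like that, or the engine's shutdown method were guaranteed to run in a finally, the "Port 8787 is already in use" warnings should disappear.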

Here's the stack trace:

=================================== FAILURES ===================================
____ test_score_pipelines_passes_X_train_y_train[multiclass-dask_threaded] _____
[gw0] linux -- Python 3.7.12 /home/runner/work/evalml/evalml/test_python/bin/python

problem_type = <ProblemTypes.MULTICLASS: 'multiclass'>
engine_str = 'dask_threaded'
X_y_binary = (array([[-0.03926799,  0.13191176, -0.21120598, ...,  1.97698901,
         1.02122474, -0.46931074],
       [ 0.774160...,
       1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1,
       0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0]))
X_y_regression = (array([[ 0.77416061, -0.61262574,  0.13391292, ...,  0.34649444,
        -0.79991362,  1.49613964],
       [ 1.039096...148 ,   88.21544211,   43.44418762,  -65.48361146,
       -117.05661346,   87.38980128,   69.14386291,  -75.38637791]))
X_y_multi = (array([[ 0.19347863,  0.6751333 , -0.09710764, ..., -0.31130299,
         0.50347912, -0.9998107 ],
       [-0.718549...,
       2, 0, 0, 0, 2, 2, 1, 1, 2, 1, 2, 2, 0, 2, 2, 2, 2, 0, 0, 1, 0, 2,
       1, 1, 2, 1, 0, 2, 0, 1, 0, 0, 0, 1]))
AutoMLTestEnv = <class 'evalml.tests.conftest._AutoMLTestEnv'>
ts_data_binary = (            features       date
2020-10-01       101 2020-10-01
2020-10-02       102 2020-10-02
2020-10-03       103 ...
2020-10-26    0
2020-10-27    1
2020-10-28    0
2020-10-29    1
2020-10-30    0
2020-10-31    1
Freq: D, dtype: int64)
ts_data_multi = (            features       date
2020-10-01       101 2020-10-01
2020-10-02       102 2020-10-02
2020-10-03       103 ...
2020-10-26    2
2020-10-27    0
2020-10-28    1
2020-10-29    2
2020-10-30    0
2020-10-31    1
Freq: D, dtype: int64)
ts_data = (            features       date
2020-10-01       101 2020-10-01
2020-10-02       102 2020-10-02
2020-10-03       103 ...10-26    26
2020-10-27    27
2020-10-28    28
2020-10-29    29
2020-10-30    30
2020-10-31    31
Freq: D, dtype: int64)

    @pytest.mark.parametrize(
        "engine_str",
        engine_strs + ["sequential"],
    )
    @pytest.mark.parametrize("problem_type", ProblemTypes.all_problem_types)
    def test_score_pipelines_passes_X_train_y_train(
        problem_type,
        engine_str,
        X_y_binary,
        X_y_regression,
        X_y_multi,
        AutoMLTestEnv,
        ts_data_binary,
        ts_data_multi,
        ts_data,
    ):
        if is_binary(problem_type):
            if is_time_series(problem_type):
                X, y = ts_data_binary
            else:
                X, y = X_y_binary
        elif is_multiclass(problem_type):
            if is_time_series(problem_type):
                X, y = ts_data_multi
            else:
                X, y = X_y_multi
        else:
            if is_time_series(problem_type):
                X, y = ts_data
            else:
                X, y = X_y_regression
    
        half = X.shape[0] // 2
        X_train, y_train = pd.DataFrame(X[:half]), pd.Series(y[:half])
        X_test, y_test = pd.DataFrame(X[half:]), pd.Series(y[half:])
    
        if is_multiclass(problem_type) or is_binary(problem_type):
            y_train = y_train.astype("int64")
            y_test = y_test.astype("int64")
    
        automl = AutoMLSearch(
            X_train=X_train,
            y_train=y_train,
            problem_type=problem_type,
            max_iterations=5,
            optimize_thresholds=False,
            problem_configuration={
                "time_index": "date",
                "gap": 0,
                "forecast_horizon": 1,
                "max_delay": 1,
            },
            engine=engine_str,
        )
    
        env = AutoMLTestEnv(problem_type)
        with env.test_context(score_return_value={automl.objective.name: 3.12}):
            automl.search()
    
        with env.test_context(score_return_value={automl.objective.name: 3.12}):
            automl.score_pipelines(
                automl.allowed_pipelines, X_test, y_test, [automl.objective]
            )
    
        expected_X_train, expected_y_train = None, None
        if is_time_series(problem_type):
            expected_X_train, expected_y_train = X_train, y_train
>       assert len(env.mock_score.mock_calls) == len(automl.allowed_pipelines)
E       AssertionError: assert 10 == 8
E        +  where 10 = len([call(          0         1         2   ...        17        18        19\n4   0.244781 -0.288961  1.114161  ...  1.153... int64, [<evalml.objectives.standard_metrics.LogLossMulticlass object at 0x7f3720f792d0>], X_train=None, y_train=None)])
E        +    where [call(          0         1         2   ...        17        18        19\n4   0.244781 -0.288961  1.114161  ...  1.153... int64, [<evalml.objectives.standard_metrics.LogLossMulticlass object at 0x7f3720f792d0>], X_train=None, y_train=None)] = <MagicMock name='score' id='139875262755280'>.mock_calls
E        +      where <MagicMock name='score' id='139875262755280'> = <evalml.tests.conftest._AutoMLTestEnv object at 0x7f373f396290>.mock_score
E        +  and   8 = len([pipeline = MulticlassClassificationPipeline(component_graph={'Label Encoder': ['Label Encoder', 'X', 'y'], 'Imputer':...n_jobs': -1}, 'pipeline':{'time_index': 'date', 'gap': 0, 'forecast_horizon': 1, 'max_delay': 1}}, random_seed=0), ...])
E        +    where [pipeline = MulticlassClassificationPipeline(component_graph={'Label Encoder': ['Label Encoder', 'X', 'y'], 'Imputer':...n_jobs': -1}, 'pipeline':{'time_index': 'date', 'gap': 0, 'forecast_horizon': 1, 'max_delay': 1}}, random_seed=0), ...] = <evalml.automl.automl_search.AutoMLSearch object at 0x7f373f3ab390>.allowed_pipelines

evalml/tests/automl_tests/parallel_tests/test_automl_dask.py:347: AssertionError

logs_60232.zip
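
As for the 10 vs. 8 assertion itself, one thing worth double-checking (just a guess, since it depends on how _AutoMLTestEnv wires its mocks) is whether env.mock_score carries calls over from the automl.search() block into the score_pipelines block under the threaded engines. A rough sketch of how we could rule that out, assuming env.mock_score is a plain unittest.mock.MagicMock shared across both contexts:

# Illustrative only: reset the shared mock between the two test_context blocks
# so the assertion counts only the score_pipelines calls.
env.mock_score.reset_mock()
with env.test_context(score_return_value={automl.objective.name: 3.12}):
    automl.score_pipelines(
        automl.allowed_pipelines, X_test, y_test, [automl.objective]
    )
assert len(env.mock_score.mock_calls) == len(automl.allowed_pipelines)

If the count still comes out high after a reset, the extra calls are coming from score_pipelines itself rather than leaking in from search.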

chukarsten avatar Jan 19 '22 16:01 chukarsten