
Skipped flaky tests

Open asmeurer opened this issue 1 year ago • 7 comments

Several tests are completely skipped right now because they are "flaky".

  • test_reshape
  • test_std
  • test_var
  • test_remainder

This is a pretty high priority issue because these functions are effectively completely untested, even though they appear to be tested.

Tests should be written in such a way that they aren't flaky, for instance, by using high numerical tolerances (or, if necessary, avoiding value testing entirely).
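As a rough illustration of the tolerance idea, here is a stdlib-only sketch. The helper name `assert_close` and the tolerance values are made up for this example and are not what the test suite uses; the point is only that two mathematically equivalent computations are compared within a tolerance instead of exactly.

```python
import math
import statistics


def assert_close(actual, expected, rel_tol=1e-5, abs_tol=1e-8):
    """Hypothetical helper: fail unless actual is within tolerance of expected."""
    assert math.isclose(actual, expected, rel_tol=rel_tol, abs_tol=abs_tol), (
        f"{actual!r} != {expected!r} within rel_tol={rel_tol}, abs_tol={abs_tol}"
    )


values = [0.1, 0.2, 0.3, 0.4]

# Population standard deviation computed by hand with a two-pass formula.
mean = sum(values) / len(values)
two_pass = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

# Reference value from the standard library.
reference = statistics.pstdev(values)

# The two results may differ in the last few bits, so compare with a
# tolerance rather than with ==.
assert_close(two_pass, reference)
```

Loose tolerances like these trade some sensitivity for stability, which seems like the right trade for a test that would otherwise be skipped.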

Note that health checks for timeouts should just be skipped, and health checks for filtering too much should be fixed by fixing the strategy.
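A minimal sketch of both points, assuming plain hypothesis with no array library. The strategy name `shapes_with_two_or_more_elements`, the size bounds, and the check function are hypothetical and are not taken from the test suite. The `too_slow` health check is suppressed through `settings`, and the "at least 2 elements" condition is met by construction, so nothing is rejected and `filter_too_much` has nothing to complain about.

```python
import math

from hypothesis import HealthCheck, given, settings, strategies as st


@st.composite
def shapes_with_two_or_more_elements(draw, max_dims=3, max_side=4):
    # Draw every side >= 1, then force one axis to have side >= 2, so that
    # math.prod(shape) >= 2 holds by construction instead of via .filter().
    sides = draw(st.lists(st.integers(1, max_side), min_size=1, max_size=max_dims))
    axis = draw(st.integers(0, len(sides) - 1))
    sides[axis] = draw(st.integers(2, max_side))
    return tuple(sides)


@settings(
    suppress_health_check=[HealthCheck.too_slow],  # slow generation is not a bug
    deadline=None,
    max_examples=50,
    database=None,  # keep the sketch free of on-disk state
)
@given(shape=shapes_with_two_or_more_elements())
def check_shape_has_two_elements(shape):
    assert math.prod(shape) >= 2


check_shape_has_two_elements()
```

Note that this changes the distribution of generated shapes compared with filtering, so it is a trade-off rather than a drop-in replacement.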

EDIT:

  • test_count_nonzero is not skipped but is flaky on JAX. Some discussion is at https://github.com/data-apis/array-api-tests/pull/347#issuecomment-2695654862

asmeurer avatar Nov 14 '24 20:11 asmeurer

Looking at test_std, https://github.com/data-apis/array-api-tests/blob/master/array_api_tests/test_statistical_functions.py#L262, it does not seem to attempt any value testing. Then what is flaky, assert_dtype or assert_keepdimable_shape?

ev-br avatar Nov 17 '24 12:11 ev-br

I've no idea what's flaky with any of these. The first order of business would be to remove that decorator and figure out why the test was failing. It's also possible that some of these were only flaky with certain libraries.

asmeurer avatar Nov 18 '24 22:11 asmeurer

Also, it's possible the flakiness was fixed and the skip was never removed. It looks like the skip for std was added in https://github.com/data-apis/array-api-tests/pull/233 (with no explanation) if you want to check previous versions.

At best, if the test seems to be passing, we can just remove the skip and see if any upstream failures are found. Like I mentioned in another issue, it's really easy to just revert changes here if they break stuff since we don't even have releases, so I wouldn't be too worried about that.

asmeurer avatar Nov 18 '24 22:11 asmeurer

test_reshape is fixed in gh-319

ev-br avatar Nov 23 '24 10:11 ev-br

Caught a test_std failure with array_api_compat.numpy:

array_api_tests/test_statistical_functions.py::test_std FAILED                   [100%]

    @given(
>       x=hh.arrays(
            dtype=hh.real_floating_dtypes,
            shape=hh.shapes(min_side=1),
            elements={"allow_nan": False},
        ).filter(lambda x: math.prod(x.shape) >= 2),
        data=st.data(),
    )

array_api_tests/test_statistical_functions.py:254: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

    def inconsistent_generation():
>       raise FlakyStrategyDefinition(
            "Inconsistent data generation! Data generation behaved differently "
            "between different runs. Is your data generation depending on external "
            "state?"
        )
E       hypothesis.errors.FlakyStrategyDefinition: Inconsistent data generation! Data generation behaved differently between different runs. Is your data generation depending on external state?

../../miniforge3/envs/array-api-tests/lib/python3.11/site-packages/hypothesis/internal/conjecture/datatree.py:52: FlakyStrategyDefinition
-------------------------------------- Hypothesis --------------------------------------
You can add @seed(146745493194750825545715057348996307346) to this test or run pytest with --hypothesis-seed=146745493194750825545715057348996307346 to reproduce this failure.
=================================== warnings summary ===================================
array_api_tests/test_statistical_functions.py: 59 warnings
  /home/br/miniforge3/envs/array-api-tests/lib/python3.11/site-packages/numpy/_core/_methods.py:227: RuntimeWarning: Degrees of freedom <= 0 for slice
    ret = _var(a, axis=axis, dtype=dtype, out=out, ddof=ddof,

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=============================== short test summary info ================================
FAILED array_api_tests/test_statistical_functions.py::test_std - hypothesis.errors.FlakyStrategyDefinition: Inconsistent data generation! Data gener...
===================== 1 failed, 7 deselected, 59 warnings in 3.14s =====================

ev-br avatar Nov 25 '24 18:11 ev-br

I can reproduce that with

ARRAY_API_TESTS_MODULE=array_api_compat.numpy pytest --disable-warnings array_api_tests/test_statistical_functions.py -k std -v --hypothesis-seed=146745493194750825545715057348996307346 --max-examples=10000

asmeurer avatar Nov 25 '24 19:11 asmeurer

I can't tell what is causing it. None of the strategies seem to be that unusual. The only thing I see that's a little different from the other tests is that the input array is filtered to have at least 2 elements, but that shouldn't be causing this error.

Unfortunately, hypothesis makes it quite hard to tell what's going on with this error. The only thing I can suggest would be to refactor the input strategies, e.g., to use shared instead of data.draw. Otherwise, we may want to report this upstream on the hypothesis repo, and see if the hypothesis devs can offer any advice. It may also just be a bug in hypothesis.
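For reference, a sketch of what the `shared` refactor could look like, using plain tuples in place of arrays. All names here are hypothetical, and this is not claimed to fix the FlakyStrategyDefinition error; it only shows the pattern of deriving the axis from a shared shape strategy instead of calling `data.draw` inside the test body.

```python
from hypothesis import given, settings, strategies as st

# Every strategy that uses the same key sees the same drawn value within a
# single test case.
shapes = st.shared(
    st.lists(st.integers(1, 4), min_size=1, max_size=3).map(tuple),
    key="shape",
)

# The axis strategy depends on the shared shape, so no st.data() is needed.
axes = shapes.flatmap(lambda shape: st.integers(-len(shape), len(shape) - 1))


@settings(max_examples=50, deadline=None, database=None)
@given(shape=shapes, axis=axes)
def check_axis_is_valid_for_shape(shape, axis):
    # axis was generated from the same shape value that the test receives.
    assert -len(shape) <= axis < len(shape)


check_axis_is_valid_for_shape()
```

With this layout all generation happens up front in `@given`, which at least makes the data flow easier to inspect when something goes wrong.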

asmeurer avatar Nov 25 '24 19:11 asmeurer