array-api-tests Skipped flaky tests

Several tests are completely skipped right now because they are "flaky".

test_reshape
test_std
test_var
test_remainder

This is a pretty high priority issue because these functions are effectively completely untested, even though they appear to be tested.

Tests should be written in such a way that they aren't flaky, for instance, by using high numerical tolerances (or, if necessary, avoiding value testing entirely).
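A minimal sketch of a tolerance-based value check for std, assuming a pure-Python reference implementation. `reference_std` and `assert_std_close` are hypothetical helpers, not part of array-api-tests. The tolerances are deliberately loose, and the degenerate case where `n - correction <= 0` is skipped instead of compared.

```python
import math
import statistics


def reference_std(values, correction=0.0):
    """Hypothetical reference std with `correction` degrees of freedom removed."""
    n = len(values)
    dof = n - correction
    if dof <= 0:
        # Degenerate case (NumPy warns "Degrees of freedom <= 0"); no meaningful value.
        return math.nan
    mean = math.fsum(values) / n
    return math.sqrt(math.fsum((v - mean) ** 2 for v in values) / dof)


def assert_std_close(actual, values, correction=0.0, rel_tol=1e-3, abs_tol=1e-6):
    """Compare against the reference with loose tolerances; skip degenerate inputs."""
    expected = reference_std(values, correction)
    if math.isnan(expected):
        return  # avoid value testing where the result is undefined
    assert math.isclose(actual, expected, rel_tol=rel_tol, abs_tol=abs_tol), (actual, expected)


# Usage: statistics.pstdev uses correction=0, statistics.stdev uses correction=1.
data = [1.0, 2.0, 3.0, 4.0]
assert_std_close(statistics.pstdev(data), data, correction=0)
assert_std_close(statistics.stdev(data), data, correction=1)
```

The loose `rel_tol` trades precision for stability: different libraries accumulate floating-point error differently, so a tight tolerance is a common source of flakiness.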

Note that health checks for timeouts should simply be suppressed, while health checks for filtering too much should be addressed by fixing the strategy.
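A sketch of both points, assuming "health checks for timeouts" refers to Hypothesis's `HealthCheck.too_slow`. The test name and list strategy are hypothetical stand-ins. Only the slowness check is suppressed; a strategy that would otherwise rely on a rejecting `.filter(...)` instead encodes the constraint directly (here via `min_size`).

```python
from hypothesis import HealthCheck, given, settings, strategies as st


# Suppress only the "too slow" health check; slow generation is not a correctness bug.
@settings(suppress_health_check=[HealthCheck.too_slow])
@given(
    # Rather than st.lists(...).filter(lambda xs: len(xs) >= 2), which can trip the
    # filter_too_much health check when most examples are rejected, build the
    # constraint into the strategy itself.
    xs=st.lists(st.floats(allow_nan=False, allow_infinity=False), min_size=2, max_size=20),
)
def test_at_least_two_elements(xs):
    assert len(xs) >= 2
```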

EDIT:

test_count_nonzero is not skipped but is flaky on JAX. Some discussion is at https://github.com/data-apis/array-api-tests/pull/347#issuecomment-2695654862

Nov 14 '24 20:11 asmeurer

Looking at test_std, https://github.com/data-apis/array-api-tests/blob/master/array_api_tests/test_statistical_functions.py#L262, it does not seem to attempt any value testing. So what is flaky: assert_dtype or assert_keepdimable_shape?

Nov 17 '24 12:11 ev-br

I've no idea what's flaky with any of these. The first order of business would be to remove that decorator and figure out why the test was failing. It's also possible that some of these were only flaky with certain libraries.

Nov 18 '24 22:11 asmeurer

Also, it's possible the flakiness was fixed and the skip was never removed. It looks like the skip for std was added in https://github.com/data-apis/array-api-tests/pull/233 (with no explanation), if you want to check previous versions.

At best, if the test seems to be passing, we can just remove the skip and see if any upstream failures are found. Like I mentioned in another issue, it's really easy to just revert changes here if they break stuff since we don't even have releases, so I wouldn't be too worried about that.

Nov 18 '24 22:11 asmeurer

test_reshape is fixed in gh-319

Nov 23 '24 10:11 ev-br

Caught a test_std failure with array_api_compat.numpy:

```
array_api_tests/test_statistical_functions.py::test_std FAILED                   [100%]

    @given(
>       x=hh.arrays(
            dtype=hh.real_floating_dtypes,
            shape=hh.shapes(min_side=1),
            elements={"allow_nan": False},
        ).filter(lambda x: math.prod(x.shape) >= 2),
        data=st.data(),
    )

array_api_tests/test_statistical_functions.py:254: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

    def inconsistent_generation():
>       raise FlakyStrategyDefinition(
            "Inconsistent data generation! Data generation behaved differently "
            "between different runs. Is your data generation depending on external "
            "state?"
        )
E       hypothesis.errors.FlakyStrategyDefinition: Inconsistent data generation! Data generation behaved differently between different runs. Is your data generation depending on external state?

../../miniforge3/envs/array-api-tests/lib/python3.11/site-packages/hypothesis/internal/conjecture/datatree.py:52: FlakyStrategyDefinition
-------------------------------------- Hypothesis --------------------------------------
You can add @seed(146745493194750825545715057348996307346) to this test or run pytest with --hypothesis-seed=146745493194750825545715057348996307346 to reproduce this failure.
=================================== warnings summary ===================================
array_api_tests/test_statistical_functions.py: 59 warnings
  /home/br/miniforge3/envs/array-api-tests/lib/python3.11/site-packages/numpy/_core/_methods.py:227: RuntimeWarning: Degrees of freedom <= 0 for slice
    ret = _var(a, axis=axis, dtype=dtype, out=out, ddof=ddof,

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=============================== short test summary info ================================
FAILED array_api_tests/test_statistical_functions.py::test_std - hypothesis.errors.FlakyStrategyDefinition: Inconsistent data generation! Data gener...
===================== 1 failed, 7 deselected, 59 warnings in 3.14s =====================
(array-api-tests) br@gonzales:~/repos/array-api-tests$
```

Nov 25 '24 18:11 ev-br

I can reproduce that with

```
ARRAY_API_TESTS_MODULE=array_api_compat.numpy pytest --disable-warnings array_api_tests/test_statistical_functions.py -k std -v --hypothesis-seed=146745493194750825545715057348996307346 --max-examples=10000
```
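While debugging, the same seed can also be pinned in code, as the Hypothesis log suggests. A minimal sketch, assuming the `--max-examples` option maps onto Hypothesis's `max_examples` setting; `test_std_repro` and its list strategy are hypothetical stand-ins for the real test_std, which draws an array and keyword arguments.

```python
from hypothesis import given, seed, settings, strategies as st


# Seed copied from the failure log; @seed is the per-test equivalent of --hypothesis-seed.
@seed(146745493194750825545715057348996307346)
@settings(max_examples=10_000)
@given(st.lists(st.floats(allow_nan=False), min_size=2, max_size=10))
def test_std_repro(xs):
    # Hypothetical stand-in body; replace with the real test_std body when reproducing.
    assert len(xs) >= 2
```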

Nov 25 '24 19:11 asmeurer

I can't tell what is causing it. None of the strategies seem to be that unusual. The only thing I see that's a little different from the other tests is that the input array is filtered to have at least 2 elements, but that shouldn't be causing this error.
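One way to rule the filter out is to generate shapes that satisfy the at-least-two-elements constraint by construction and pass that strategy as `shape=` (the test above already passes a shape strategy, `hh.shapes(min_side=1)`, to `hh.arrays`). A minimal sketch; `shapes_with_two_or_more_elements` and its `max_dims`/`max_side` defaults are hypothetical.

```python
import math

from hypothesis import given, strategies as st


@st.composite
def shapes_with_two_or_more_elements(draw, max_dims=4, max_side=5):
    """Hypothetical replacement for hh.shapes(min_side=1) plus the prod >= 2 filter.

    Every side is >= 1, and one randomly chosen axis is >= 2, so
    math.prod(shape) >= 2 holds without rejecting any examples.
    """
    ndim = draw(st.integers(min_value=1, max_value=max_dims))
    big_axis = draw(st.integers(min_value=0, max_value=ndim - 1))
    return tuple(
        draw(st.integers(min_value=2 if i == big_axis else 1, max_value=max_side))
        for i in range(ndim)
    )


@given(shape=shapes_with_two_or_more_elements())
def test_shape_has_two_or_more_elements(shape):
    assert all(side >= 1 for side in shape)
    assert math.prod(shape) >= 2
```

If the FlakyStrategyDefinition error persists with a filter-free strategy, the filter can be excluded as the cause.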

Unfortunately, hypothesis makes it quite hard to tell what's going on with this error. The only thing I can suggest would be to refactor the input strategies, e.g., to use shared instead of data.draw. Otherwise, we may want to report this upstream on the hypothesis repo, and see if the hypothesis devs can offer any advice. It may also just be a bug in hypothesis.
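A minimal sketch of the `st.shared` pattern, using plain tuples as stand-in shapes so it runs without an array library; `shape_st` and the key `"std_shape"` are hypothetical. Every strategy built from the same shared strategy sees the same drawn value within one example, so arguments that depend on the shape (such as the axis) can be declared in `@given` instead of being drawn with `data.draw` inside the test body.

```python
from hypothesis import given, strategies as st

# Hypothetical shape strategy; in array-api-tests this role is played by hh.shapes().
shape_st = st.lists(st.integers(min_value=1, max_value=5), min_size=1, max_size=4).map(tuple)

# All uses of this strategy within one example return the same drawn shape.
shared_shape = st.shared(shape_st, key="std_shape")


@given(
    shape=shared_shape,
    # The axis is derived from the shared shape, so it is always valid for it.
    axis=shared_shape.flatmap(lambda s: st.integers(min_value=-len(s), max_value=len(s) - 1)),
)
def test_axis_in_range(shape, axis):
    assert -len(shape) <= axis < len(shape)


@given(a=shared_shape, b=shared_shape)
def test_shared_draws_agree(a, b):
    # Both parameters come from the same shared draw.
    assert a == b
```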

Nov 25 '24 19:11 asmeurer