CUDA test failure: `tests-cuda/test_3149_complex_reducers.py::test_block_boundary_prod_complex13`, but only the _first time_ it is run
Version of Awkward Array
HEAD
Description and code to reproduce
The first time I ran `pytest tests-cuda` on a new system, I got one test failure:
=============================================================== test session starts ================================================================
platform linux -- Python 3.11.9, pytest-8.3.2, pluggy-1.5.0
Matplotlib: 3.9.1
Freetype: 2.12.1
rootdir: /home/jpivarski/irishep/awkward
configfile: pyproject.toml
plugins: forked-1.6.0, timeout-2.3.1, reverse-1.7.0, mpl-0.17.0, anyio-4.4.0, mock-3.14.0, cov-5.0.0, xdist-3.6.1
collected 694 items
tests-cuda/test_1276_cuda_num.py ......... [ 1%]
tests-cuda/test_1276_cuda_transfers.py ................ [ 3%]
tests-cuda/test_1276_cupy_interop.py . [ 3%]
tests-cuda/test_1276_from_cupy.py ..... [ 4%]
tests-cuda/test_1300_same_for_numba_cuda.py ....................... [ 7%]
tests-cuda/test_1381_check_errors.py . [ 7%]
tests-cuda/test_1809_array_cuda_jit.py .............. [ 9%]
tests-cuda/test_2327_array_interface.py . [ 10%]
tests-cuda/test_2649_dlpack_support.py . [ 10%]
tests-cuda/test_2922a_new_cuda_kernels.py ...................................................................... [ 20%]
tests-cuda/test_2922b_new_cuda_kernels.py ............................. [ 24%]
tests-cuda/test_3065a_cuda_kernels.py ...................................................................................................... [ 39%]
..................................................................................................................................... [ 58%]
tests-cuda/test_3065b_cuda_kernels.py ........................ [ 61%]
tests-cuda/test_3065c_cuda_kernels.py ................................................... [ 69%]
tests-cuda/test_3086_cuda_concatenate.py .................... [ 72%]
tests-cuda/test_3115_array_typed_cuda_jit.py . [ 72%]
tests-cuda/test_3130_cuda_listarray_getitem_next.py ................ [ 74%]
tests-cuda/test_3136_cuda_argmin_and_argmax.py sssssss [ 75%]
tests-cuda/test_3136_cuda_reducers.py .................. [ 78%]
tests-cuda/test_3140_cuda_jagged_and_masked_getitem.py .......................... [ 81%]
tests-cuda/test_3140_cuda_slicing.py .................... [ 84%]
tests-cuda/test_3141_cuda_misc.py ...... [ 85%]
tests-cuda/test_3149_complex_reducers.py ......................F.........ssss [ 90%]
tests-cuda/test_3150_combinations_n_equal_2.py ..................... [ 93%]
tests-cuda/test_3162_block_boundary_reducers.py ......ss.... [ 95%]
tests-cuda/test_3162_cuda_generic_reducer_operation.py .......................s....... [100%]
===================================================================== FAILURES =====================================================================
________________________________________________________ test_block_boundary_prod_complex13 ________________________________________________________
    def test_block_boundary_prod_complex13():
        np.random.seed(42)
        array = np.random.randint(50, size=1000)
        complex_array = np.vectorize(complex)(
            array[0 : len(array) : 2], array[1 : len(array) : 2]
        )
        content = ak.contents.NumpyArray(complex_array)
        cuda_content = ak.to_backend(content, "cuda", highlevel=False)
        cpt.assert_allclose(
            ak.prod(cuda_content, -1, highlevel=False),
            ak.prod(content, -1, highlevel=False),
        )
        offsets = ak.index.Index64(np.array([0, 5, 996, 1000], dtype=np.int64))
        depth1 = ak.contents.ListOffsetArray(offsets, content)
        cuda_depth1 = ak.to_backend(depth1, "cuda", highlevel=False)
>       cpt.assert_allclose(
            to_list(ak.prod(cuda_depth1, -1, highlevel=False)),
            to_list(ak.prod(depth1, -1, highlevel=False)),
        )
array = array([38, 28, 14, 42, 7, 20, 38, 18, 22, 10, 10, 23, 35, 39, 23, 2, 21,
1, 23, 43, 29, 37, 1, 20, 32, 11, ...19, 24, 3, 9, 2, 40, 44, 17, 46, 35, 46, 21, 33, 46,
7, 39, 48, 43, 18, 41, 40, 36, 5, 25, 33, 44, 5, 36])
complex_array = array([38.+28.j, 14.+42.j, 7.+20.j, 38.+18.j, 22.+10.j, 10.+23.j,
35.+39.j, 23. +2.j, 21. +1.j, 23.+43.j, 29.+...7.j, 46.+35.j, 46.+21.j,
33.+46.j, 7.+39.j, 48.+43.j, 18.+41.j, 40.+36.j, 5.+25.j,
33.+44.j, 5.+36.j])
content = <NumpyArray dtype='complex128' len='500'>
[38.+28.j 14.+42.j 7.+20.j 38.+18.j 22.+10.j 10.+23.j 35.+39.j 23. +2.j... 44.+17.j 46.+35.j 46.+21.j 33.+46.j 7.+39.j 48.+43.j 18.+41.j
40.+36.j 5.+25.j 33.+44.j 5.+36.j]
</NumpyArray>
cuda_content = <NumpyArray dtype='complex128' len='500'>
[38.+28.j 14.+42.j 7.+20.j 38.+18.j 22.+10.j 10.+23.j 35.+39.j 23. +2.j... 44.+17.j 46.+35.j 46.+21.j 33.+46.j 7.+39.j 48.+43.j 18.+41.j
40.+36.j 5.+25.j 33.+44.j 5.+36.j]
</NumpyArray>
cuda_depth1 = <ListOffsetArray len='3'>
<offsets><Index dtype='int64' len='4'>
[ 0 5 996 1000]
</Index></offse... 7.+39.j 48.+43.j 18.+41.j 40.+36.j
5.+25.j 33.+44.j 5.+36.j]
</NumpyArray></content>
</ListOffsetArray>
depth1 = <ListOffsetArray len='3'>
<offsets><Index dtype='int64' len='4'>
[ 0 5 996 1000]
</Index></offse... 7.+39.j 48.+43.j 18.+41.j 40.+36.j
5.+25.j 33.+44.j 5.+36.j]
</NumpyArray></content>
</ListOffsetArray>
offsets = <Index dtype='int64' len='4'>
[ 0 5 996 1000]
</Index>
tests-cuda/test_3149_complex_reducers.py:575:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../miniforge3/lib/python3.11/site-packages/cupy/testing/_array.py:24: in assert_allclose
numpy.testing.assert_allclose(
actual = [(-29843744-33672352j), (nan+nanj), 0j]
atol = 0
desired = [(-29843744-33672352j), (nan+nanj), (1.4641000000000006-0j)]
err_msg = ''
rtol = 1e-07
verbose = True
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
args = (<function assert_allclose.<locals>.compare at 0x7072ae3f9580>, array([-29843744.-33672352.j, nan +nanj,
... 0. +0.j]), array([-2.9843744e+07-33672352.j, nan +nanj,
1.4641000e+00 -0.j]))
kwds = {'equal_nan': True, 'err_msg': '', 'header': 'Not equal to tolerance rtol=1e-07, atol=0', 'verbose': True}
    @wraps(func)
    def inner(*args, **kwds):
        with self._recreate_cm():
>           return func(*args, **kwds)
E AssertionError:
E Not equal to tolerance rtol=1e-07, atol=0
E
E Mismatched elements: 1 / 3 (33.3%)
E Max absolute difference: 1.4641
E Max relative difference: 1.
E x: array([-29843744.-33672352.j, nan +nanj,
E 0. +0.j])
E y: array([-2.984374e+07-33672352.j, nan +nanj,
E 1.464100e+00 -0.j])
args = (<function assert_allclose.<locals>.compare at 0x7072ae3f9580>, array([-29843744.-33672352.j, nan +nanj,
... 0. +0.j]), array([-2.9843744e+07-33672352.j, nan +nanj,
1.4641000e+00 -0.j]))
func = <function assert_array_compare at 0x7072f1849e40>
kwds = {'equal_nan': True, 'err_msg': '', 'header': 'Not equal to tolerance rtol=1e-07, atol=0', 'verbose': True}
self = <contextlib._GeneratorContextManager object at 0x7072f1870790>
../../miniforge3/lib/python3.11/contextlib.py:81: AssertionError
============================================================= short test summary info ==============================================================
SKIPPED [1] tests-cuda/test_3136_cuda_argmin_and_argmax.py:18: awkward_reduce_argmin and awkward_reduce_argmax are not implemented
SKIPPED [1] tests-cuda/test_3136_cuda_argmin_and_argmax.py:40: awkward_reduce_argmin and awkward_reduce_argmax are not implemented
SKIPPED [1] tests-cuda/test_3136_cuda_argmin_and_argmax.py:51: awkward_reduce_argmin and awkward_reduce_argmax are not implemented
SKIPPED [1] tests-cuda/test_3136_cuda_argmin_and_argmax.py:115: awkward_reduce_argmin and awkward_reduce_argmax are not implemented
SKIPPED [1] tests-cuda/test_3136_cuda_argmin_and_argmax.py:138: awkward_reduce_argmin and awkward_reduce_argmax are not implemented
SKIPPED [2] tests-cuda/test_3136_cuda_argmin_and_argmax.py:177: awkward_reduce_argmin and awkward_reduce_argmax are not implemented
SKIPPED [1] tests-cuda/test_3149_complex_reducers.py:773: awkward_reduce_argmax_complex is not implemented
SKIPPED [1] tests-cuda/test_3149_complex_reducers.py:795: awkward_reduce_argmax_complex is not implemented
SKIPPED [1] tests-cuda/test_3149_complex_reducers.py:817: awkward_reduce_argmin_complex is not implemented
SKIPPED [1] tests-cuda/test_3149_complex_reducers.py:839: awkward_reduce_argmin_complex is not implemented
SKIPPED [1] tests-cuda/test_3162_block_boundary_reducers.py:121: awkward_reduce_argmin is not implemented
SKIPPED [1] tests-cuda/test_3162_block_boundary_reducers.py:139: awkward_reduce_argmax is not implemented
SKIPPED [1] tests-cuda/test_3162_cuda_generic_reducer_operation.py:847: awkward_reduce_argmin is not implemented
FAILED tests-cuda/test_3149_complex_reducers.py::test_block_boundary_prod_complex13 - AssertionError:
==================================================== 1 failed, 679 passed, 14 skipped in 21.50s ====================================================
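The error is a wrong numerical result: for the third sublist, the CUDA backend returns `0j` where the CPU backend returns `(1.4641000000000006-0j)`. The other two elements agree.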
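One thing that stands out in the locals of the traceback above: `content` has `len='500'`, while `offsets` runs up to 1000. Whether this is related to the failure is not established here. The plain-Python sketch below only makes the mismatch explicit; the helper name is made up for illustration and is not an Awkward Array API.

```python
def sublists_out_of_bounds(offsets, content_length):
    """Return the (start, stop) pairs whose stop exceeds the content length."""
    return [
        (start, stop)
        for start, stop in zip(offsets[:-1], offsets[1:])
        if stop > content_length
    ]


# Values copied from the locals shown in the traceback.
offsets = [0, 5, 996, 1000]
content_length = 500  # <NumpyArray dtype='complex128' len='500'>

print(sublists_out_of_bounds(offsets, content_length))
# → [(5, 996), (996, 1000)]
```

If the second and third sublists really do extend past the end of the buffer, their products could depend on whatever happens to be in memory, which might explain why the result differs between runs. That is a guess, not something verified in this report.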
Subsequently re-running this test did not result in any errors. That's very strange. I tried to make a reproducer on Google Colab, but couldn't install CuPy on it.
I also tried uninstalling and reinstalling Awkward:
% pip uninstall awkward
Found existing installation: awkward 2.6.7
Uninstalling awkward-2.6.7:
Would remove:
/home/jpivarski/miniforge3/lib/python3.11/site-packages/_awkward.pth
/home/jpivarski/miniforge3/lib/python3.11/site-packages/awkward-2.6.7.dist-info/*
/home/jpivarski/miniforge3/lib/python3.11/site-packages/awkward/juliapkg.json
Proceed (Y/n)?
Successfully uninstalled awkward-2.6.7
% pip install -e .
Obtaining file:///home/jpivarski/irishep/awkward
Installing build dependencies ... done
Checking if build backend supports build_editable ... done
Getting requirements to build editable ... done
Installing backend dependencies ... done
Preparing editable metadata (pyproject.toml) ... done
Requirement already satisfied: awkward-cpp==37 in /home/jpivarski/miniforge3/lib/python3.11/site-packages (from awkward==2.6.7) (37)
Requirement already satisfied: fsspec>=2022.11.0 in /home/jpivarski/miniforge3/lib/python3.11/site-packages (from awkward==2.6.7) (2024.6.1)
Requirement already satisfied: importlib-metadata>=4.13.0 in /home/jpivarski/miniforge3/lib/python3.11/site-packages (from awkward==2.6.7) (8.2.0)
Requirement already satisfied: numpy>=1.18.0 in /home/jpivarski/miniforge3/lib/python3.11/site-packages (from awkward==2.6.7) (1.26.4)
Requirement already satisfied: packaging in /home/jpivarski/miniforge3/lib/python3.11/site-packages (from awkward==2.6.7) (24.1)
Requirement already satisfied: zipp>=0.5 in /home/jpivarski/miniforge3/lib/python3.11/site-packages (from importlib-metadata>=4.13.0->awkward==2.6.7) (3.20.0)
Building wheels for collected packages: awkward
Building editable for awkward (pyproject.toml) ... done
Created wheel for awkward: filename=awkward-2.6.7-py3-none-any.whl size=5067 sha256=0ddf47f970c3ab51619d8a5d6b0072a315f422f3883f77c4465ad3900915dd27
Stored in directory: /tmp/pip-ephem-wheel-cache-acpm0m7u/wheels/56/e1/a6/2c4dae09851e882a1c0d9a375beb305bc10de51cda49eccf35
Successfully built awkward
Installing collected packages: awkward
Successfully installed awkward-2.6.7
% pytest tests-cuda/test_3149_complex_reducers.py::test_block_boundary_prod_complex13
=============================================================== test session starts ================================================================
platform linux -- Python 3.11.9, pytest-8.3.2, pluggy-1.5.0
Matplotlib: 3.9.1
Freetype: 2.12.1
rootdir: /home/jpivarski/irishep/awkward
configfile: pyproject.toml
plugins: forked-1.6.0, timeout-2.3.1, reverse-1.7.0, mpl-0.17.0, anyio-4.4.0, mock-3.14.0, cov-5.0.0, xdist-3.6.1
collected 1 item
tests-cuda/test_3149_complex_reducers.py . [100%]
================================================================ 1 passed in 3.69s =================================================================
But that didn't bring the failure back: the test passed.
Maybe this has nothing to do with being the first time, and it's a very rare synchronization bug.
I tried running it 100 times:
for x in 0 1 2 3 4 5 6 7 8 9; do for y in 0 1 2 3 4 5 6 7 8 9; do pytest tests-cuda/test_3149_complex_reducers.py::test_block_boundary_prod_complex13; done; done
but that didn't reproduce it either; the test passed every time.
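For anyone who wants a failure count rather than scrolling through 100 pytest sessions, here is a minimal Python sketch of the same loop. The function name `count_failures` is hypothetical, and it assumes that a nonzero exit code from pytest means the run failed.

```python
import subprocess
import sys


def count_failures(command, repetitions):
    """Run `command` `repetitions` times and return how many runs exited nonzero."""
    failures = 0
    for _ in range(repetitions):
        completed = subprocess.run(command, capture_output=True)
        if completed.returncode != 0:
            failures += 1
    return failures


# Example call (not executed here), using the test ID from this report:
#
#   test_id = "tests-cuda/test_3149_complex_reducers.py::test_block_boundary_prod_complex13"
#   print(count_failures([sys.executable, "-m", "pytest", test_id], 100))
```

Each repetition starts a fresh Python process, as the shell loop does, so any state inside one process (such as compiled kernels or allocated GPU memory) is not shared between runs.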
@ianna, if you can't reproduce it, just close this issue.
@jpivarski - yes, I can reproduce it. It did not fail when I first ran the test file on its own:
python -m pytest tests-cuda/test_3149_complex_reducers.py
but it did fail when run as part of the full test suite:
python -m pytest tests-cuda
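Since the file passes alone but fails within the full suite, one way to narrow it down would be to run each earlier module together with the failing test. The sketch below only builds the command lines. `paired_commands` is a hypothetical helper, and it assumes pytest runs its arguments in the order they are given.

```python
import sys

TARGET = "tests-cuda/test_3149_complex_reducers.py::test_block_boundary_prod_complex13"


def paired_commands(earlier_modules, target=TARGET):
    """Build one pytest command per earlier module: that module first, then the target."""
    return [
        [sys.executable, "-m", "pytest", module, target]
        for module in earlier_modules
    ]


# A few of the modules that ran before the failing one in the session above.
earlier = [
    "tests-cuda/test_3136_cuda_reducers.py",
    "tests-cuda/test_3140_cuda_slicing.py",
    "tests-cuda/test_3141_cuda_misc.py",
]
for command in paired_commands(earlier):
    print(" ".join(command))
```

Each printed command can be run by hand or passed to `subprocess.run`. A pair that fails would point to the module whose tests leave behind the state that triggers the wrong result.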
Fixed in https://github.com/scikit-hep/awkward/pull/3235