GPU: tests fail with `TypeError: can_cast()`
Version of Awkward Array
2.6.7
Description and code to reproduce
25 tests in tests-cuda and 131 tests in tests-cuda-kernels-explicit fail, apparently in the min/max reducers, with `TypeError: can_cast()`.
awkward-cpp 37
cupy 13.2.0
cuda-version 12.6
tests-cuda-kernels-explicit/test_unit_cudaawkward_reduce_min_uint8_uint8_64.py:47:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
src/awkward/_kernels.py:169: in __call__
self._impl(grid, blocks, args)
ak_cuda = <module 'awkward._connect.cuda' from '/home/yana/Projects/PR3205/awkward/src/awkward/_connect/cuda/__init__.py'>
args = (array([123, 123, 123, 123], dtype=uint8), array([1, 3, 5, 4, 2, 2, 3, 1, 5], dtype=uint8), array([0, 0, 0, 0, 0, 2, 2, 2, 3]), 9, 4, 4, ...)
blocks = (9, 1, 1)
cupy = <module 'cupy' from '/home/yana/miniconda3/envs/awkward-cuda/lib/python3.12/site-packages/cupy/__init__.py'>
cupy_stream_ptr = 0
grid = (1, 1, 1)
maxlength = 9
self = <CupyKernel awkward_reduce_min, uint8, uint8, int64>
src/awkward/_connect/cuda/_kernel_signatures.py:2969: in f
temp = cupy.full(lenparents, identity, dtype=toptr.dtype)
args = (array([123, 123, 123, 123], dtype=uint8), array([1, 3, 5, 4, 2, 2, 3, 1, 5], dtype=uint8), array([0, 0, 0, 0, 0, 2, 2, 2, 3]), 9, 4, 4, ...)
block = (9, 1, 1)
cuda_kernel_templates = <cupy._core.raw.RawModule object at 0x7f233daaeda0>
err_code = array(18446744073709551615, dtype=uint64)
fromptr = array([1, 3, 5, 4, 2, 2, 3, 1, 5], dtype=uint8)
grid = (1, 1, 1)
grid_size = 1
identity = 4
invocation_index = 411
lenparents = 9
outlength = 4
parents = array([0, 0, 0, 0, 0, 2, 2, 2, 3])
toptr = array([123, 123, 123, 123], dtype=uint8)
../../../miniconda3/envs/awkward-cuda/lib/python3.12/site-packages/cupy/_creation/basic.py:325: in full
cupy.copyto(a, fill_value, casting='unsafe')
a = array([123, 0, 0, 0, 0, 0, 0, 0, 123], dtype=uint8)
dtype = dtype('uint8')
fill_value = 4
order = 'C'
shape = 9
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
dst = array([123, 0, 0, 0, 0, 0, 0, 0, 123], dtype=uint8), src = 4, casting = 'unsafe', where = None
def copyto(dst, src, casting='same_kind', where=None):
"""Copies values from one array to another with broadcasting.
This function can be called for arrays on different devices. In this case,
casting, ``where``, and broadcasting is not supported, and an exception is
raised if these are used.
Args:
dst (cupy.ndarray): Target array.
src (cupy.ndarray): Source array.
casting (str): Casting rule. See :func:`numpy.can_cast` for detail.
where (cupy.ndarray of bool): If specified, this array acts as a mask,
and an element is copied only if the corresponding element of
``where`` is True.
.. seealso:: :func:`numpy.copyto`
"""
src_is_numpy_scalar = False
src_type = type(src)
src_is_python_scalar = src_type in (
int, bool, float, complex,
fusion._FusionVarScalar, _fusion_interface._ScalarProxy)
if src_is_python_scalar:
src_dtype = numpy.dtype(type(src))
> can_cast = numpy.can_cast(src, dst.dtype, casting)
E TypeError: can_cast() does not support Python ints, floats, and complex because the result used to depend on the value.
E This change was part of adopting NEP 50, we may explicitly allow them again in the future.
casting = 'unsafe'
dst = array([123, 0, 0, 0, 0, 0, 0, 0, 123], dtype=uint8)
src = 4
src_dtype = dtype('int64')
src_is_numpy_scalar = False
src_is_python_scalar = True
src_type = <class 'int'>
where = None
../../../miniconda3/envs/awkward-cuda/lib/python3.12/site-packages/cupy/_manipulation/basic.py:38: TypeError
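The last frame of the traceback can be exercised without a GPU, since the failing call is `numpy.can_cast` with a Python `int`. The following is a minimal sketch, not CuPy's code: the helper name `can_cast_python_int` is made up for illustration, and passing a dtype instead of the value is only one possible way around the error, not necessarily how CuPy handles it.

```python
import numpy as np


def can_cast_python_int(value, dtype, casting="unsafe"):
    """Call numpy.can_cast with a raw Python scalar.

    Returns the boolean result, or the error text if NumPy
    refuses Python scalars (as in the traceback above).
    """
    try:
        return np.can_cast(value, dtype, casting)
    except TypeError as err:
        return f"TypeError: {err}"


# Major version of the NumPy that is actually imported.
numpy_major = int(np.__version__.split(".")[0])

# Same shape of call as in the traceback: src = 4, dst.dtype = uint8.
raw_result = can_cast_python_int(4, np.dtype("uint8"))

# Passing the dtype of the Python type instead of the value itself
# avoids handing a Python int to can_cast.
src_dtype = np.dtype(type(4))
dtype_result = np.can_cast(src_dtype, np.dtype("uint8"), "unsafe")

print("NumPy major version:", numpy_major)
print("can_cast(4, uint8):", raw_result)
print("can_cast(dtype(int), uint8):", dtype_result)
```

If the first result is an error string in one environment and `True` in another, the difference is in the installed NumPy rather than in the GPU.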
Oddly, I can't reproduce these failures with my GPU. My package versions are
# Name Version Build Channel
awkward-cpp 37 pypi_0 pypi
cupy 13.2.0 py311he5a987b_1 conda-forge
cupy-core 13.2.0 py311h3bdf873_1 conda-forge
cuda-version 12.4 h3060b56_3 conda-forge
CUDA driver Version: 550.67 NVIDIA GeForce RTX 3060
I ran all of the CUDA-related tests and observed no errors.
Maybe it's a difference between our GPUs, but also be sure to do a clean installation of Awkward,
pip uninstall awkward awkward-cpp
followed by the usual nox, `pip install ./awkward-cpp`, `pip install -e .` sequence, just in case the discrepancy comes from a stale file.
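Alongside a clean reinstall, it may help to compare the versions that each environment actually resolves. The package list above does not include NumPy, and the error message mentions NEP 50, so NumPy is worth including. This is a small sketch using only the standard library; the helper name `installed_versions` is hypothetical, and the distribution names are assumptions (for example, pip wheels of CuPy are published under names such as `cupy-cuda12x` rather than `cupy`).

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Distribution names are assumptions; adjust them to match the environment.
report = installed_versions(["numpy", "cupy", "awkward", "awkward-cpp"])

for name, version in report.items():
    print(f"{name:12s} {version or 'not installed'}")
```

Running this in both environments gives a short list that can be pasted side by side.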
awkward$ conda list
# packages in environment at /home/yana/miniconda3/envs/awkward-cuda:
#
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_gnu conda-forge
argcomplete 3.4.0 pyhd8ed1ab_0 conda-forge
awkward 2.6.7 pypi_0 pypi
awkward-cpp 37 pypi_0 pypi
bzip2 1.0.8 h5eee18b_6
ca-certificates 2024.7.4 hbcca054_0 conda-forge
cachetools 5.4.0 pyhd8ed1ab_0 conda-forge
chardet 5.2.0 py312h7900ff3_1 conda-forge
colorama 0.4.6 pyhd8ed1ab_0 conda-forge
colorlog 6.8.2 py312h7900ff3_0 conda-forge
cuda-nvrtc 12.6.20 he02047a_0 conda-forge
cuda-version 12.6 h7480c83_3 conda-forge
cupy 13.2.0 py312had87585_1 conda-forge
cupy-core 13.2.0 py312hd074ebb_1 conda-forge
distlib 0.3.8 pyhd8ed1ab_0 conda-forge
exceptiongroup 1.2.2 pyhd8ed1ab_0 conda-forge
expat 2.6.2 h6a678d5_0
fastrlock 0.8.2 py312h30efb56_2 conda-forge
filelock 3.15.4 pyhd8ed1ab_0 conda-forge
fsspec 2024.6.1 pypi_0 pypi
iniconfig 2.0.0 pyhd8ed1ab_0 conda-forge
jinja2 3.1.4 pyhd8ed1ab_0 conda-forge
ld_impl_linux-64 2.38 h1181459_1
libblas 3.9.0 23_linux64_openblas conda-forge
libcblas 3.9.0 23_linux64_openblas conda-forge
libcublas 12.6.0.22 he02047a_0 conda-forge
libcufft 11.2.6.28 he02047a_0 conda-forge
libcurand 10.3.7.37 he02047a_0 conda-forge
libcusolver 11.6.4.38 he02047a_0 conda-forge
libcusparse 12.5.2.23 he02047a_0 conda-forge
libexpat 2.6.2 h59595ed_0 conda-forge
libffi 3.4.4 h6a678d5_1
libgcc-ng 14.1.0 h77fa898_0 conda-forge
libgfortran-ng 14.1.0 h69a702a_0 conda-forge
libgfortran5 14.1.0 hc5f4f2c_0 conda-forge
libgomp 14.1.0 h77fa898_0 conda-forge
liblapack 3.9.0 23_linux64_openblas conda-forge
libllvm14 14.0.6 hcd5def8_4 conda-forge
libnsl 2.0.1 hd590300_0 conda-forge
libnvjitlink 12.6.20 he02047a_0 conda-forge
libopenblas 0.3.27 pthreads_hac2b453_1 conda-forge
libsqlite 3.45.2 h2797004_0 conda-forge
libstdcxx-ng 14.1.0 hc0a3c3a_0 conda-forge
libuuid 2.38.1 h0b41bf4_0 conda-forge
libxcrypt 4.4.36 hd590300_1 conda-forge
libzlib 1.3.1 h4ab18f5_1 conda-forge
llvmlite 0.43.0 py312h9c5d478_0 conda-forge
markupsafe 2.1.5 py312h98912ed_0 conda-forge
ncurses 6.4 h6a678d5_0
nox 2024.4.15 pyhff2d567_0 conda-forge
numba 0.60.0 py312h83e6fd3_0 conda-forge
numba-cuda 0.0.13 py_0 nvidia
numpy 2.0.1 py312h1103770_0 conda-forge
openssl 3.3.1 h4bc722e_2 conda-forge
packaging 24.1 pyhd8ed1ab_0 conda-forge
pip 24.0 py312h06a4308_0
platformdirs 4.2.2 pyhd8ed1ab_0 conda-forge
pluggy 1.5.0 pyhd8ed1ab_0 conda-forge
pyproject-api 1.7.1 pyhd8ed1ab_0 conda-forge
pytest 8.3.2 pyhd8ed1ab_0 conda-forge
python 3.12.2 hab00c5b_0_cpython conda-forge
python_abi 3.12 4_cp312 conda-forge
readline 8.2 h5eee18b_0
setuptools 72.1.0 py312h06a4308_0
sqlite 3.45.2 h2c6b66d_0 conda-forge
tk 8.6.13 noxft_h4845f30_101 conda-forge
tomli 2.0.1 pyhd8ed1ab_0 conda-forge
tox 4.17.0 pyhd8ed1ab_0 conda-forge
tzdata 2024a h04d1e81_0
virtualenv 20.26.3 pyhd8ed1ab_0 conda-forge
wheel 0.43.0 py312h06a4308_0
xz 5.4.6 h5eee18b_1
zlib 1.3.1 h4ab18f5_1 conda-forge
Thu Aug 8 13:07:36 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.27 Driver Version: 560.70 CUDA Version: 12.6 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 3080 On | 00000000:01:00.0 On | N/A |
| 0% 36C P8 23W / 320W | 406MiB / 10240MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 26 G /Xwayland N/A |
+-----------------------------------------------------------------------------------------+
/awkward$ python
Python 3.12.2 | packaged by conda-forge | (main, Feb 16 2024, 20:50:58) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
@jpivarski - it looks like we need a newer cupy version for that. The bug has been reported in https://github.com/scipy/scipy/issues/21227
We can increase our lower bound on CuPy to whatever is needed. (CuPy is an optional dependency.)