cudf
[BUG] cudf.pandas wrapped numpy arrays not compatible with numba
Describe the bug
When I try to use cudf.pandas with datashader, I get the error Cannot determine Numba type of <class 'cudf.pandas._wrappers.numpy.ndarray'> (full repro below). Datashader works directly with cudf, and a cudf.DataFrame is an acceptable data format. But when cudf is used as a no-code-change accelerator for pandas, it fails.
Steps/Code to reproduce bug
import cudf.pandas
cudf.pandas.install()
import pandas as pd
import numpy as np
import datashader as ds
import datashader.transfer_functions as tf
from datashader.colors import inferno
# Create a small dataset
np.random.seed(0)
n = 1000
df = pd.DataFrame({
    'x': np.random.normal(0, 1, n),
    'y': np.random.normal(0, 1, n)
})
# Create a canvas to render the plot
cvs = ds.Canvas(plot_width=400, plot_height=400)
# Aggregate the points in the canvas
agg = cvs.points(df, 'x', 'y')
# Render the plot using a transfer function
img = tf.shade(agg, cmap=inferno, how='eq_hist')
# Display the plot
img
Output
TypingError: Failed in nopython mode pipeline (step: nopython frontend)
non-precise type pyobject
During: typing of argument at /home/ajay/miniconda3/envs/rapids-24.06/lib/python3.11/site-packages/datashader/glyphs/glyph.py (66)
File "../miniconda3/envs/rapids-24.06/lib/python3.11/site-packages/datashader/glyphs/glyph.py", line 66:
def _compute_bounds(s):
<source elided>
@staticmethod
^
This error may have been caused by the following argument(s):
- argument 0: Cannot determine Numba type of <class 'cudf.pandas._wrappers.numpy.ndarray'>
Expected behavior
Ideally, the same output as with a cudf or pandas DataFrame.
Environment overview (please complete the following information)
- Environment location: Ubuntu
- Method of cuDF install: Conda
Thanks for the report. As your post highlights, the core issue appears to be that cudf.pandas wraps numpy arrays (to use cupy if possible), and this wrapped array is not compatible with numba:
In [1]: import cudf.pandas
...: cudf.pandas.install()
...:
...: import pandas as pd
In [2]: import numba
In [3]: @numba.jit(nopython=True, nogil=True)
...: def f(x):
...:     return x
...:
In [4]: f(pd.Series([1]).values)
---------------------------------------------------------------------------
TypingError Traceback (most recent call last)
Cell In[4], line 1
----> 1 f(pd.Series([1]).values)
File ~/miniforge3/envs/cudf-dev/lib/python3.11/site-packages/numba/core/dispatcher.py:468, in _DispatcherBase._compile_for_args(self, *args, **kws)
464 msg = (f"{str(e).rstrip()} \n\nThis error may have been caused "
465 f"by the following argument(s):\n{args_str}\n")
466 e.patch_message(msg)
--> 468 error_rewrite(e, 'typing')
469 except errors.UnsupportedError as e:
470 # Something unsupported is present in the user code, add help info
471 error_rewrite(e, 'unsupported_error')
File ~/miniforge3/envs/cudf-dev/lib/python3.11/site-packages/numba/core/dispatcher.py:409, in _DispatcherBase._compile_for_args.<locals>.error_rewrite(e, issue_type)
407 raise e
408 else:
--> 409 raise e.with_traceback(None)
TypingError: Failed in nopython mode pipeline (step: nopython frontend)
non-precise type pyobject
During: typing of argument at <ipython-input-3-88a5a2446c8f> (1)
File "<ipython-input-3-88a5a2446c8f>", line 1:
@numba.jit(nopython=True, nogil=True)
^
This error may have been caused by the following argument(s):
- argument 0: Cannot determine Numba type of <class 'cudf.pandas._wrappers.numpy.ndarray'>
Going to repurpose this issue to be about compatibility with numba.
@brandon-b-miller when you have time, can you also take a look at how cudf.pandas and numba are interoperating?
There might be a way to write a little numba extension code within cudf.pandas that registers cudf.pandas._wrappers.numpy.ndarray objects as something numba can unbox into a numpy array or cupy array. If that worked, we could probably do the registration at import time. I'll investigate.
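For illustration only, here is a minimal sketch of the "hand numba a real array" idea, done at the call site rather than through a numba extension. It is not the extension approach described above and not the fix that was eventually merged. ProxyArray, unwrap_array_args, and strict_sum are hypothetical names; ProxyArray merely stands in for a wrapped array type, and strict_sum stands in for a compiled function that rejects anything other than an exact numpy.ndarray. Whether the real cudf.pandas wrapper can be converted this way is an assumption.

```python
import functools

import numpy as np


class ProxyArray:
    """Hypothetical stand-in for a proxy type that wraps an ndarray."""

    def __init__(self, data):
        self._data = np.asarray(data)

    def __array__(self, dtype=None, copy=None):
        # Hand back the underlying ndarray when NumPy asks for one.
        if dtype is None:
            return self._data
        return self._data.astype(dtype)


def unwrap_array_args(func):
    """Convert ProxyArray positional arguments to real ndarrays, then call func.

    Keyword arguments are passed through unchanged in this sketch.
    """

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        converted = [
            np.asarray(arg) if isinstance(arg, ProxyArray) else arg
            for arg in args
        ]
        return func(*converted, **kwargs)

    return wrapper


def strict_sum(x):
    """Stand-in for a compiled function: accepts only exact numpy.ndarray."""
    if type(x) is not np.ndarray:
        raise TypeError(f"cannot determine type of {type(x)!r}")
    return float(x.sum())


# Wrapping the strict function lets it accept the proxy type as well.
safe_sum = unwrap_array_args(strict_sum)
```

Calling strict_sum(ProxyArray([1.0, 2.0, 3.0])) raises TypeError, while safe_sum accepts the same argument because the proxy is converted first. The trade-off is that every entry point has to be wrapped by hand, which is why registering the type with numba once at import time would be preferable if it can be made to work.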
Just a few quick updates here. We looked at some simple ways of solving this without too much hacking of numba and didn't come up with a solution we can merge into cuDF in the very immediate term. There are a few more medium-term approaches, in the form of updates to numba main, that may do the trick, however. I would like to keep this issue open as we progress and can give more updates here as we have them.
Closed by #16286
We reopened this issue because there were some issues to address. But this issue is now closed by #16601.