
[BUG] cudf.pandas wrapped numpy arrays not compatible with numba

Open AjayThorve opened this issue 9 months ago • 5 comments

Describe the bug When I try to use cudf.pandas with datashader, I get the error Cannot determine Numba type of <class 'cudf.pandas._wrappers.numpy.ndarray'> (full repro below). Datashader works directly with cudf, and a cudf.DataFrame is an accepted data format. But when cudf is used as a no-code-change accelerator for pandas, the same call fails.

Steps/Code to reproduce bug

import cudf.pandas
cudf.pandas.install()

import pandas as pd
import numpy as np
import datashader as ds
import datashader.transfer_functions as tf
from datashader.colors import inferno

# Create a small dataset
np.random.seed(0)
n = 1000
df = pd.DataFrame({
    'x': np.random.normal(0, 1, n),
    'y': np.random.normal(0, 1, n)
})

# Create a canvas to render the plot
cvs = ds.Canvas(plot_width=400, plot_height=400)

# Aggregate the points in the canvas
agg = cvs.points(df, 'x', 'y')

# Render the plot using a transfer function
img = tf.shade(agg, cmap=inferno, how='eq_hist')

# Display the plot
img

Output

TypingError: Failed in nopython mode pipeline (step: nopython frontend)
non-precise type pyobject
During: typing of argument at /home/ajay/miniconda3/envs/rapids-24.06/lib/python3.11/site-packages/datashader/glyphs/glyph.py (66)

File "../miniconda3/envs/rapids-24.06/lib/python3.11/site-packages/datashader/glyphs/glyph.py", line 66:
    def _compute_bounds(s):
        <source elided>

    @staticmethod
    ^ 

This error may have been caused by the following argument(s):
- argument 0: Cannot determine Numba type of <class 'cudf.pandas._wrappers.numpy.ndarray'>

Expected behavior Ideally, the same output as with a cudf or a pandas DataFrame.

Environment overview (please complete the following information)

  • Environment location: Ubuntu
  • Method of cuDF install: Conda

AjayThorve avatar May 07 '24 19:05 AjayThorve

Thanks for the report. As your post highlights, it looks like the core issue is that cudf.pandas wraps numpy arrays (to use cupy if possible), and this wrapped array is not compatible with numba:

In [1]: import cudf.pandas
   ...: cudf.pandas.install()
   ...: 
   ...: import pandas as pd

In [2]: import numba

In [3]: @numba.jit(nopython=True, nogil=True)
   ...: def f(x):
   ...:     return x
   ...: 

In [4]: f(pd.Series([1]).values)
---------------------------------------------------------------------------
TypingError                               Traceback (most recent call last)
Cell In[4], line 1
----> 1 f(pd.Series([1]).values)

File ~/miniforge3/envs/cudf-dev/lib/python3.11/site-packages/numba/core/dispatcher.py:468, in _DispatcherBase._compile_for_args(self, *args, **kws)
    464         msg = (f"{str(e).rstrip()} \n\nThis error may have been caused "
    465                f"by the following argument(s):\n{args_str}\n")
    466         e.patch_message(msg)
--> 468     error_rewrite(e, 'typing')
    469 except errors.UnsupportedError as e:
    470     # Something unsupported is present in the user code, add help info
    471     error_rewrite(e, 'unsupported_error')

File ~/miniforge3/envs/cudf-dev/lib/python3.11/site-packages/numba/core/dispatcher.py:409, in _DispatcherBase._compile_for_args.<locals>.error_rewrite(e, issue_type)
    407     raise e
    408 else:
--> 409     raise e.with_traceback(None)

TypingError: Failed in nopython mode pipeline (step: nopython frontend)
non-precise type pyobject
During: typing of argument at <ipython-input-3-88a5a2446c8f> (1)

File "<ipython-input-3-88a5a2446c8f>", line 1:
@numba.jit(nopython=True, nogil=True)
^ 

This error may have been caused by the following argument(s):
- argument 0: Cannot determine Numba type of <class 'cudf.pandas._wrappers.numpy.ndarray'>

Going to repurpose this issue to be about compatibility with numba.

mroeschke avatar May 09 '24 00:05 mroeschke

@brandon-b-miller when you have time, can you also take a look at how cudf.pandas and numba are interoperating?

quasiben avatar May 09 '24 14:05 quasiben

There might be a way to write a little numba extension code within cudf.pandas that registers cudf.pandas._wrappers.numpy.ndarray objects as something numba can unbox into a numpy array or cupy array. If that worked we could probably do the registration at import time. I'll investigate.
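To make the registration idea concrete without touching numba internals, here is a minimal pure-Python sketch of the general pattern. As far as I understand, numba picks a type for each argument by dispatching on the argument's Python class, so a wrapper class nobody registered has no type. The sketch mimics that with functools.singledispatch. RealArray, ProxyArray, and type_of are hypothetical stand-ins invented for illustration; they are not cudf or numba names, and this is not the fix that was eventually merged.

```python
from functools import singledispatch


class RealArray:
    """Hypothetical stand-in for numpy.ndarray (illustration only)."""

    def __init__(self, data):
        self.data = list(data)


class ProxyArray:
    """Hypothetical stand-in for the wrapper: holds a RealArray, is not one."""

    def __init__(self, wrapped):
        self._wrapped = wrapped

    def __getattr__(self, name):
        # Forward any attribute the proxy lacks to the wrapped object.
        return getattr(self._wrapped, name)


@singledispatch
def type_of(value):
    # Fallback for unregistered classes, analogous to the TypingError above.
    raise TypeError(f"Cannot determine type of {type(value)}")


@type_of.register(RealArray)
def _(value):
    return f"array(len={len(value.data)})"


proxy = ProxyArray(RealArray([1, 2, 3]))

try:
    type_of(proxy)
    failed = False
except TypeError:
    # The proxy forwards attributes, but its class is unknown to the dispatcher.
    failed = True


# The registration idea: teach the dispatcher to unwrap the proxy first.
@type_of.register(ProxyArray)
def _(value):
    return type_of(value._wrapped)


resolved = type_of(proxy)
```

The point of the sketch is that attribute forwarding alone does not help a dispatcher keyed on the class; either the proxy class is registered explicitly, or the proxy has to pass as the real class.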

brandon-b-miller avatar May 09 '24 19:05 brandon-b-miller

Just a few quick updates here. We looked at some simple ways of solving this without too much hacking of numba and didn't come up with a solution we can merge into cuDF in the very immediate term. There are a few more medium-term approaches, in the form of updates to numba main, that may do the trick, however. I would like to keep this issue open as we progress, and I can give more updates here as we have them.

brandon-b-miller avatar May 23 '24 14:05 brandon-b-miller

Closed by #16286

Matt711 avatar Aug 16 '24 01:08 Matt711

We reopened this issue because there were some remaining problems to address, but it is now closed by #16601.

Matt711 avatar Sep 05 '24 14:09 Matt711