[FEA] Enable `cp.asarray(cudf.RangeIndex)`
Is your feature request related to a problem? Please describe.
Currently, attempting to explicitly materialize a cupy array from a cudf.RangeIndex raises this error:
(Pdb) import cudf
(Pdb) cp.asarray(cudf.RangeIndex(0, 100))
*** TypeError: Implicit conversion to a host NumPy array via __array__ is not allowed, To explicitly construct a GPU matrix, consider using .to_cupy()
To explicitly construct a host matrix, consider using .to_numpy().
Offline discussion with @vyasr and @pentschev suggests that this usage should work transparently. The benefit is that cp.asarray(obj) would work for all cudf objects.
Describe the solution you'd like
The most straightforward way is to enable RangeIndex.__array__, which is currently disabled. The rationale is that when __array__ is invoked, the intention of converting to a numpy array is clear. However, additional care should be taken when it is invoked within a cuDF API. According to @vyasr, we should leverage the frame tracking tooling to check whether the __array__ interface is invoked internally in cuDF or externally. If the former, we should raise an error and suggest that the to_cupy method be used instead. If the latter, the API should work, but perhaps a warning could be raised noting that this is not as efficient as to_cupy.
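The internal-versus-external check described above could be sketched roughly as follows. This is a minimal illustration and not cuDF's actual frame tracking tooling: `called_from_package` and `RangeIndexSketch` are hypothetical names, the check looks only at the calling frame's module name, and the warning on the external path is the optional one suggested above.

```python
import sys
import warnings


def called_from_package(package: str, depth: int = 2) -> bool:
    """Report whether the frame `depth` levels above this helper is in `package`.

    With depth=2 this inspects the caller of the method that calls the helper.
    Hypothetical helper; sys._getframe is a CPython implementation detail.
    """
    module_name = sys._getframe(depth).f_globals.get("__name__", "")
    return module_name == package or module_name.startswith(package + ".")


class RangeIndexSketch:
    """Hypothetical stand-in for cudf.RangeIndex, used only for illustration."""

    def __init__(self, start: int, stop: int):
        self._range = range(start, stop)

    def __array__(self, dtype=None, copy=None):
        # Internal callers are expected to use to_cupy() and stay on the GPU.
        if called_from_package("cudf"):
            raise TypeError(
                "__array__ was invoked from inside cudf; use to_cupy() instead."
            )
        # External callers get a host array, plus an optional efficiency hint.
        warnings.warn(
            "Conversion via __array__ is less efficient than to_cupy().",
            RuntimeWarning,
            stacklevel=2,
        )
        import numpy as np  # lazy import keeps the frame check NumPy-free

        return np.asarray(self._range, dtype=dtype)
```

Under these assumptions, a call such as np.asarray(RangeIndexSketch(0, 3)) from user code would take the external path, while the same call issued from a module whose name starts with cudf would raise. A filename-based check on the calling frame would be an equally plausible design.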
If in either case we want the user to call to_cupy, why not call it directly in __array__?
Apologies for the slow response here. The main reason not to call it directly in __array__ is that it would be surprising to users if np.asarray(cudf.RangeIndex(...)) returned a cupy array instead of a numpy array. In addition, some types are representable in cudf and in numpy but not in cupy, so the above conversion would actually fail if we implicitly converted to cupy instead of numpy.