pyopencl icon indicating copy to clipboard operation
pyopencl copied to clipboard

Add __array__ method to pyopencl.Array

Open jni opened this issue 4 years ago • 10 comments

Hello! We are beginning to experiment with OpenCL over at scikit-image (see e.g. https://github.com/scikit-image/scikit-image/pull/4215). For us, it would be ideal np.asarray(pyopencl_array) worked transparently. The __array__ method just calls get, but enables easier compatibility within the NumPy ecosystem.

Here's the approach demonstrated with monkeypatching:

In [1]: import numpy as np
In [2]: from pyopencl.array import Array                                                                                      
In [3]: from gputools import OCLArray                                                                                         
In [4]: arr = OCLArray.from_array(np.random.random((32, 32)))                                                                 
In [5]: type(arr)                                                                                                             
Out[5]: pyopencl.array.Array
In [6]: np.asarray(arr)                                                                                                       
Out[6]: 
array([[array(0.2397247), array(0.77347383), array(0.96216983), ...,
        array(0.5608753), array(0.30588483), array(0.52376195)],
       [array(0.70506864), array(0.22409207), array(0.84294354), ...,
        array(0.21627319), array(0.72259475), array(0.36756701)],
       [array(0.37779019), array(0.23074081), array(0.31592551), ...,
        array(0.45015637), array(0.08316402), array(0.15636401)],
       ...,
       [array(0.30017419), array(0.39220295), array(0.0588779), ...,
        array(0.49011532), array(0.6003594), array(0.78047825)],
       [array(0.14872931), array(0.76511074), array(0.69126287), ...,
        array(0.0361155), array(0.07902312), array(0.12513675)],
       [array(0.80042382), array(0.16620888), array(0.28844463), ...,
        array(0.065685), array(0.9465674), array(0.56786556)]],
      dtype=object)

In [7]: Array.__array__ = Array.get                                                                                           
In [8]: np.asarray(arr)                                                                                                       
Out[8]: 
array([[0.2397247 , 0.77347383, 0.96216983, ..., 0.5608753 , 0.30588483,
        0.52376195],
       [0.70506864, 0.22409207, 0.84294354, ..., 0.21627319, 0.72259475,
        0.36756701],
       [0.37779019, 0.23074081, 0.31592551, ..., 0.45015637, 0.08316402,
        0.15636401],
       ...,
       [0.30017419, 0.39220295, 0.0588779 , ..., 0.49011532, 0.6003594 ,
        0.78047825],
       [0.14872931, 0.76511074, 0.69126287, ..., 0.0361155 , 0.07902312,
        0.12513675],
       [0.80042382, 0.16620888, 0.28844463, ..., 0.065685  , 0.9465674 ,
        0.56786556]])

For more information, see: https://docs.scipy.org/doc/numpy/user/basics.dispatch.html#module-numpy.doc.dispatch https://docs.scipy.org/doc/numpy/reference/generated/numpy.array.html

Thank you!

jni avatar Oct 09 '19 07:10 jni

@inducer Yikes, looks like a bunch of stuff broken in CI that I hope is independent from this change and that probably takes priority over this. However, could I get some comment from you about whether this would be a welcome addition? Thanks!

jni avatar Oct 15 '19 01:10 jni

CI says the failure is in the test you added:

_ test_array_method[<context factory for <pyopencl.Device 'pthread-Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz' on 'Portable Computing Language'>] _
Traceback (most recent call last):
  File "/home/vsts/work/1/s/test/test_array.py", line 670, in test_array_method
    arr = make_random_array(queue, np.float32, (32, 32))
  File "/home/vsts/work/1/s/test/test_array.py", line 75, in make_random_array
    return rand(queue, shape=(size,), dtype=dtype)
  File "/home/vsts/work/1/s/.miniconda3/envs/testing/lib/python2.7/site-packages/pyopencl/clrandom.py", line 765, in rand
    result = Array(queue, shape, dtype)
  File "/home/vsts/work/1/s/.miniconda3/envs/testing/lib/python2.7/site-packages/pyopencl/array.py", line 500, in __init__
    context, cl.mem_flags.READ_WRITE, alloc_nbytes)
TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
    1. pyopencl._cl.Buffer(context: pyopencl._cl.Context, flags: int, size: int = 0, hostbuf: object = None)

Invoked with: <pyopencl.Context at 0x55cc47968910 on <pyopencl.Device 'pthread-Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz' on 'Portable Computing Language' at 0x55cc329c4fa0>>, 1, (32, 32, 32, 32, 32, 32, 32, 32)

(You can click through to "view details on Azure Pipelines" from the CI results)

I'm not sure I'm a fan of this, as Array.get() can be quite expensive if you're not careful, especially if you end up sloshing data back and forth between CPU and GPU inadvertently. So I'd like to retain an obvious syntactic hint that what you're doing is not free.

inducer avatar Oct 15 '19 02:10 inducer

I should ask, could you explain the specific need filled by having asarray work? Is it just convenience or something deeper?

inducer avatar Oct 15 '19 02:10 inducer

Ooops! Sorry, I somehow didn't find that line in the logs!

Re the utility of it: it's common and very useful to allow np.asarray on array-likes to work as expected, as it allows unifying code in pipelines that, of necessity, mix OpenCL and NumPy arrays. In my case, I'm trying to allow dispatching certain operations in scikit-image to the GPU, if OpenCL is correctly configured. Due to the prevalence of scikit-image, this must be optional. This means that I will end up peppering a lot of code with .get()s that could otherwise be totally transparent. In this code, for example, a simple shim around gputools.affine could allow me get rid of that entire if clause:

https://github.com/jni/scikit-image/commit/0c6fb69974278dca696a0cb5409a41952c830d2c#diff-28412ceac9e3406ba1842d7ba3938e44R177-R183

Yes, I understand that there are performance considerations, but a certain amount of responsibility should fall on library creators. In this case, because not everything I need is implemented in gputools, some operations have to happen in NumPy, and it'd be great if I could just feed in my pyopencl arrays transparently.

This whole endeavour is reasonably contentious, so the smaller I can keep the impact to the code, the better...

Thanks!

jni avatar Oct 15 '19 14:10 jni

  • Your code is still failing the Flake8 linter, see CI. FWIW, it is expected that your code passes CI before it gets merged.
  • What will this do for numpy_array + cl_array?
  • Will np.asarray make another copy? Or just return what __array__ returns?

inducer avatar Oct 17 '19 19:10 inducer

@inducer Apologies for the churn!

Your code is still failing the Flake8 linter, see CI.

Fixed.

FWIW, it is expected that your code passes CI before it gets merged.

Of course. Thanks for the reminder. =)

The Travis failures appear to be NumPy compilation failures, which I think are unrelated? (That's what confused me with the first commit.)

What will this do for numpy_array + cl_array?

This currently fails with a ValueError in PyOpenCL:

In [6]: np.random.random((32, 32)) + arr                                                                                                                                                      
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-6-80631e85a40f> in <module>
----> 1 np.random.random((32, 32)) + arr

~/miniconda3/envs/cf/lib/python3.6/site-packages/pyopencl/array.py in __add__(self, other)
    920         else:
    921             # add a scalar
--> 922             if other == 0:
    923                 return self.copy()
    924             else:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

This failure is the same with and without this PR. More work needs to be done to support this, which I'd be interested in pursuing but probably belongs in a new PR. I think the correct way to get automatic coercion here is to raise NotImplemented, but I would have to verify this.

Will np.asarray make another copy?

Only if a specific dtype is requested that is different from the one provided by __array__. From the np.asarray docstring:

No copy is performed if the input is already an ndarray with matching dtype and order. If a is a subclass of ndarray, a base class ndarray is returned.

jni avatar Oct 18 '19 04:10 jni

:+1: to this change... It would make dispatching to pyopencl arrays much easier.

d-v-b avatar Oct 30 '19 14:10 d-v-b

@jni if you think that the CI issue is with numpy compilation, did you try rebasing on master and see if it disappears?

emmanuelle avatar Jan 08 '20 20:01 emmanuelle

@inducer, @emmanuelle's suggestion worked! Would you mind giving this another round of review? =) Thank you!

jni avatar Jan 10 '20 03:01 jni

@inducer do you have any lingering doubts about this? If it doesn't fit with your vision for pyopencl, please let me know, as I can potentially get this done from our end with subclassing or monkey-patching. I just prefer to contribute to the libraries I find useful — but I don't mean to presume what should go in!

Thanks again!

jni avatar Feb 15 '20 11:02 jni