pyopencl Make SVM practical

Draft because it's missing:

[ ] Tests
- [ ] Ensure that SVM allocators can't be passed to buffer mem pool and vice versa.
[ ] A memory pool for SVM
- [ ] Quite subtle: #447
[ ] Documentation, especially about synchronization of deallocation with pending work.
[ ] ~~demo_array_svm crashes for the moment,~~ the queue-associated SVM allocation is one potential solution for that, at least for in-order queues. See also #449.

Mar 29 '21 05:03 inducer

With 3abf89c, examples/svm.py is failing with the following message:

  File "/shared/home/mdiener/Work/pyopencl/examples/svm.py", line 35, in <module>
    svm_ary = cl.SVM(cl.csvm_empty(ctx, 10, np.float32))
  File "/shared/home/mdiener/Work/pyopencl/pyopencl/__init__.py", line 2104, in csvm_empty
    return svm_empty(ctx, svm_mem_flags.READ_WRITE, shape, dtype, order, alignment)
  File "/shared/home/mdiener/Work/pyopencl/pyopencl/__init__.py", line 2062, in svm_empty
    svm_alloc = _ArrayInterfaceSVMAllocation(
NameError: name '_ArrayInterfaceSVMAllocation' is not defined

The following patch works around it (it also adds the missing self.__array_interface__):

diff --git a/pyopencl/__init__.py b/pyopencl/__init__.py
index 22562d1..9c62cea 100644
--- a/pyopencl/__init__.py
+++ b/pyopencl/__init__.py
@@ -1141,21 +1141,6 @@ def _add_functionality():
             .. automethod:: enqueue_release
             """

-    if get_cl_header_version() >= (2, 0):
-        class _ArrayInterfaceSVMAllocation(SVMAllocation):
-            def __init__(self, ctx, size, alignment, flags, _interface=None,
-                    queue=None):
-                """
-                :arg ctx: a :class:`Context`
-                :arg flags: some of :class:`svm_mem_flags`.
-                """
-                super().__init__(ctx, size, alignment, flags, queue)
-
-                # mem_flags.READ_ONLY applies to kernels, not the host
-                read_write = True
-                _interface["data"] = (
-                        int(self._ptr_as_int()), not read_write)
-
     # }}}

     # {{{ SVM
@@ -1396,6 +1381,23 @@ def _add_functionality():

 _add_functionality()

+if get_cl_header_version() >= (2, 0):
+    class _ArrayInterfaceSVMAllocation(SVMAllocation):
+        def __init__(self, ctx, size, alignment, flags, _interface=None,
+                queue=None):
+            """
+            :arg ctx: a :class:`Context`
+            :arg flags: some of :class:`svm_mem_flags`.
+            """
+            super().__init__(ctx, size, alignment, flags, queue)
+
+            # mem_flags.READ_ONLY applies to kernels, not the host
+            read_write = True
+            _interface["data"] = (
+                    int(self._ptr_as_int()), not read_write)
+
+            self.__array_interface__ = _interface
+
 # }}}

(Do you prefer me to send these as PRs on top of this PR?)

Jul 13 '22 16:07 matthiasdiener

7f20c759f47278ef14b69cc3d5546993cbb52529 is a WIP commit that doesn't even compile. It's intended to prevent further duplication of work with @matthiasdiener (cf. #587). Sorry about not getting this pushed sooner.

Jul 22 '22 14:07 inducer

0cf016ff73bc3f7985b7ca227c4a8194e847ba0e has broken compatibility with examples/svm.py. I'll look into it.

Jul 29 '22 00:07 inducer

Attemping to run svm.py on a Crusher MI250X with Rocm 5.2.0 results in a segmentation fault.

Device 'gfx90a:sramecc-:xnack-' on platform 'AMD Accelerated Parallel Processing (OpenCL 2.1 AMD-APP (3452.0))' has the following SVM features:
  Coarse-grained buffer SVM: True
  Fine-grained buffer SVM:   True
  Fine-grained system SVM:   False
/autofs/nccs-svm1_home1/njchris/Workspace/pyopencl/pyopencl/__init__.py:272: CompilerWarning: Non-empty compiler output encountered. Set the environment variable PYOPENCL_COMPILER_OUTPUT=1 to see more.
  warn("Non-empty compiler output encountered. Set the "
Segmentation fault

Aug 02 '22 04:08 nchristensen

@nchristensen

Segmentation fault

Are you able to get a backtrace (e.g. with gdb?)

Aug 02 '22 22:08 inducer

It appears that a good part of the slower execution of SVM comes from the ordering of the argument type checks in Kernel.set_arg:

https://github.com/inducer/pyopencl/blob/8bce4f6cce70bf22baa8fd238784f3557e942b55/src/wrap_cl.hpp#L4504-L4518

We check for memory objects (images, buffers) first, and for SVM second. FWIW, reversing that order seems to help a lot, and the exception handling appears to play a big part in the time spent. Unfortunately, according to https://github.com/pybind/pybind11/issues/649, the try/except solution appears to be the only way to see whether a cast will succeed in pybind.

Aug 03 '22 00:08 inducer

After pulling in the lastest changes and recompiling I am no longer seeing segmentation faults on Rocm 5.2.0 for svm.py or demo_array_svm.py.

Aug 04 '22 19:08 nchristensen

With the changes in https://github.com/inducer/pyopencl/compare/7ef5ce5ec280038559295d45986c96302c193d6d..87420b1334806b7fa0fb0104e80a09366869b9f0, this performs about equivalently to buffers (with pocl) in my unscientific benchmarking.

Aug 05 '22 21:08 inducer

Should the loopy interface in https://github.com/inducer/loopy/pull/642 be changed to svm_ptr instead of _ptr_as_int?:

https://github.com/inducer/loopy/blob/7ff7b90fc3fff5b049fca36c410e93023cad57cd/loopy/target/pyopencl.py#L816

svm_ptr would also need to be exported.

Edit: Should probably be int_ptr.

Aug 05 '22 22:08 matthiasdiener

Correct, it should be int_ptr. I have pushed a fix there.

Aug 05 '22 23:08 inducer

The latest changes fix a double-free situation that led to crashes during lazy exec, specifically these here: https://github.com/inducer/pyopencl/compare/87420b1334806b7fa0fb0104e80a09366869b9f0..ae4e76fd09103f9d99f13065eb5599c98190b861

Aug 07 '22 23:08 inducer

[ ] Ensure that SVM allocators can't be passed to buffer mem pool and vice versa.

As far as I can see, with the current version of this PR, this already works as intended. E.g.:

alloc = SVMAllocator(ctx, alignment=0, queue=queue)
alloc = MemoryPool(alloc)

results in:

 File "/Users/mdiener/Work/svmfuse/pyopencl/examples/demo_array_svm.py", line 16, in <module>
    alloc = MemoryPool(alloc)
TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
    1. pyopencl._cl._tools_MemoryPool(allocator: pyopencl._cl._tools_AllocatorBase, leading_bits_in_bin_id: int = 4)

Invoked with: <pyopencl._cl._tools_SVMAllocator object at 0x10e34b6f0>

There is a similar error for passing an Immediate/Deferred allocator to an SVM Pool.

Aug 15 '22 14:08 matthiasdiener

The documentation end of this is now in decent shape, I think. But tests are still missing.

Aug 29 '22 05:08 inducer

@matthiasdiener Thanks for taking a look!

Aug 31 '22 00:08 inducer

I think this is essentially done now. I'll give it another read after dinner and then it'll likely go in.

Sep 06 '22 01:09 inducer

This gitlab CI run crashed here:

https://github.com/inducer/pyopencl/blob/d15d7d1cea8868d2f7cd10f94178181d452a6790/test/test_wrapper.py#L1166

when running with POCL GPU, with this backtrace. @matthiasdiener, any thoughts on what might be the issue?

It seems that this can be reliably provoked with

PYOPENCL_TEST=port:k40 python -m pytest --sw --tb=native test_array.py test_wrapper.py

on the K40 (at least on Tripel and koelsch). It seems to pass OK on the Titan V. I've xfailed the test for now on the K40.

Sep 06 '22 06:09 inducer

This gitlab CI run crashed here:

https://github.com/inducer/pyopencl/blob/d15d7d1cea8868d2f7cd10f94178181d452a6790/test/test_wrapper.py#L1166

when running with POCL GPU, with this backtrace. @matthiasdiener, any thoughts on what might be the issue?

It seems that this can be reliably provoked with
PYOPENCL_TEST=port:k40 python -m pytest --sw --tb=native test_array.py test_wrapper.py  
on the K40 (at least on Tripel and koelsch). It seems to pass OK on the Titan V. I've xfailed the test for now on the K40.

Is there a way to see which version of pocl is used for the test?

Edit: I was able to reproduce the error with the most recent version of pocl/pocl#1067 on koelsch.

Sep 06 '22 18:09 matthiasdiener

This gitlab CI run crashed here:

https://github.com/inducer/pyopencl/blob/d15d7d1cea8868d2f7cd10f94178181d452a6790/test/test_wrapper.py#L1166

when running with POCL GPU, with this backtrace. @matthiasdiener, any thoughts on what might be the issue?

It seems that this can be reliably provoked with
PYOPENCL_TEST=port:k40 python -m pytest --sw --tb=native test_array.py test_wrapper.py  
on the K40 (at least on Tripel and koelsch). It seems to pass OK on the Titan V. I've xfailed the test for now on the K40.

From what I can see, the issue looks similar to what we discussed before for the SVM memcpy errors - there seems to be something weird with the unified memory on the K40: managed memory can not be accessed/modified on the host (like in the fill operation at issue here) in some circumstances, but I have not been able to find out why.

Edit: I think this issue shouldn't block this PR, since it is entirely due to pocl-cuda.

Sep 07 '22 02:09 matthiasdiener

OK, I've done my final review, made some final fixes. If it passes CI here and on Gitlab, then it'll go in. :tada:

Sep 08 '22 00:09 inducer

I'll call that passing. :slightly_smiling_face:

Sep 08 '22 01:09 inducer

https://pypi.org/project/pyopencl/2022.2/

Sep 08 '22 01:09 inducer

pyopencl pyopencl copied to clipboard

Make SVM practical

pyopencl
pyopencl copied to clipboard