pyopencl icon indicating copy to clipboard operation
pyopencl copied to clipboard

Make SVM practical

Open inducer opened this issue 4 years ago • 11 comments

Draft because it's missing:

  • [ ] Tests
    • [ ] Ensure that SVM allocators can't be passed to buffer mem pool and vice versa.
  • [ ] A memory pool for SVM
    • [ ] Quite subtle: #447
  • [ ] Documentation, especially about synchronization of deallocation with pending work.
  • [ ] ~~demo_array_svm crashes for the moment,~~ the queue-associated SVM allocation is one potential solution for that, at least for in-order queues. See also #449.

inducer avatar Mar 29 '21 05:03 inducer

With 3abf89c, examples/svm.py is failing with the following message:

  File "/shared/home/mdiener/Work/pyopencl/examples/svm.py", line 35, in <module>
    svm_ary = cl.SVM(cl.csvm_empty(ctx, 10, np.float32))
  File "/shared/home/mdiener/Work/pyopencl/pyopencl/__init__.py", line 2104, in csvm_empty
    return svm_empty(ctx, svm_mem_flags.READ_WRITE, shape, dtype, order, alignment)
  File "/shared/home/mdiener/Work/pyopencl/pyopencl/__init__.py", line 2062, in svm_empty
    svm_alloc = _ArrayInterfaceSVMAllocation(
NameError: name '_ArrayInterfaceSVMAllocation' is not defined

The following patch works around it (it also adds the missing self.__array_interface__):

diff --git a/pyopencl/__init__.py b/pyopencl/__init__.py
index 22562d1..9c62cea 100644
--- a/pyopencl/__init__.py
+++ b/pyopencl/__init__.py
@@ -1141,21 +1141,6 @@ def _add_functionality():
             .. automethod:: enqueue_release
             """

-    if get_cl_header_version() >= (2, 0):
-        class _ArrayInterfaceSVMAllocation(SVMAllocation):
-            def __init__(self, ctx, size, alignment, flags, _interface=None,
-                    queue=None):
-                """
-                :arg ctx: a :class:`Context`
-                :arg flags: some of :class:`svm_mem_flags`.
-                """
-                super().__init__(ctx, size, alignment, flags, queue)
-
-                # mem_flags.READ_ONLY applies to kernels, not the host
-                read_write = True
-                _interface["data"] = (
-                        int(self._ptr_as_int()), not read_write)
-
     # }}}

     # {{{ SVM
@@ -1396,6 +1381,23 @@ def _add_functionality():

 _add_functionality()

+if get_cl_header_version() >= (2, 0):
+    class _ArrayInterfaceSVMAllocation(SVMAllocation):
+        def __init__(self, ctx, size, alignment, flags, _interface=None,
+                queue=None):
+            """
+            :arg ctx: a :class:`Context`
+            :arg flags: some of :class:`svm_mem_flags`.
+            """
+            super().__init__(ctx, size, alignment, flags, queue)
+
+            # mem_flags.READ_ONLY applies to kernels, not the host
+            read_write = True
+            _interface["data"] = (
+                    int(self._ptr_as_int()), not read_write)
+
+            self.__array_interface__ = _interface
+
 # }}}

(Do you prefer me to send these as PRs on top of this PR?)

matthiasdiener avatar Jul 13 '22 16:07 matthiasdiener

7f20c759f47278ef14b69cc3d5546993cbb52529 is a WIP commit that doesn't even compile. It's intended to prevent further duplication of work with @matthiasdiener (cf. #587). Sorry about not getting this pushed sooner.

inducer avatar Jul 22 '22 14:07 inducer

0cf016ff73bc3f7985b7ca227c4a8194e847ba0e has broken compatibility with examples/svm.py. I'll look into it.

inducer avatar Jul 29 '22 00:07 inducer

Attemping to run svm.py on a Crusher MI250X with Rocm 5.2.0 results in a segmentation fault.

Device 'gfx90a:sramecc-:xnack-' on platform 'AMD Accelerated Parallel Processing (OpenCL 2.1 AMD-APP (3452.0))' has the following SVM features:
  Coarse-grained buffer SVM: True
  Fine-grained buffer SVM:   True
  Fine-grained system SVM:   False
/autofs/nccs-svm1_home1/njchris/Workspace/pyopencl/pyopencl/__init__.py:272: CompilerWarning: Non-empty compiler output encountered. Set the environment variable PYOPENCL_COMPILER_OUTPUT=1 to see more.
  warn("Non-empty compiler output encountered. Set the "
Segmentation fault

nchristensen avatar Aug 02 '22 04:08 nchristensen

@nchristensen

Segmentation fault

Are you able to get a backtrace (e.g. with gdb?)

inducer avatar Aug 02 '22 22:08 inducer

It appears that a good part of the slower execution of SVM comes from the ordering of the argument type checks in Kernel.set_arg:

https://github.com/inducer/pyopencl/blob/8bce4f6cce70bf22baa8fd238784f3557e942b55/src/wrap_cl.hpp#L4504-L4518

We check for memory objects (images, buffers) first, and for SVM second. FWIW, reversing that order seems to help a lot, and the exception handling appears to play a big part in the time spent. Unfortunately, according to https://github.com/pybind/pybind11/issues/649, the try/except solution appears to be the only way to see whether a cast will succeed in pybind.

inducer avatar Aug 03 '22 00:08 inducer

After pulling in the lastest changes and recompiling I am no longer seeing segmentation faults on Rocm 5.2.0 for svm.py or demo_array_svm.py.

nchristensen avatar Aug 04 '22 19:08 nchristensen

With the changes in https://github.com/inducer/pyopencl/compare/7ef5ce5ec280038559295d45986c96302c193d6d..87420b1334806b7fa0fb0104e80a09366869b9f0, this performs about equivalently to buffers (with pocl) in my unscientific benchmarking.

inducer avatar Aug 05 '22 21:08 inducer

Should the loopy interface in https://github.com/inducer/loopy/pull/642 be changed to svm_ptr instead of _ptr_as_int?:

https://github.com/inducer/loopy/blob/7ff7b90fc3fff5b049fca36c410e93023cad57cd/loopy/target/pyopencl.py#L816

svm_ptr would also need to be exported.

Edit: Should probably be int_ptr.

matthiasdiener avatar Aug 05 '22 22:08 matthiasdiener

Correct, it should be int_ptr. I have pushed a fix there.

inducer avatar Aug 05 '22 23:08 inducer

The latest changes fix a double-free situation that led to crashes during lazy exec, specifically these here: https://github.com/inducer/pyopencl/compare/87420b1334806b7fa0fb0104e80a09366869b9f0..ae4e76fd09103f9d99f13065eb5599c98190b861

inducer avatar Aug 07 '22 23:08 inducer

  • [ ] Ensure that SVM allocators can't be passed to buffer mem pool and vice versa.

As far as I can see, with the current version of this PR, this already works as intended. E.g.:

alloc = SVMAllocator(ctx, alignment=0, queue=queue)
alloc = MemoryPool(alloc)

results in:

 File "/Users/mdiener/Work/svmfuse/pyopencl/examples/demo_array_svm.py", line 16, in <module>
    alloc = MemoryPool(alloc)
TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
    1. pyopencl._cl._tools_MemoryPool(allocator: pyopencl._cl._tools_AllocatorBase, leading_bits_in_bin_id: int = 4)

Invoked with: <pyopencl._cl._tools_SVMAllocator object at 0x10e34b6f0>

There is a similar error for passing an Immediate/Deferred allocator to an SVM Pool.

matthiasdiener avatar Aug 15 '22 14:08 matthiasdiener

The documentation end of this is now in decent shape, I think. But tests are still missing.

inducer avatar Aug 29 '22 05:08 inducer

@matthiasdiener Thanks for taking a look!

inducer avatar Aug 31 '22 00:08 inducer

I think this is essentially done now. I'll give it another read after dinner and then it'll likely go in.

inducer avatar Sep 06 '22 01:09 inducer

This gitlab CI run crashed here:

https://github.com/inducer/pyopencl/blob/d15d7d1cea8868d2f7cd10f94178181d452a6790/test/test_wrapper.py#L1166

when running with POCL GPU, with this backtrace. @matthiasdiener, any thoughts on what might be the issue?

It seems that this can be reliably provoked with

PYOPENCL_TEST=port:k40 python -m pytest --sw --tb=native test_array.py test_wrapper.py  

on the K40 (at least on Tripel and koelsch). It seems to pass OK on the Titan V. I've xfailed the test for now on the K40.

inducer avatar Sep 06 '22 06:09 inducer

This gitlab CI run crashed here:

https://github.com/inducer/pyopencl/blob/d15d7d1cea8868d2f7cd10f94178181d452a6790/test/test_wrapper.py#L1166

when running with POCL GPU, with this backtrace. @matthiasdiener, any thoughts on what might be the issue?

It seems that this can be reliably provoked with

PYOPENCL_TEST=port:k40 python -m pytest --sw --tb=native test_array.py test_wrapper.py  

on the K40 (at least on Tripel and koelsch). It seems to pass OK on the Titan V. I've xfailed the test for now on the K40.

Is there a way to see which version of pocl is used for the test?

Edit: I was able to reproduce the error with the most recent version of pocl/pocl#1067 on koelsch.

matthiasdiener avatar Sep 06 '22 18:09 matthiasdiener

This gitlab CI run crashed here:

https://github.com/inducer/pyopencl/blob/d15d7d1cea8868d2f7cd10f94178181d452a6790/test/test_wrapper.py#L1166

when running with POCL GPU, with this backtrace. @matthiasdiener, any thoughts on what might be the issue?

It seems that this can be reliably provoked with

PYOPENCL_TEST=port:k40 python -m pytest --sw --tb=native test_array.py test_wrapper.py  

on the K40 (at least on Tripel and koelsch). It seems to pass OK on the Titan V. I've xfailed the test for now on the K40.

From what I can see, the issue looks similar to what we discussed before for the SVM memcpy errors - there seems to be something weird with the unified memory on the K40: managed memory can not be accessed/modified on the host (like in the fill operation at issue here) in some circumstances, but I have not been able to find out why.

Edit: I think this issue shouldn't block this PR, since it is entirely due to pocl-cuda.

matthiasdiener avatar Sep 07 '22 02:09 matthiasdiener

OK, I've done my final review, made some final fixes. If it passes CI here and on Gitlab, then it'll go in. :tada:

inducer avatar Sep 08 '22 00:09 inducer

I'll call that passing. :slightly_smiling_face:

inducer avatar Sep 08 '22 01:09 inducer

https://pypi.org/project/pyopencl/2022.2/

inducer avatar Sep 08 '22 01:09 inducer