Make SVM practical
Draft because it's missing:
- [ ] Tests
- [ ] Ensure that SVM allocators can't be passed to buffer mem pool and vice versa.
- [ ] A memory pool for SVM
- [ ] Quite subtle: #447
- [ ] Documentation, especially about synchronization of deallocation with pending work.
- [ ] ~~demo_array_svm crashes for the moment,~~ the queue-associated SVM allocation is one potential solution for that, at least for in-order queues (see the sketch after this list). See also #449.
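As a minimal sketch of that queue-associated allocation idea (assuming the `SVMAllocator` API from this PR; the array usage below is only illustrative):

```python
import numpy as np
import pyopencl as cl
import pyopencl.array as cl_array
from pyopencl.tools import SVMAllocator

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)  # an in-order queue

# Per the note above: associating the allocator with an in-order queue lets
# deallocation be ordered after the work already enqueued on that queue.
alloc = SVMAllocator(ctx, alignment=0, queue=queue)
ary = cl_array.zeros(queue, 10, np.float32, allocator=alloc)
```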
With 3abf89c, examples/svm.py is failing with the following message:
File "/shared/home/mdiener/Work/pyopencl/examples/svm.py", line 35, in <module>
svm_ary = cl.SVM(cl.csvm_empty(ctx, 10, np.float32))
File "/shared/home/mdiener/Work/pyopencl/pyopencl/__init__.py", line 2104, in csvm_empty
return svm_empty(ctx, svm_mem_flags.READ_WRITE, shape, dtype, order, alignment)
File "/shared/home/mdiener/Work/pyopencl/pyopencl/__init__.py", line 2062, in svm_empty
svm_alloc = _ArrayInterfaceSVMAllocation(
NameError: name '_ArrayInterfaceSVMAllocation' is not defined
The following patch works around it (it also adds the missing self.__array_interface__):
```diff
diff --git a/pyopencl/__init__.py b/pyopencl/__init__.py
index 22562d1..9c62cea 100644
--- a/pyopencl/__init__.py
+++ b/pyopencl/__init__.py
@@ -1141,21 +1141,6 @@ def _add_functionality():
         .. automethod:: enqueue_release
         """
 
-    if get_cl_header_version() >= (2, 0):
-        class _ArrayInterfaceSVMAllocation(SVMAllocation):
-            def __init__(self, ctx, size, alignment, flags, _interface=None,
-                    queue=None):
-                """
-                :arg ctx: a :class:`Context`
-                :arg flags: some of :class:`svm_mem_flags`.
-                """
-                super().__init__(ctx, size, alignment, flags, queue)
-
-                # mem_flags.READ_ONLY applies to kernels, not the host
-                read_write = True
-                _interface["data"] = (
-                        int(self._ptr_as_int()), not read_write)
-
     # }}}
 
     # {{{ SVM
@@ -1396,6 +1381,23 @@ def _add_functionality():
 
 _add_functionality()
 
 
+if get_cl_header_version() >= (2, 0):
+    class _ArrayInterfaceSVMAllocation(SVMAllocation):
+        def __init__(self, ctx, size, alignment, flags, _interface=None,
+                queue=None):
+            """
+            :arg ctx: a :class:`Context`
+            :arg flags: some of :class:`svm_mem_flags`.
+            """
+            super().__init__(ctx, size, alignment, flags, queue)
+
+            # mem_flags.READ_ONLY applies to kernels, not the host
+            read_write = True
+            _interface["data"] = (
+                    int(self._ptr_as_int()), not read_write)
+
+            self.__array_interface__ = _interface
+
 # }}}
```
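With the workaround applied, the call from the traceback above should work again. A minimal reproducer (context setup added here for completeness; it needs an OpenCL 2.x device with coarse-grained buffer SVM):

```python
import numpy as np
import pyopencl as cl

ctx = cl.create_some_context()

# The call from examples/svm.py that previously raised the NameError:
svm_ary = cl.SVM(cl.csvm_empty(ctx, 10, np.float32))
```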
(Do you prefer me to send these as PRs on top of this PR?)
7f20c759f47278ef14b69cc3d5546993cbb52529 is a WIP commit that doesn't even compile. It's intended to prevent further duplication of work with @matthiasdiener (cf. #587). Sorry about not getting this pushed sooner.
0cf016ff73bc3f7985b7ca227c4a8194e847ba0e has broken compatibility with examples/svm.py. I'll look into it.
Attempting to run svm.py on a Crusher MI250X with ROCm 5.2.0 results in a segmentation fault.
```
Device 'gfx90a:sramecc-:xnack-' on platform 'AMD Accelerated Parallel Processing (OpenCL 2.1 AMD-APP (3452.0))' has the following SVM features:
  Coarse-grained buffer SVM: True
  Fine-grained buffer SVM: True
  Fine-grained system SVM: False
/autofs/nccs-svm1_home1/njchris/Workspace/pyopencl/pyopencl/__init__.py:272: CompilerWarning: Non-empty compiler output encountered. Set the environment variable PYOPENCL_COMPILER_OUTPUT=1 to see more.
  warn("Non-empty compiler output encountered. Set the "
Segmentation fault
```
@nchristensen

> Segmentation fault

Are you able to get a backtrace (e.g. with gdb)?
It appears that a good part of the slower execution of SVM comes from the ordering of the argument type checks in Kernel.set_arg:
https://github.com/inducer/pyopencl/blob/8bce4f6cce70bf22baa8fd238784f3557e942b55/src/wrap_cl.hpp#L4504-L4518
We check for memory objects (images, buffers) first and for SVM second. FWIW, reversing that order seems to help a lot, and the exception handling appears to account for a big part of the time spent. Unfortunately, according to https://github.com/pybind/pybind11/issues/649, try/except appears to be the only way to find out whether a cast will succeed in pybind11.
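As an aside, here is a toy Python analogue of that dispatch order (the real code is C++ in src/wrap_cl.hpp; all class and helper names below are made up purely for illustration). It only shows why trying the common case first matters when failure is signalled by an exception:

```python
class SVMArg:      # stand-in for an SVM pointer argument
    pass

class MemObjArg:   # stand-in for a buffer/image argument
    pass

def _try_svm(arg):
    if not isinstance(arg, SVMArg):
        raise TypeError("not an SVM argument")
    return "set_arg_svm"

def _try_mem_obj(arg):
    if not isinstance(arg, MemObjArg):
        raise TypeError("not a memory object")
    return "set_arg_mem"

def set_arg(arg):
    # The type tried first gets the cheap path; every fallthrough pays for a
    # raised-and-caught exception, which is where much of the overhead
    # observed above comes from.
    for attempt in (_try_svm, _try_mem_obj):
        try:
            return attempt(arg)
        except TypeError:
            pass
    raise TypeError(f"unsupported kernel argument: {type(arg)}")

print(set_arg(SVMArg()))     # hits the first, cheap path
print(set_arg(MemObjArg()))  # pays for one exception before succeeding
```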
After pulling in the latest changes and recompiling, I am no longer seeing segmentation faults on ROCm 5.2.0 for svm.py or demo_array_svm.py.
With the changes in https://github.com/inducer/pyopencl/compare/7ef5ce5ec280038559295d45986c96302c193d6d..87420b1334806b7fa0fb0104e80a09366869b9f0, this performs about equivalently to buffers (with pocl) in my unscientific benchmarking.
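For concreteness, here is a rough sketch of the kind of unscientific comparison meant here, assuming the allocator/pool names from this PR (`SVMAllocator`, `SVMPool`); timings will of course vary by device and driver:

```python
from time import time

import numpy as np
import pyopencl as cl
import pyopencl.array as cl_array
from pyopencl.tools import ImmediateAllocator, MemoryPool, SVMAllocator, SVMPool

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

def bench(allocator, label, n=2**20, rounds=200):
    a = cl_array.zeros(queue, n, np.float32, allocator=allocator)
    queue.finish()
    start = time()
    for _ in range(rounds):
        a = a + 1  # each iteration allocates a result array via `allocator`
    queue.finish()
    print(f"{label}: {time() - start:.3f} s")

bench(MemoryPool(ImmediateAllocator(queue)), "buffer + MemoryPool")
bench(SVMPool(SVMAllocator(ctx, alignment=0, queue=queue)), "SVM + SVMPool")
```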
Should the loopy interface in https://github.com/inducer/loopy/pull/642 be changed to use svm_ptr instead of _ptr_as_int?
https://github.com/inducer/loopy/blob/7ff7b90fc3fff5b049fca36c410e93023cad57cd/loopy/target/pyopencl.py#L816
svm_ptr would also need to be exported.
Edit: Should probably be int_ptr.
Correct, it should be int_ptr. I have pushed a fix there.
The latest changes fix a double-free situation that led to crashes during lazy exec, specifically these here: https://github.com/inducer/pyopencl/compare/87420b1334806b7fa0fb0104e80a09366869b9f0..ae4e76fd09103f9d99f13065eb5599c98190b861
> - [ ] Ensure that SVM allocators can't be passed to buffer mem pool and vice versa.
As far as I can see, with the current version of this PR, this already works as intended. E.g.:
```python
alloc = SVMAllocator(ctx, alignment=0, queue=queue)
alloc = MemoryPool(alloc)
```

results in:

```
  File "/Users/mdiener/Work/svmfuse/pyopencl/examples/demo_array_svm.py", line 16, in <module>
    alloc = MemoryPool(alloc)
TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
    1. pyopencl._cl._tools_MemoryPool(allocator: pyopencl._cl._tools_AllocatorBase, leading_bits_in_bin_id: int = 4)
Invoked with: <pyopencl._cl._tools_SVMAllocator object at 0x10e34b6f0>
```
There is a similar error for passing an Immediate/Deferred allocator to an SVM Pool.
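For clarity, a minimal sketch of the intended pairing, assuming the `SVMPool` counterpart added in this PR: buffer allocators are wrapped in `MemoryPool`, SVM allocators in `SVMPool`.

```python
import numpy as np
import pyopencl as cl
import pyopencl.array as cl_array
from pyopencl.tools import ImmediateAllocator, MemoryPool, SVMAllocator, SVMPool

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

# Buffer-based allocators go into MemoryPool ...
buf_alloc = MemoryPool(ImmediateAllocator(queue))

# ... while SVM allocators go into SVMPool; crossing the two raises the
# TypeError shown above.
svm_alloc = SVMPool(SVMAllocator(ctx, alignment=0, queue=queue))

a = cl_array.zeros(queue, 10, np.float32, allocator=svm_alloc)
```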
The documentation end of this is now in decent shape, I think. But tests are still missing.
@matthiasdiener Thanks for taking a look!
I think this is essentially done now. I'll give it another read after dinner and then it'll likely go in.
This gitlab CI run crashed here:
https://github.com/inducer/pyopencl/blob/d15d7d1cea8868d2f7cd10f94178181d452a6790/test/test_wrapper.py#L1166
when running with POCL GPU, with this backtrace. @matthiasdiener, any thoughts on what might be the issue?
It seems that this can be reliably provoked with

```
PYOPENCL_TEST=port:k40 python -m pytest --sw --tb=native test_array.py test_wrapper.py
```

on the K40 (at least on Tripel and koelsch). It seems to pass OK on the Titan V. I've xfailed the test for now on the K40.
> This gitlab CI run crashed here:
>
> https://github.com/inducer/pyopencl/blob/d15d7d1cea8868d2f7cd10f94178181d452a6790/test/test_wrapper.py#L1166
>
> when running with POCL GPU, with this backtrace. @matthiasdiener, any thoughts on what might be the issue?
>
> It seems that this can be reliably provoked with `PYOPENCL_TEST=port:k40 python -m pytest --sw --tb=native test_array.py test_wrapper.py` on the K40 (at least on Tripel and koelsch). It seems to pass OK on the Titan V. I've xfailed the test for now on the K40.
Is there a way to see which version of pocl is used for the test?
Edit: I was able to reproduce the error with the most recent version of pocl/pocl#1067 on koelsch.
> This gitlab CI run crashed here:
>
> https://github.com/inducer/pyopencl/blob/d15d7d1cea8868d2f7cd10f94178181d452a6790/test/test_wrapper.py#L1166
>
> when running with POCL GPU, with this backtrace. @matthiasdiener, any thoughts on what might be the issue?
>
> It seems that this can be reliably provoked with `PYOPENCL_TEST=port:k40 python -m pytest --sw --tb=native test_array.py test_wrapper.py` on the K40 (at least on Tripel and koelsch). It seems to pass OK on the Titan V. I've xfailed the test for now on the K40.
From what I can see, the issue looks similar to what we discussed before for the SVM memcpy errors: there seems to be something weird with the unified memory on the K40. Managed memory cannot be accessed/modified on the host (as in the fill operation at issue here) under some circumstances, but I have not been able to find out why.
Edit: I think this issue shouldn't block this PR, since it is entirely due to pocl-cuda.
OK, I've done my final review and made some final fixes. If it passes CI here and on Gitlab, it'll go in. :tada:
I'll call that passing. :slightly_smiling_face:
https://pypi.org/project/pyopencl/2022.2/