`register_host_memory` fails for numpy array created `frombuffer`
Bug description
Calling pycuda.driver.register_host_memory on a numpy array that was created with np.frombuffer fails.
Error thrown
ValueError: Cannot set the NumPy array 'base' dependency more than once
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/path/to/my_test_file/./test_pycuda.py", line 12, in <module>
pycuda.driver.register_host_memory(array)
SystemError: <Boost.Python.function object at 0x1b80350> returned a result with an exception set
Browsing other issues raised, this might relate to issue https://github.com/inducer/pycuda/issues/450 and PR #451.
- I see the corresponding unit test was marked xfail in https://github.com/inducer/pycuda/commit/2a276c4e3373d568363d71460886280d19b260d5
However, this is reproducible with numpy==1.26.2, i.e. a release older than numpy 2.0.
Steps to reproduce
Executing the following minimal working example with pycuda==2024.1.2 results in the above error. Note that the same example does not throw an error with pycuda==2023.1.
#!/usr/bin/env python3
import mmap
import numpy as np
import pycuda.autoinit
import pycuda.driver
size = 1024 * 2048
mapping = mmap.mmap(-1, size, flags=mmap.MAP_SHARED)
array = np.frombuffer(mapping, np.uint8)
pycuda.driver.register_host_memory(array)
Expected behaviour
I'd like this to be fixed, please. This is quite key in (our) benchmarking of GPU (htod, dtoh) performance, if/when we do upgrade to the latest pycuda.
Environment
- OS: Ubuntu 22.04.4 LTS
- CUDA version: 12.5
- CUDA driver version: 555.42.06
- PyCUDA version: 2024.1.2
- Python version: 3.12.5
Additional context
Apologies in advance if I've misinterpreted the error or this is already on your radar. Figured I'd flag it anyway, just to be safe.
I had a closer look here. The problem is that PyArray_FromInterface automatically sets the base to point to the object providing the __array_interface__, and PyArray_SetBaseObject considers it an error if a base is already set.
One way I can see around this is to have registered_host_memory implement __array_interface__ and forward the call to the underlying array, and then pass the registered_host_memory to PyArray_FromInterface.
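To make the forwarding idea a bit more concrete, here is a rough pure-Python sketch. It is not PyCUDA's actual C++ code: `HostMemoryHandle` is a made-up stand-in for registered_host_memory, and `np.asarray` is used only as a convenient way to reach NumPy's `__array_interface__` path from Python. The point is just that an object forwarding `__array_interface__` yields an array over the same memory, with NumPy attaching a base itself, so no second base assignment would be needed.

```python
import numpy as np


class HostMemoryHandle:
    """Hypothetical stand-in for registered_host_memory (illustration only)."""

    def __init__(self, array):
        # Hold a reference so the underlying buffer stays alive.
        self._array = array

    @property
    def __array_interface__(self):
        # Forward the wrapped array's interface dict unchanged.
        return self._array.__array_interface__


# A writable array created via frombuffer, as in the report
# (a bytearray is used here instead of an mmap to keep the sketch small).
source = np.frombuffer(bytearray(16), dtype=np.uint8)

handle = HostMemoryHandle(source)
view = np.asarray(handle)  # built from handle.__array_interface__

view[0] = 42
print(source[0])              # the write is visible through the original array
print(view.base is not None)  # NumPy has already attached a base to the new array
```

If the real fix went this way, the unregistration logic would presumably live on the wrapper object, since the array's lifetime would then be tied to it through the base; that part is an assumption on my side and is not shown here.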
I just want to note that the posted example works even with pycuda==2024.1.