pyopencl
unaligned numpy arrays
For FPGA execution of OpenCL kernels, the board expects 64-byte aligned host arrays. However, there seems to be no way to get numpy arrays to obey a custom alignment (the built-in ALIGNMENT appears to be 16 bytes or so). In most cases this is harmless, but sometimes it causes the FPGA OpenCL kernel to stall or freeze. Is there some way to get PyOpenCL to align buffers on the host prior to transfer?
The execution is stuck at futex(0x7ff4e623bb08, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, ffffffff
It seems similar to http://stackoverflow.com/questions/10306669/opencl-kernel-hangs-forever-unless-i-remove-parameters/42214687#42214687 and the thread at https://lists.tiker.net/pipermail/pyopencl/2012-April/001158.html. But no resolution seems to have been posted.
To my mind, this is a mailing list/tech support issue more than a bug in PyOpenCL.
I'd suggest you use code like this to align your numpy arrays to start with. PyOpenCL can't really do much about the alignment of data that already exists (short of copying, which is almost definitely not what anybody wants).
There's also not much of an argument that PyOpenCL should do anything about this, because (e.g.) clEnqueueWriteBuffer is specified to accept any pointer. An implementation that fails to do so is non-conforming.
Found a simpler solution at http://numpy-discussion.10968.n7.nabble.com/Byte-aligned-arrays-td3887.html which fixes the alignment, but is still a bit messy:
import numpy as np

def aligned_zeros(shape, boundary=64, dtype=float, order='C'):
    # int() so that an empty shape (where np.prod returns a float) still works
    N = int(np.prod(shape))
    d = np.dtype(dtype)
    # Over-allocate by `boundary` bytes so an aligned start always fits
    tmp = np.zeros(N * d.itemsize + boundary, dtype=np.uint8)
    address = tmp.__array_interface__['data'][0]
    offset = (boundary - address % boundary) % boundary
    return tmp[offset:offset+N*d.itemsize].view(dtype=d).reshape(shape, order=order)
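For data that already lives in an unaligned array, the same over-allocate-and-slice trick can be used for a one-time copy into an aligned buffer before the transfer. A minimal sketch, assuming a 64-byte boundary; the helper names aligned_empty, is_aligned and aligned_copy are made up for illustration and are not part of numpy or PyOpenCL:

```python
import numpy as np

def aligned_empty(shape, dtype=np.float32, boundary=64):
    """Uninitialized C-ordered array whose data pointer is a multiple of `boundary`."""
    d = np.dtype(dtype)
    nbytes = int(np.prod(shape)) * d.itemsize
    # Over-allocate so that an aligned start address always fits in the buffer
    raw = np.empty(nbytes + boundary, dtype=np.uint8)
    offset = -raw.ctypes.data % boundary
    return raw[offset:offset + nbytes].view(d).reshape(shape)

def is_aligned(arr, boundary=64):
    """True if the array's data pointer is a multiple of `boundary`."""
    return arr.ctypes.data % boundary == 0

def aligned_copy(arr, boundary=64):
    """Copy `arr` into a freshly allocated aligned buffer (costs one host-side copy)."""
    out = aligned_empty(arr.shape, arr.dtype, boundary)
    out[...] = arr
    return out

src = np.arange(1000, dtype=np.float32)[1:]  # a view that starts 4 bytes into its base
host = aligned_copy(src)
print(is_aligned(host), np.array_equal(host, src))
```

The result is an ordinary numpy array, so it should be usable anywhere PyOpenCL accepts a host array. The extra copy is the cost mentioned above, so allocating with an aligned helper from the start is preferable when you control the allocation.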