pyopencl
Deadlocks when accessing Context with active GL interop
I'm having trouble porting my array-based code to an interop-based renderer.
I'm instantiating the array using an allocator, like this:
def gl_buffer_allocator(size):
    ubo = glGenBuffers(1)
    glBindBuffer(GL_UNIFORM_BUFFER, ubo)
    glBufferStorage(GL_UNIFORM_BUFFER, size, None, GL_MAP_READ_BIT | GL_MAP_WRITE_BIT)
    glBindBuffer(GL_UNIFORM_BUFFER, 0)
    return GLBuffer(ctx, mem_flags.READ_WRITE, int(ubo))
to_device cannot work because it doesn't acquire the GLBuffer, and I cannot do that beforehand since the buffer isn't allocated yet.
It works like this:
self.grid = Array(queue, self.grid_array.shape, self.grid_array.dtype, allocator=allocator)
self.grid.queue = None # didn't want to associate a queue yet
self.grid.allocator = None # make sure `.get()` doesn't allocate GLBuffers
For some reason, passing a Context instead of a CommandQueue makes this lock up here. Freeing the context seems like the wrong thing to do...?
#0 0x00007fffeb5749aa in ?? () from /usr/lib/libnvidia-glcore.so.396.24
#1 0x00007fffeb1b2190 in ?? () from /usr/lib/libnvidia-glcore.so.396.24
#2 0x00007fffeb4c9f32 in ?? () from /usr/lib/libnvidia-glcore.so.396.24
#3 0x00007fffec70b6a3 in glcuR0d4nX () from /usr/lib/libGLX_nvidia.so.0
#4 0x00007fffe8853544 in ?? () from /usr/lib/libnvidia-opencl.so.1
#5 0x00007fffe8752881 in ?? () from /usr/lib/libnvidia-opencl.so.1
#6 0x00007fffe8751595 in ?? () from /usr/lib/libnvidia-opencl.so.1
#7 0x00007ffff032eb94 in clReleaseContext () from /usr/lib/libOpenCL.so.1
#8 0x00007ffff0572d95 in context::~context() () from /usr/lib/python3.6/site-packages/pyopencl/_cffi.abi3.so
#9 0x00007ffff057303a in context::~context() () from /usr/lib/python3.6/site-packages/pyopencl/_cffi.abi3.so
#10 0x00007ffff055f9fd in ?? () from /usr/lib/python3.6/site-packages/pyopencl/_cffi.abi3.so
The same deadlock is preventing me from using grid.with_queue(), grid.setitem() etc.
I realized later that I can trigger it just by accessing the context attribute of a CommandQueue:
def step(self):
    with CommandQueue(self.ctx) as queue:
        cl.enqueue_acquire_gl_objects(queue, [self.grid.base_data])
        # uncomment to lock
        # queue.context
        self.grid.set(self.grid_array, queue=queue)
        cl.enqueue_acquire_gl_objects(queue, [self.grid.base_data])
        print('got here')
Interestingly, it runs until 'got here', but I never see the result of the set call. The step() method also never returns for me.
If I debug the script in pudb, the interface closes as I step out of the method.
I'll see if I can create a small reproducible example now.
Here we go:
from OpenGL.GL import *
from OpenGL.GLUT import *
import pyopencl as cl
import pyopencl.array
import numpy as np
def get_ctx():
    from pyopencl.tools import get_gl_sharing_context_properties
    import sys
    platform = cl.get_platforms()[0]
    if sys.platform == "darwin":
        return cl.Context(properties=get_gl_sharing_context_properties(),
                          devices=[])
    else:
        # Some OSs prefer clCreateContextFromType, some prefer
        # clCreateContext. Try both.
        try:
            return cl.Context(properties=[
                (cl.context_properties.PLATFORM, platform)]
                + get_gl_sharing_context_properties())
        except:
            return cl.Context(properties=[
                (cl.context_properties.PLATFORM, platform)]
                + get_gl_sharing_context_properties(),
                devices = [platform.get_devices()[0]])
glutInit()
def gl_allocator(size):
    ubo = glGenBuffers(1)
    glBindBuffer(GL_UNIFORM_BUFFER, ubo)
    glBufferStorage(GL_UNIFORM_BUFFER, size, None, GL_MAP_READ_BIT | GL_MAP_WRITE_BIT)
    glBindBuffer(GL_UNIFORM_BUFFER, 0)
    return cl.GLBuffer(ctx, cl.mem_flags.READ_WRITE, int(ubo))
glutInit()
glutInitWindowSize(512, 512)
glutCreateWindow('gpWFC')
glutDisplayFunc(lambda: 0)
ctx = get_ctx()
data = np.arange(100)
with cl.CommandQueue(ctx) as queue:
    arr = cl.array.Array(queue, data.shape, data.dtype, allocator=gl_allocator)
def key(*args):
    print("key pressed")
    with cl.CommandQueue(ctx) as queue:
        cl.enqueue_acquire_gl_objects(queue, [arr.base_data])
        queue.context
        arr.set(data, queue=queue)
        cl.enqueue_release_gl_objects(queue, [arr.base_data])
glutKeyboardFunc(key)
glutMainLoop()
Leave this open, press any key once, and it should lock up. My system info is in this comment.
> It works like this:
I'd discourage in-place modification of Array instances. Instead, simply pass your buffer to the constructor via the data= kwarg.
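As a rough sketch of that suggestion: the helper below wraps an already-created buffer in an Array through the data= keyword instead of clearing attributes afterwards. The name array_from_existing_buffer and its parameters are made up for illustration, and the sketch assumes the buffer is at least as large as shape and dtype require. It does not acquire the GL object, so that still has to happen before any transfer.

```python
def array_from_existing_buffer(queue, buf, shape, dtype):
    """Wrap an existing buffer (for example a GLBuffer) in a pyopencl Array.

    Hypothetical helper for illustration; it only forwards to the Array
    constructor and does not acquire or release any GL objects.
    """
    # Imported lazily so the sketch can be read and loaded without a device.
    import pyopencl.array as cl_array

    # data= hands the Array a ready-made buffer, so no allocator is involved
    # and nothing needs to be patched on the instance afterwards.
    return cl_array.Array(queue, shape, dtype, data=buf)
```

In the reproducer this would be called roughly as array_from_existing_buffer(queue, gl_allocator(data.nbytes), data.shape, data.dtype), with acquire and release still wrapped around any set() or get() call.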
> Freeing the context seems like the wrong thing to do...?
That's weird. OpenCL is reference counted, so all clReleaseContext should do is decrease the refcount--unless that was indeed the last reference to the context.
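To make the expectation concrete, here is a toy model of retain/release counting. ToyContext is a made-up stand-in and not part of pyopencl or OpenCL; it only illustrates that a release is supposed to be harmless unless it drops the last reference.

```python
class ToyContext:
    """Stand-in for a reference-counted handle; not the OpenCL API."""

    def __init__(self):
        self.refcount = 1       # the creator holds the first reference
        self.destroyed = False

    def retain(self):
        self.refcount += 1

    def release(self):
        self.refcount -= 1
        if self.refcount == 0:
            self.destroyed = True   # only the last release tears the object down


ctx = ToyContext()
ctx.retain()                         # e.g. a second wrapper around the same handle
ctx.release()                        # refcount 2 -> 1: nothing is torn down
print(ctx.refcount, ctx.destroyed)   # prints: 1 False
ctx.release()                        # refcount 1 -> 0: teardown happens here
print(ctx.refcount, ctx.destroyed)   # prints: 0 True
```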
@inducer I tried that, but it also triggered the hang. Maybe clReleaseContext is only decreasing the reference count and something else is going on; I only assumed what's in the title from the backtrace.
If you don't have time to look into this, could you recommend a debugging strategy?
EDIT: leaving this link here for reference, I'll check my dmesg output next time and also see if I can get a test setup on Windows.
@inducer have you had a chance to take a look at the minimal example I provided above?
I have not, sorry. But you may want to retry with git master, which is a whole different code base (actually, mostly a revival of the old Boost.Python code on top of pybind11).
Great, I've given it a shot but I am experiencing some issues with NVIDIA Optimus / Bumblebee on my laptop: Bumblebee-Project/Bumblebee#778
Xlib: extension "NV-GLX" missing on display ":0"
Having dealt with these things in the past, though, I think the fix is just to wait until I get back to my desktop PC, where Optimus doesn't get in the way.
Unfortunately still experiencing the same problem:
(gdb) bt
#0 0x00007f98d5c76853 in () at /usr/lib/libnvidia-glcore.so.415.25
#1 0x00007f98b0999478 in ()
#2 0x00007ffd5f107d58 in ()
#3 0x00007ffd5f107d58 in ()
#4 0x000055e2a22f3870 in ()
#5 0x00007f98d6cf0ebd in () at /usr/lib/libGLX_nvidia.so.0
#6 0x00007f98d5c3901d in () at /usr/lib/libnvidia-glcore.so.415.25
#7 0x00007f98d5bf0d02 in () at /usr/lib/libnvidia-glcore.so.415.25
#8 0x00007f98d6cb8033 in glcuR0d4nX () at /usr/lib/libGLX_nvidia.so.0
#9 0x00007f98d2e1d794 in () at /usr/lib/libnvidia-opencl.so.1
#10 0x00007f98d2d1b7d1 in () at /usr/lib/libnvidia-opencl.so.1
#11 0x00007f98d2d1a4e5 in () at /usr/lib/libnvidia-opencl.so.1
#12 0x00007f98d92deef4 in clReleaseContext () at /usr/lib/libOpenCL.so.1
#13 0x00007f98d8cdad8b in std::_Sp_counted_ptr<pyopencl::context*, (__gnu_cxx::_Lock_policy)2>::_M_dispose() ()
at /home/s-ol/Documents/other/gpWFC/venv/lib/python3.7/site-packages/pyopencl-2018.2.2-py3.7-linux-x86_64.egg/pyopencl/_cl.cpython-37m-x86_64-linux-gnu.so
#14 0x00007f98d8cda6a4 in pybind11::class_<pyopencl::context, std::shared_ptr<pyopencl::context> >::dealloc(pybind11::detail::value_and_holder&) ()
at /home/s-ol/Documents/other/gpWFC/venv/lib/python3.7/site-packages/pyopencl-2018.2.2-py3.7-linux-x86_64.egg/pyopencl/_cl.cpython-37m-x86_64-linux-gnu.so
#15 0x00007f98d8cce01f in pybind11_object_dealloc ()
at /home/s-ol/Documents/other/gpWFC/venv/lib/python3.7/site-packages/pyopencl-2018.2.2-py3.7-linux-x86_64.egg/pyopencl/_cl.cpython-37m-x86_64-linux-gnu.so
#16 0x00007f98e4e5dd9e in _PyEval_EvalFrameDefault () at /usr/lib/libpython3.7m.so.1.0
#17 0x00007f98e4d9eb99 in _PyEval_EvalCodeWithName () at /usr/lib/libpython3.7m.so.1.0
#18 0x00007f98e4de5492 in _PyFunction_FastCallKeywords () at /usr/lib/libpython3.7m.so.1.0
#19 0x00007f98e4e57c42 in _PyEval_EvalFrameDefault () at /usr/lib/libpython3.7m.so.1.0
#20 0x00007f98e4d9eb99 in _PyEval_EvalCodeWithName () at /usr/lib/libpython3.7m.so.1.0
#21 0x00007f98e4d9fdec in _PyFunction_FastCallDict () at /usr/lib/libpython3.7m.so.1.0
#22 0x00007f98e4e5943c in _PyEval_EvalFrameDefault () at /usr/lib/libpython3.7m.so.1.0
#23 0x00007f98e4d9eb99 in _PyEval_EvalCodeWithName () at /usr/lib/libpython3.7m.so.1.0
#24 0x00007f98e4de5492 in _PyFunction_FastCallKeywords () at /usr/lib/libpython3.7m.so.1.0
#25 0x00007f98e4e58b7d in _PyEval_EvalFrameDefault () at /usr/lib/libpython3.7m.so.1.0
#26 0x00007f98e4d9fc0b in _PyFunction_FastCallDict () at /usr/lib/libpython3.7m.so.1.0
#27 0x00007f98e4e5943c in _PyEval_EvalFrameDefault () at /usr/lib/libpython3.7m.so.1.0
#28 0x00007f98e4d9eb99 in _PyEval_EvalCodeWithName () at /usr/lib/libpython3.7m.so.1.0
#29 0x00007f98e4de5492 in _PyFunction_FastCallKeywords () at /usr/lib/libpython3.7m.so.1.0
--Type <RET> for more, q to quit, c to continue without paging--
This is my own code, but interestingly enough I now have the same problem running examples/gl_interop_demo.py:
(gdb) bt
#0 0x00007fffe8b11896 in ?? () from /usr/lib/libnvidia-glcore.so.415.25
#1 0x00007fffe8b3e5fc in ?? () from /usr/lib/libnvidia-glcore.so.415.25
#2 0x00007fffe87657b0 in ?? () from /usr/lib/libnvidia-glcore.so.415.25
#3 0x00007fffe8a8bd02 in ?? () from /usr/lib/libnvidia-glcore.so.415.25
#4 0x00007fffe9b53033 in glcuR0d4nX () from /usr/lib/libGLX_nvidia.so.0
#5 0x00007fffe5cd8794 in ?? () from /usr/lib/libnvidia-opencl.so.1
#6 0x00007fffe5bd67d1 in ?? () from /usr/lib/libnvidia-opencl.so.1
#7 0x00007fffe5bd54e5 in ?? () from /usr/lib/libnvidia-opencl.so.1
#8 0x00007ffff7203ef4 in clReleaseContext () from /usr/lib/libOpenCL.so.1
#9 0x00007ffff5411d8b in std::_Sp_counted_ptr<pyopencl::context*, (__gnu_cxx::_Lock_policy)2>::_M_dispose() ()
from /home/s-ol/Documents/other/gpWFC/venv/lib/python3.7/site-packages/pyopencl-2018.2.2-py3.7-linux-x86_64.egg/pyopencl/_cl.cpython-37m-x86_64-linux-gnu.so
#10 0x00007ffff54116a4 in pybind11::class_<pyopencl::context, std::shared_ptr<pyopencl::context> >::dealloc(pybind11::detail::value_and_holder&) ()
from /home/s-ol/Documents/other/gpWFC/venv/lib/python3.7/site-packages/pyopencl-2018.2.2-py3.7-linux-x86_64.egg/pyopencl/_cl.cpython-37m-x86_64-linux-gnu.so
#11 0x00007ffff540501f in pybind11_object_dealloc ()
from /home/s-ol/Documents/other/gpWFC/venv/lib/python3.7/site-packages/pyopencl-2018.2.2-py3.7-linux-x86_64.egg/pyopencl/_cl.cpython-37m-x86_64-linux-gnu.so
#12 0x00007ffff7b664c0 in _PyFunction_FastCallKeywords () from /usr/lib/libpython3.7m.so.1.0
#13 0x00007ffff7bd8dfa in _PyEval_EvalFrameDefault () from /usr/lib/libpython3.7m.so.1.0
#14 0x00007ffff7b1fb99 in _PyEval_EvalCodeWithName () from /usr/lib/libpython3.7m.so.1.0
#15 0x00007ffff7b20ab4 in PyEval_EvalCodeEx () from /usr/lib/libpython3.7m.so.1.0
#16 0x00007ffff7b20adc in PyEval_EvalCode () from /usr/lib/libpython3.7m.so.1.0
#17 0x00007ffff7c4ac94 in ?? () from /usr/lib/libpython3.7m.so.1.0
#18 0x00007ffff7c4c8be in PyRun_FileExFlags () from /usr/lib/libpython3.7m.so.1.0
#19 0x00007ffff7c4dc75 in PyRun_SimpleFileExFlags () from /usr/lib/libpython3.7m.so.1.0
#20 0x00007ffff7c4feb7 in ?? () from /usr/lib/libpython3.7m.so.1.0
#21 0x00007ffff7c500fc in _Py_UnixMain () from /usr/lib/libpython3.7m.so.1.0
#22 0x00007ffff7dae223 in __libc_start_main () from /usr/lib/libc.so.6
#23 0x000055555555505e in _start ()
However examples/gl_particle_animation.py works fine...
What are the differences in the context setup code between examples/gl_interop_demo.py and examples/gl_particle_animation.py? What happens if you graft the context setup code from one onto the other?
In examples/gl_particle_animation.py the context is created simply by
platform = cl.get_platforms()[0]
ctx = cl.Context(properties=[(cl.context_properties.PLATFORM, platform)] + get_gl_sharing_context_properties())
while in examples/gl_interop_demo.py there is this slightly more elaborate block:
platform = cl.get_platforms()[0]
from pyopencl.tools import get_gl_sharing_context_properties
import sys
if sys.platform == "darwin":
    ctx = cl.Context(properties=get_gl_sharing_context_properties(),
                     devices=[])
else:
    # Some OSs prefer clCreateContextFromType, some prefer
    # clCreateContext. Try both.
    try:
        ctx = cl.Context(properties=[
            (cl.context_properties.PLATFORM, platform)]
            + get_gl_sharing_context_properties())
    except:
        ctx = cl.Context(properties=[
            (cl.context_properties.PLATFORM, platform)]
            + get_gl_sharing_context_properties(),
            devices = [platform.get_devices()[0]])
Replacing the second with the first doesn't change the outcome, though:
(gdb) bt
#0 0x00007fffe8b3e5f2 in ?? () from /usr/lib/libnvidia-glcore.so.415.25
#1 0x00007fffe87657b0 in ?? () from /usr/lib/libnvidia-glcore.so.415.25
#2 0x00007fffe8a8bd02 in ?? () from /usr/lib/libnvidia-glcore.so.415.25
#3 0x00007fffe9b53033 in glcuR0d4nX () from /usr/lib/libGLX_nvidia.so.0
#4 0x00007fffe5cd8794 in ?? () from /usr/lib/libnvidia-opencl.so.1
#5 0x00007fffe5bd67d1 in ?? () from /usr/lib/libnvidia-opencl.so.1
#6 0x00007fffe5bd54e5 in ?? () from /usr/lib/libnvidia-opencl.so.1
#7 0x00007ffff7203ef4 in clReleaseContext () from /usr/lib/libOpenCL.so.1
#8 0x00007ffff5411d8b in std::_Sp_counted_ptr<pyopencl::context*, (__gnu_cxx::_Lock_policy)2>::_M_dispose() ()
from /home/s-ol/Documents/other/gpWFC/venv/lib/python3.7/site-packages/pyopencl-2018.2.2-py3.7-linux-x86_64.egg/pyopencl/_cl.cpython-37m-x86_64-linux-gnu.so
#9 0x00007ffff54116a4 in pybind11::class_<pyopencl::context, std::shared_ptr<pyopencl::context> >::dealloc(pybind11::detail::value_and_holder&) ()
from /home/s-ol/Documents/other/gpWFC/venv/lib/python3.7/site-packages/pyopencl-2018.2.2-py3.7-linux-x86_64.egg/pyopencl/_cl.cpython-37m-x86_64-linux-gnu.so
#10 0x00007ffff540501f in pybind11_object_dealloc ()
Also, I finally managed to load the Python GDB utils, but they don't give any more information (I assume because my Python build isn't compiled with debugging support):
(gdb) thread apply all py-bt-full
Thread 11 (Thread 0x7fffe1554700 (LWP 31451)):
Unable to locate python frame
Thread 10 (Thread 0x7fffe1d55700 (LWP 31450)):
Unable to locate python frame
Thread 9 (Thread 0x7fffe2556700 (LWP 31449)):
Unable to locate python frame
Thread 8 (Thread 0x7fffe2d57700 (LWP 31448)):
Unable to locate python frame
Thread 7 (Thread 0x7fffe3558700 (LWP 31447)):
Unable to locate python frame
Thread 6 (Thread 0x7fffe3f61700 (LWP 31446)):
#0 Waiting for the GIL
Thread 5 (Thread 0x7fffe4762700 (LWP 31445)):
Unable to locate python frame
Thread 4 (Thread 0x7fffedb81700 (LWP 31436)):
Unable to locate python frame
Thread 3 (Thread 0x7ffff2382700 (LWP 31435)):
Unable to locate python frame
Thread 2 (Thread 0x7ffff2b83700 (LWP 31434)):
Unable to locate python frame
Thread 1 (Thread 0x7ffff7883600 (LWP 31416)):
#12 (unable to read python frame information)
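When py-bt cannot read the frames, the standard library's faulthandler module is one possible alternative for seeing where the Python side was when the hang began. The sketch below arms a timer that dumps every thread's Python traceback if the process is still running after a timeout. The helper names arm_hang_watchdog and disarm_hang_watchdog are invented for this example; only the faulthandler calls are real API.

```python
import faulthandler
import sys


def arm_hang_watchdog(timeout_s, file=sys.stderr):
    """Dump all threads' Python tracebacks if still running after timeout_s."""
    # faulthandler writes the dump from its own watchdog thread, so it can
    # still fire while the main thread is blocked inside a native call.
    faulthandler.dump_traceback_later(timeout_s, repeat=False, file=file, exit=False)


def disarm_hang_watchdog():
    """Cancel the pending dump once the suspicious section has returned."""
    faulthandler.cancel_dump_traceback_later()
```

One way to use it would be to call arm_hang_watchdog(5) at the top of the key() callback and disarm_hang_watchdog() at the end; if the callback deadlocks, the traceback written to stderr should name the Python line that was executing.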
So the fact that the backtrace contains clReleaseContext suggests that the Nvidia runtime has a bug that makes it mishandle decreasing the context refcount (perhaps when that happens while GL interop is still active). Something to try would be to hold on to a handle to the context somewhere, so that it doesn't get released prematurely.
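A small illustration of what holding on to a handle changes, using a stand-in class rather than pyopencl. It assumes, as the backtraces suggest, that a wrapper object releases its handle when Python frees it, and it relies on CPython's immediate reference-count-based finalization. ToyWrapper and make_wrapper are hypothetical names.

```python
class ToyWrapper:
    """Stand-in for a wrapper object that releases its handle when freed."""

    released = 0    # counts how many wrappers have been finalized

    def __del__(self):
        ToyWrapper.released += 1


def make_wrapper():
    # Plays the role of an attribute access that builds a fresh wrapper each time.
    return ToyWrapper()


make_wrapper()                      # temporary: nothing holds it, so it is finalized at once
after_temporary = ToyWrapper.released

keepalive = make_wrapper()          # held: finalization is deferred while this name lives
after_held = ToyWrapper.released

print(after_temporary, after_held)  # on CPython prints: 1 1
```

Applied to the reproducer, the equivalent would be something like storing queue.context in a module-level variable or list, so the release is postponed until interpreter exit instead of running in the middle of the GLUT callback. Whether that avoids the driver hang is untested here.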