
PyZMQ polls forever on garbage collection at exit

Open tcwalther opened this issue 8 years ago • 1 comment

I haven't been able to write a deterministic script that reproduces the error, but I repeatedly run into problems terminating scripts after a timeout exception when using pyzmq with asyncio. When a timeout occurs, my script raises an exception, which stops the event loop and then terminates the script. However, the process hangs during garbage collection at interpreter exit. I get the following output:

Exception ignored in: <bound method Socket.__del__ of <zmq.asyncio.Socket object at 0x7ff536edd468>>
Traceback (most recent call last):
  File "/opt/conda/envs/sonalytic/lib/python3.5/site-packages/zmq/sugar/socket.py", line 70, in __del__
  File "/opt/conda/envs/sonalytic/lib/python3.5/site-packages/zmq/eventloop/future.py", line 157, in close
TypeError: 'NoneType' object is not callable

Attaching gdb to the process, I get the following:

(gdb) t a a py-bt

Thread 3 (Thread 0x7ff5366d4700 (LWP 1393)):
Traceback (most recent call first):

Thread 2 (Thread 0x7ff536ed5700 (LWP 1392)):
Traceback (most recent call first):

Thread 1 (Thread 0x7ff563d30740 (LWP 1388)):
Traceback (most recent call first):
  Garbage-collecting
(gdb) bt
#0  0x00007ff5627ea84d in poll () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007ff5599f90da in zmq::signaler_t::wait (this=this@entry=0x2a064c8, timeout_=timeout_@entry=-1) at src/signaler.cpp:218
#2  0x00007ff5599e4ed0 in zmq::mailbox_t::recv (this=this@entry=0x2a06468, cmd_=cmd_@entry=0x7ffc879dea40, timeout_=timeout_@entry=-1) at src/mailbox.cpp:80
#3  0x00007ff5599d65bc in zmq::ctx_t::terminate (this=0x2a063d0) at src/ctx.cpp:165
#4  0x00007ff55935c259 in __pyx_f_3zmq_7backend_6cython_7context_7Context__term (__pyx_v_self=0x7ff53776d588) at zmq/backend/cython/context.c:2201
#5  __pyx_pf_3zmq_7backend_6cython_7context_7Context_4__dealloc__ (__pyx_v_self=0x7ff53776d588) at zmq/backend/cython/context.c:1786
#6  __pyx_pw_3zmq_7backend_6cython_7context_7Context_5__dealloc__ (__pyx_v_self=<Context at remote 0x7ff53776d588>) at zmq/backend/cython/context.c:1712
#7  __pyx_tp_dealloc_3zmq_7backend_6cython_7context_Context (o=<Context at remote 0x7ff53776d588>) at zmq/backend/cython/context.c:3677
#8  0x00007ff5636cabdc in subtype_dealloc (self=<Context at remote 0x7ff53776d588>) at Objects/typeobject.c:1209
#9  0x00007ff5636ab46f in free_keys_object (keys=<optimized out>) at Objects/dictobject.c:351
#10 PyDict_Clear (op=<unknown at remote 0x7ffc879de9e0>) at Objects/dictobject.c:1388
#11 0x00007ff5636ce726 in type_clear (type=0x1523d18) at Objects/typeobject.c:3289
#12 0x00007ff5637a24e2 in delete_garbage (old=<optimized out>, collectable=<optimized out>) at Modules/gcmodule.c:866
#13 collect (generation=generation@entry=2, n_collected=n_collected@entry=0x0, n_uncollectable=n_uncollectable@entry=0x0, nofail=nofail@entry=1) at Modules/gcmodule.c:1014
#14 0x00007ff5637a3141 in _PyGC_CollectNoFail () at Modules/gcmodule.c:1605
#15 0x00007ff56376d576 in PyImport_Cleanup () at Python/import.c:481
#16 0x00007ff56377f855 in Py_Finalize () at Python/pylifecycle.c:576
#17 0x00007ff5637a0f13 in Py_Main (argc=argc@entry=2, argv=argv@entry=0xbdc010) at Modules/main.c:788
#18 0x0000000000400b54 in main (argc=2, argv=<optimized out>) at ./Programs/python.c:65

This happens with pyzmq 16.0.2 and either ZeroMQ 4.1.5 or ZeroMQ 4.2.1.

It looks as if ZMQ_LINGER were -1, so the program never terminates. This is unexpected, because I call socket.setsockopt(zmq.LINGER, 0) on every single socket I create. I also set ZMQ_BLOCKY to false on every context I create (ZeroMQ 4.2.1 only, as this is a new option), but that doesn't resolve the issue either.
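
For reference, the configuration described above can be sketched roughly as follows. This is a minimal illustration, not the code from the affected application: the helper names `make_context` and `make_socket` and the DEALER socket type are assumptions, and the version check only reflects that ZMQ_BLOCKY is a libzmq 4.2 option.

```python
import zmq


def make_context():
    """Create a context with ZMQ_BLOCKY disabled where libzmq supports it."""
    ctx = zmq.Context()
    # ZMQ_BLOCKY was introduced in libzmq 4.2; skip it on older versions.
    if zmq.zmq_version_info() >= (4, 2):
        ctx.set(zmq.BLOCKY, 0)
    return ctx


def make_socket(ctx, socket_type=zmq.DEALER):
    """Create a socket that drops unsent messages immediately on close."""
    sock = ctx.socket(socket_type)
    sock.setsockopt(zmq.LINGER, 0)
    return sock


if __name__ == "__main__":
    ctx = make_context()
    sock = make_socket(ctx)
    print("linger:", sock.getsockopt(zmq.LINGER))
    # Close the socket explicitly before terminating the context, so that
    # term() has nothing left to wait for.
    sock.close(linger=0)
    ctx.term()
```

With every socket closed explicitly, `term()` should return promptly; the hang reported here occurs when the context is torn down by the garbage collector instead.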

I don't know whether this error is specific to pyzmq's asyncio module, or is just a basic problem with cleaning up a context in pyzmq. I'd be extremely grateful for any hints on how to further debug this issue.

tcwalther, Apr 04 '17 11:04

In the end, I worked around this problem by using a patched context, which might be useful for other asyncio users. Note that, unlike the original context, my patched context is not thread-safe. In my case this is okay, since concurrency is achieved via the asyncio event loop.

import zmq
import zmq.asyncio


class PatchedZmqAsyncioContext(zmq.asyncio.Context):
    """
    This patched zmq.asyncio.Context calls `destroy(linger=0)` during garbage collection instead of
    `term()`. This is a workaround for the weird hanging bug described in
    https://github.com/zeromq/pyzmq/issues/1003 and https://github.com/zeromq/libzmq/issues/2586

    .. note:: This patched context is not thread-safe!
    """
    def __del__(self):
        """
        Instead of calling terminate as in the original zmq.asyncio.Context, we call destroy.
        This is not thread-safe!
        """
        if not self._shadow and not zmq.sugar.context._exiting:
            self.destroy(linger=0)

    def __exit__(self, *args, **kwargs):
        """
        Instead of calling terminate as in the original zmq.asyncio.Context, we call destroy.
        This is not thread-safe!
        """
        self.destroy(linger=0)
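
The same idea can also be applied without subclassing, by calling `destroy(linger=0)` explicitly before the interpreter exits rather than leaving cleanup to garbage collection. The following is a minimal sketch under assumptions: the coroutine name `recv_or_timeout`, the PULL socket, and the `inproc://timeout-demo` endpoint are hypothetical and only serve to trigger a timeout, since nothing ever sends to the socket.

```python
import asyncio

import zmq
import zmq.asyncio


async def recv_or_timeout(timeout=0.1):
    """Wait for one message, then always destroy the context explicitly."""
    ctx = zmq.asyncio.Context()
    sock = ctx.socket(zmq.PULL)
    try:
        # Hypothetical endpoint; no peer ever sends, so recv() times out.
        sock.bind("inproc://timeout-demo")
        await asyncio.wait_for(sock.recv(), timeout)
        outcome = "received"
    except asyncio.TimeoutError:
        outcome = "timed out"
    finally:
        # Close all sockets with linger=0 and terminate the context while
        # the event loop is still running. Not thread-safe.
        ctx.destroy(linger=0)
    return outcome, ctx.closed


if __name__ == "__main__":
    print(asyncio.run(recv_or_timeout()))
```

Because the context is already closed when the coroutine returns, there is nothing left for `__del__` to terminate at exit. As with the patched context, `destroy()` should only be called when no other thread is using the context's sockets.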

tcwalther, Aug 21 '17 09:08