artiq icon indicating copy to clipboard operation
artiq copied to clipboard

Dashboard occasionally throws "Task exception was never retrieved" errors on Windows

Open cjbe opened this issue 6 years ago • 34 comments

Bug Report

One-Line Summary

The dashboard occasionally throws "Task exception was never retrieved" errors on Windows, even with no user activity.

Issue Details

These errors seem to occur occasionally (~10 per day). They occur even without any user activity. For example, on a dashboard which has some applets open plotting actively changing datasets, but without any user button presses, or any applets being opened or closed.

Steps to Reproduce

No known reproduction

Actual (undesired) Behavior

The traceback is always like this:

quamash.QEventLoop:Task exception was never retrieved
future: <Task finished coro=<AsyncioParentComm._autoclose() done, defined at C:\Users\ballance\AppData\Local\Continuum\anaconda3\envs\artiq\lib\site-packages\artiq\protocols\pipe_ipc.py:118> exception=OSError(9, 'The handle is invalid', None, 6, None)>
Traceback (most recent call last):
  File "C:\Users\ballance\AppData\Local\Continuum\anaconda3\envs\artiq\lib\asyncio\tasks.py", line 240, in _step
    result = coro.send(None)
  File "C:\Users\ballance\AppData\Local\Continuum\anaconda3\envs\artiq\lib\site-packages\artiq\protocols\pipe_ipc.py", line 120, in _autoclose
    self.server[0].close()
  File "C:\Users\ballance\AppData\Local\Continuum\anaconda3\envs\artiq\lib\asyncio\windows_events.py", line 283, in close
    pipe.close()
  File "C:\Users\ballance\AppData\Local\Continuum\anaconda3\envs\artiq\lib\asyncio\windows_utils.py", line 157, in close
    CloseHandle(self._handle)
OSError: [WinError 6] The handle is invalid

Your System

Using Artiq from master

cjbe avatar Dec 12 '18 22:12 cjbe

I haven't seen this. Also on windows.

jordens avatar Dec 13 '18 06:12 jordens

Sometimes after an experiment terminates with an exception I see

asyncio:Task exception was never retrieved
future: <Task finished coro=<DatasetNamespaces.finish_rid.<locals>.gc() done, defined at /home/ion/artiq/artiq/artiq/master/databases.py:125> exception=KeyError('datasets_rid_11065',)>
Traceback (most recent call last):
  File "/home/ion/anaconda3/envs/artiq-main/lib/python3.5/asyncio/tasks.py", line 240, in _step
    result = coro.send(None)
  File "/home/ion/artiq/artiq/artiq/master/databases.py", line 127, in gc
    self._publisher.remove_notifier(_rid_namespace_name(rid))
  File "/home/ion/artiq/artiq/artiq/protocols/sync_struct.py", line 306, in remove_notifier
    notifier = self.notifiers[name]
KeyError: 'datasets_rid_11065

hartytp avatar May 10 '19 10:05 hartytp

Different issue? Windows vs Linux, pipe_rpc vs sync_struct, different errors, different code area, different tools and components.

jordens avatar May 10 '19 10:05 jordens

Yes, it is (and I'll break it out in a second). Just posting here as a reminder that the frequency of the occurance of these events is due to the fact that it has more than one cause...

hartytp avatar May 10 '19 10:05 hartytp

The latter is due to a feature not merged upstream (yet), although I thought I had fixed that particular issue a while back.

dnadlinger avatar May 10 '19 10:05 dnadlinger

I think this is a race condition inside asyncio....

I hacked my local copy of asyncio as follows

class PipeHandle:
    """Wrapper for an overlapped pipe handle which is vaguely file-object like.

    The IOCP event loop can use these instead of socket objects.
    """
    def __init__(self, handle):
        print("new pipe handle: {}".format(handle))
        self._handle = handle
        self._closed = False

    def __repr__(self):
        if self._handle is not None:
            handle = 'handle=%r' % self._handle
        else:
            handle = 'closed'
        return '<%s %s>' % (self.__class__.__name__, handle)

    @property
    def handle(self):
        return self._handle

    def fileno(self):
        if self._handle is None:
            raise ValueError("I/O operatioon on closed pipe")
        return self._handle

    def close(self, *, CloseHandle=_winapi.CloseHandle):
        if self._handle is not None:
            print("close pipe handle: {} {}".format(self._handle, self._closed))
            try:
                print(CloseHandle)
                CloseHandle(self._handle)
            except Exception as e:
                print("wtf {} {} {}".format(e, self._handle, self._closed))
                raise
            print("closed")
            self._closed = True
            self._handle = None

    def __del__(self):
        if self._handle is not None:
            warnings.warn("unclosed %r" % self, ResourceWarning)
            self.close()

    def __enter__(self):
        return self

    def __exit__(self, t, v, tb):
        self.close()

and I see...

opened server pipe 2476
new pipe handle: 2476
new pipe handle: 2484
new pipe handle: 2488
opened server pipe 2536
new pipe handle: 2536
opened server pipe 2544
new pipe handle: 2544
INFO:applet(422 frequency):print:new pipe handle: 1176
QWindowsWindow::setGeometry: Unable to set geometry 200x130+0+0 on QWindow/''. Resulting geometry:  640x480+960+465 (frame: 0, 0, 0, 0, custom margin: 0, 0, 0, 0, minimum size: 0x0, maximum size: 16777215x16777215).
QWindowsWindow::setGeometry: Unable to set geometry 200x130+0+0 on QWindow/''. Resulting geometry:  640x480+960+465 (frame: 0, 0, 0, 0, custom margin: 0, 0, 0, 0, minimum size: 0x0, maximum size: 16777215x16777215).
QWindowsWindow::setGeometry: Unable to set geometry 200x130+0+0 on QWindow/''. Resulting geometry:  640x480+960+465 (frame: 0, 0, 0, 0, custom margin: 0, 0, 0, 0, minimum size: 0x0, maximum size: 16777215x16777215).
QWindowsWindow::setGeometry: Unable to set geometry 200x130+0+0 on QWindow/''. Resulting geometry:  640x480+960+465 (frame: 0, 0, 0, 0, custom margin: 0, 0, 0, 0, minimum size: 0x0, maximum size: 16777215x16777215).
INFO:applet(422 frequency):print:close pipe handle: 1176 False
INFO:applet(422 frequency):print:<built-in function CloseHandle>
INFO:applet(422 frequency):print:closed
close pipe handle: 2484 False
<built-in function CloseHandle>
closed
close pipe handle: 2488 False
<built-in function CloseHandle>
closed
close pipe handle: 2544 False
<built-in function CloseHandle>
close pipe handle: 2536 False
<built-in function CloseHandle>
closed
closed
close pipe handle: 2536 False
<built-in function CloseHandle>
wtf CloseHandle() argument must be int, not None None True
ERROR:dashboard:quamash._QEventLoop:Task exception was never retrieved
future: <Task finished coro=<AsyncioParentComm._autoclose() done, defined at C:\Users\LabUser\Miniconda3\envs\artiq-env\lib\site-packages\artiq\protocols\pipe_ipc.py:118> exception=TypeError('CloseHandle() argument must be int, not None',)>
Traceback (most recent call last):
  File "C:\Users\LabUser\Miniconda3\envs\artiq-env\lib\asyncio\tasks.py", line 240, in _step
    result = coro.send(None)
  File "C:\Users\LabUser\Miniconda3\envs\artiq-env\lib\site-packages\artiq\protocols\pipe_ipc.py", line 120, in _autoclose
    self.server[0].close()
  File "C:\Users\LabUser\Miniconda3\envs\artiq-env\lib\asyncio\windows_events.py", line 284, in close
    pipe.close()
  File "C:\Users\LabUser\Miniconda3\envs\artiq-env\lib\asyncio\windows_utils.py", line 162, in close
    CloseHandle(self._handle)
T

I guess this is a bug to do with python's lazy evaluation or something like that? It looks like by the time self._handle is captured for the call to CloseHandle it's already been set to None.

@cjbe can you try patching your local asyncio here https://github.com/python/cpython/blob/a2fedd8c910cb5f5b9bd568d6fd44d63f8f5cfa5/Lib/asyncio/windows_utils.py#L105-L108

to be something like:

    def close(self, *, CloseHandle=_winapi.CloseHandle):
        if self._handle is not None:
            handle = self._handle
            self._handle = None
            CloseHandle(handle)

hartytp avatar May 10 '19 12:05 hartytp

@cjbe @dnadlinger can you apply that patch locally and confirm that it fixes the error? If so, I'll close this issue and move the discussion to asyncio.

hartytp avatar May 10 '19 19:05 hartytp

@hartytp: Fixed the RID namespacing-related issue in our local fork.

As for the handle closing issue, doesn't this look more like a race condition between (OS) threads? If that is the case, your patch would only paper over the actual problem. (There shouldn't be any lazy evaluation of that kind in Python.)

dnadlinger avatar May 14 '19 22:05 dnadlinger

Did you stumble over a good way of reproducing this? Repeatedly opening/closing a bunch of applets maybe?

dnadlinger avatar May 14 '19 22:05 dnadlinger

(Maybe it's actually our, i.e. dashboard, code calling close() from different threads after all – an awkwardly straightforward way of debugging this would be to, in close(), save backtraces into a thread-safe global buffer, and fetch/print all of them for a given when the above issue occurs.)

dnadlinger avatar May 14 '19 22:05 dnadlinger

Did you stumble over a good way of reproducing this? Repeatedly opening/closing a bunch of applets maybe?

Yes, that basically does it.

As for the handle closing issue, doesn't this look more like a race condition between (OS) threads? If that is the case, your patch would only paper over the actual problem. (There shouldn't be any lazy evaluation of that kind in Python.)

hmmm...I must have misread the log I posted. The original thing I was looking for was a race where a PipeHandle is closed twice, which is what the log clearly shows is occurring. Somehow I convinced myself that wasn't the case.

So it is a race between threads.

hartytp avatar May 14 '19 22:05 hartytp

Which two threads would that be, though? Or does CloseHandle itself cause some callbacks to be invoked? (Or e.g. the object to be deleted indirectly?) It's been too long since I've had to deal with IOCP…

dnadlinger avatar May 14 '19 22:05 dnadlinger

That's a good question. I haven't figured it out.

AFAICT the handle is only closed by the AsyncioParentComm._auto_close task. https://github.com/m-labs/artiq/blob/84b91ee8bd416bd894497611e4a371f69201042c/artiq/protocols/pipe_ipc.py#L118-L125 That task is created in AsyncioParentComm._auto_close here https://github.com/m-labs/artiq/blob/84b91ee8bd416bd894497611e4a371f69201042c/artiq/protocols/pipe_ipc.py#L127-L141

So, unless there is some other code path I'm not aware of (when I have time, I'll hack in a stack trace each time we close a handle to rule this out), it looks like create_subprocess is being called multiple times on the same AsyncioParentComm object or something like that. If that happened, I can see how a race could occur.

hartytp avatar May 14 '19 22:05 hartytp

The handle is also close()d by its __del__, but that shouldn't be invoked while pipe_ipc still has a reference…

dnadlinger avatar May 14 '19 22:05 dnadlinger

Looking at the trace, I think this race is probably in a single thread (which seems most likely given the original asyncio code). I'd bet a beer that it's a race in the applet code that's scheduling two autoclose events...

hartytp avatar May 14 '19 22:05 hartytp

But then (i.e. if it's a single thread) how would self._handle change between time of check and time of use?

dnadlinger avatar May 14 '19 22:05 dnadlinger

sigh...sorry, you're right. It's late.

Well, then I'm back to not understanding this at all. The next step I can think of would be to add a stack trace print to the PipeHandle.close to verify whether _auto_close is indeed being called twice, or if it's actually some other code path I haven't spotted yet. I'll do that when I have time.

hartytp avatar May 14 '19 23:05 hartytp

Improved the diagnostics to show which thread the calls are coming from:

close pipe handle 2872
close pipe handle 2872
closed handle 2872 in thread 6172
exception closing handle 2872 in thread 9636
[WinError 6] The handle is invalid
Exception ignored in: <bound method PipeHandle.__del__ of <PipeHandle closed>>
Traceback (most recent call last):
  File "C:\Users\LabUser\Miniconda3\envs\artiq-env\lib\asyncio\windows_utils.py", line 174, in __del__
close pipe handle 2792
closed handle 2792 in thread 6172
    self.close()
  File "C:\Users\LabUser\Miniconda3\envs\artiq-env\lib\asyncio\windows_utils.py", line 161, in close
    CloseHandle(self._handle)
OSError: [WinError 6] The handle is invalid
QWindowsWindow::setGeometry: Unable to set geometry 391x175+2169+1081 on QWindow/''. Resulting geometry:  640x480+960+465 (frame: 0, 0, 0, 0, custom margin: 0, 0, 0, 0, minimum size: 0x0, maximum size: 16777215x16777215).
QWindowsWindow::setGeometry: Unable to set geometry 391x175+2169+1081 on QWindow/''. Resulting geometry:  640x480+960+465 (frame: 0, 0, 0, 0, custom margin: 0, 0, 0, 0, minimum size: 0x0, maximum size: 16777215x16777215).
INFO:applet(Monitor highna_apd0-PMT ratio):print:close pipe handle 1132
INFO:applet(Monitor highna_apd0-PMT ratio):print:closed handle 1132 in thread 10300
close pipe handle 2584
closed handle 2584 in thread 6172
close pipe handle 2592
closed handle 2592 in thread 6172
close pipe handle 2848
closed handle 2848 in thread 6172
close pipe handle 2652
closed handle 2652 in thread 6172
close pipe handle 2580
closed handle 2580 in thread 6172
QWindowsWindow::setGeometry: Unable to set geometry 391x175+2169+1081 on QWindow/''. Resulting geometry:  640x480+960+465 (frame: 0, 0, 0, 0, custom margin: 0, 0, 0, 0, minimum size: 0x0, maximum size: 16777215x16777215).
QWindowsWindow::setGeometry: Unable to set geometry 391x175+2169+1081 on QWindow/''. Resulting geometry:  640x480+960+465 (frame: 0, 0, 0, 0, custom margin: 0, 0, 0, 0, minimum size: 0x0, maximum size: 16777215x16777215).
INFO:applet(Monitor highna_apd1):print:close pipe handle 1136
INFO:applet(Monitor highna_apd1):print:closed handle 1136 in thread 11440
close pipe handle 2620
closed handle 2620 in thread 6172
close pipe handle 2608
closed handle 2608 in thread 6172
close pipe handle 2800
closed handle 2800 in thread 6172
close pipe handle 2520
closed handle 2520 in thread 6172
close pipe handle 2588
closed handle 2588 in thread 6172
QWindowsWindow::setGeometry: Unable to set geometry 391x175+2169+1081 on QWindow/''. Resulting geometry:  640x480+960+465 (frame: 0, 0, 0, 0, custom margin: 0, 0, 0, 0, minimum size: 0x0, maximum size: 16777215x16777215).
QWindowsWindow::setGeometry: Unable to set geometry 391x175+2169+1081 on QWindow/''. Resulting geometry:  640x480+960+465 (frame: 0, 0, 0, 0, custom margin: 0, 0, 0, 0, minimum size: 0x0, maximum size: 16777215x16777215).
INFO:applet(Monitor highna_apd3-PMT ratio):print:close pipe handle 1196
INFO:applet(Monitor highna_apd3-PMT ratio):print:closed handle 1196 in thread 604
close pipe handle 2736
closed handle 2736 in thread 6172
close pipe handle 2808
closed handle 2808 in thread 6172
close pipe handle 2876
closed handle 2876 in thread 6172
close pipe handle 2896
closed handle 2896 in thread 6172
close pipe handle 2820
closed handle 2820 in thread 6172
QWindowsWindow::setGeometry: Unable to set geometry 391x175+2169+1081 on QWindow/''. Resulting geometry:  640x480+960+465 (frame: 0, 0, 0, 0, custom margin: 0, 0, 0, 0, minimum size: 0x0, maximum size: 16777215x16777215).
QWindowsWindow::setGeometry: Unable to set geometry 391x175+2169+1081 on QWindow/''. Resulting geometry:  640x480+960+465 (frame: 0, 0, 0, 0, custom margin: 0, 0, 0, 0, minimum size: 0x0, maximum size: 16777215x16777215).
INFO:applet(Monitor highna_apd2-PMT ratio):print:close pipe handle 1200
INFO:applet(Monitor highna_apd2-PMT ratio):print:closed handle 1200 in thread 2684
close pipe handle 2748
closed handle 2748 in thread 6172
close pipe handle 2756
closed handle 2756 in thread 6172
close pipe handle 2888
close pipe handle 2740
closed handle 2888 in thread 6172
close pipe handle 2740
closed handle 2740 in thread 9636
exception closing handle 2740 in thread 6172
[WinError 6] The handle is invalid
ERROR:dashboard:quamash._QEventLoop:Task exception was never retrieved
future: <Task finished coro=<AsyncioParentComm._autoclose() done, defined at C:\Users\LabUser\Miniconda3\envs\artiq-env\lib\site-packages\artiq\protocols\pipe_ipc.py:118> exception=OSError(9, 'The handle is invalid', None, 6, None)>
Traceback (most recent call last):
  File "C:\Users\LabUser\Miniconda3\envs\artiq-env\lib\asyncio\tasks.py", line 240, in _step
    result = coro.send(None)
  File "C:\Users\LabUser\Miniconda3\envs\artiq-env\lib\site-packages\artiq\protocols\pipe_ipc.py", line 120, in _autoclose
    self.server[0].close()
  File "C:\Users\LabUser\Miniconda3\envs\artiq-env\lib\asyncio\windows_events.py", line 284, in close
    pipe.close()
  File "C:\Users\LabUser\Miniconda3\envs\artiq-env\lib\asyncio\windows_utils.py", line 161, in close
    CloseHandle(self._handle)
OSError: [WinError 6] The handle is invalid

So, we definitely see the PipeHandles being closed from two threads...

class PipeHandle:
    """Wrapper for an overlapped pipe handle which is vaguely file-object like.

    The IOCP event loop can use these instead of socket objects.
    """
    def __init__(self, handle):
        print("new pipe handle: {}".format(handle))
        self._handle = handle

    def __repr__(self):
        if self._handle is not None:
            handle = 'handle=%r' % self._handle
        else:
            handle = 'closed'
        return '<%s %s>' % (self.__class__.__name__, handle)

    @property
    def handle(self):
        return self._handle

    def fileno(self):
        if self._handle is None:
            raise ValueError("I/O operatioon on closed pipe")
        return self._handle

    def close(self, *, CloseHandle=_winapi.CloseHandle):
        if self._handle is not None:
            print("close pipe handle {}".format(self._handle))
            try:
                CloseHandle(self._handle)
            except Exception as e:
                print("exception closing handle {} in thread {}"
                      .format(self._handle, threading.current_thread().ident))
                print(e)
                raise
            print("closed handle {} in thread {}, trace: {}"
                  .format(self._handle, threading.current_thread().ident,
                          traceback.format_stack()))
            self._handle = None

    def __del__(self):
        if self._handle is not None:
            warnings.warn("unclosed %r" % self, ResourceWarning)
            self.close()

    def __enter__(self):
        return self

    def __exit__(self, t, v, tb):
        self.close()

hartytp avatar May 15 '19 08:05 hartytp

With stack traces:

INFO:dashboard:root:ARTIQ dashboard 5.0.dev+1259.ge86e2bbe connected to Alice (10.255.6.191)
opened server pipe 2472
new pipe handle: 2472
new pipe handle: 2480
new pipe handle: 2484
opened server pipe 2520
new pipe handle: 2520
new pipe handle: 2516
new pipe handle: 2536
opened server pipe 2560
new pipe handle: 2560
opened server pipe 2556
new pipe handle: 2556
close pipe handle 2480
closed handle 2480 in thread 3132, trace: ['  File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\runpy.py", line 193, in _run_module_as_main\n    "__main__", mod_spec)\n', '  File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\runpy.py", line 85, in _run_code\n    exec(code, run_globals)\n', '  File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\site-packages\\artiq\\frontend\\artiq_dashboard.py", line 245, in <module>\n    main()\n', '  File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\site-packages\\artiq\\frontend\\artiq_dashboard.py", line 242, in main\n    loop.run_until_complete(main_window.exit_request.wait())\n', '  File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\site-packages\\Quamash-0.6.1-py3.5.egg\\quamash\\__init__.py", line 281, in run_until_complete\n    self.run_forever()\n', '  File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\site-packages\\Quamash-0.6.1-py3.5.egg\\quamash\\__init__.py", line 266, in run_forever\n    rslt = self.__app.exec_()\n', '  File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\site-packages\\Quamash-0.6.1-py3.5.egg\\quamash\\__init__.py", line 207, in timerEvent\n    handle._run()\n', '  File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\asyncio\\events.py", line 127, in _run\n    self._callback(*self._args)\n', '  File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\asyncio\\proactor_events.py", line 136, in _call_connection_lost\n    self._sock.close()\n', '  File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\asyncio\\windows_utils.py", line 170, in close\n    traceback.format_stack()))\n']
close pipe handle 2560
close pipe handle 2472
closed handle 2560 in thread 3132, trace: ['  File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\runpy.py", line 193, in _run_module_as_main\n    "__main__", mod_spec)\n', '  File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\runpy.py", line 85, in _run_code\n    exec(code, run_globals)\n', '  File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\site-packages\\artiq\\frontend\\artiq_dashboard.py", line 245, in <module>\n    main()\n', '  File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\site-packages\\artiq\\frontend\\artiq_dashboard.py", line 242, in main\n    loop.run_until_complete(main_window.exit_request.wait())\n', '  File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\site-packages\\Quamash-0.6.1-py3.5.egg\\quamash\\__init__.py", line 281, in run_until_complete\n    self.run_forever()\n', '  File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\site-packages\\Quamash-0.6.1-py3.5.egg\\quamash\\__init__.py", line 266, in run_forever\n    rslt = self.__app.exec_()\n', '  File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\site-packages\\Quamash-0.6.1-py3.5.egg\\quamash\\__init__.py", line 207, in timerEvent\n    handle._run()\n', '  File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\asyncio\\events.py", line 127, in _run\n    self._callback(*self._args)\n', '  File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\asyncio\\tasks.py", line 240, in _step\n    result = coro.send(None)\n', '  File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\site-packages\\artiq\\protocols\\pipe_ipc.py", line 120, in _autoclose\n    self.server[0].close()\n', '  File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\asyncio\\windows_events.py", line 284, in close\n    pipe.close()\n', '  File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\asyncio\\windows_utils.py", line 170, in close\n    traceback.format_stack()))\n']
closed handle 2472 in thread 10720, trace: ['  File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\site-packages\\Quamash-0.6.1-py3.5.egg\\quamash\\_windows.py", line 159, in run\n    events = self.__proactor.select(0.01)\n', '  File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\site-packages\\Quamash-0.6.1-py3.5.egg\\quamash\\_windows.py", line 65, in select\n    self._poll(timeout)\n', '  File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\asyncio\\windows_utils.py", line 176, in __del__\n    self.close()\n', '  File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\asyncio\\windows_utils.py", line 170, in close\n    traceback.format_stack()))\n']
close pipe handle 2472
exception closing handle None in thread 3132
CloseHandle() argument must be int, not None
ERROR:dashboard:quamash._QEventLoop:Task exception was never retrieved
future: <Task finished coro=<AsyncioParentComm._autoclose() done, defined at C:\Users\LabUser\Miniconda3\envs\artiq-env\lib\site-packages\artiq\protocols\pipe_ipc.py:118> exception=TypeError('CloseHandle() argument must be int, not None',)>
Traceback (most recent call last):
  File "C:\Users\LabUser\Miniconda3\envs\artiq-env\lib\asyncio\tasks.py", line 240, in _step
    result = coro.send(None)
  File "C:\Users\LabUser\Miniconda3\envs\artiq-env\lib\site-packages\artiq\protocols\pipe_ipc.py", line 120, in _autoclose
    self.server[0].close()
  File "C:\Users\LabUser\Miniconda3\envs\artiq-env\lib\asyncio\windows_events.py", line 284, in close
    pipe.close()
  File "C:\Users\LabUser\Miniconda3\envs\artiq-env\lib\asyncio\windows_utils.py", line 162, in close
    CloseHandle(self._handle)
TypeError: CloseHandle() argument must be int, not None
INFO:applet(Camera):print:C:\Users\LabUser\Miniconda3\envs\artiq-env\python.exe: No module named oxart.applets.camera_viewer
close pipe handle 2484
closed handle 2484 in thread 3132, trace: ['  File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\runpy.py", line 193, in _run_module_as_main\n    "__main__", mod_spec)\n', '  File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\runpy.py", line 85, in _run_code\n    exec(code, run_globals)\n', '  File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\site-packages\\artiq\\frontend\\artiq_dashboard.py", line 245, in <module>\n    main()\n', '  File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\site-packages\\artiq\\frontend\\artiq_dashboard.py", line 242, in main\n    loop.run_until_complete(main_window.exit_request.wait())\n', '  File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\site-packages\\Quamash-0.6.1-py3.5.egg\\quamash\\__init__.py", line 281, in run_until_complete\n    self.run_forever()\n', '  File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\site-packages\\Quamash-0.6.1-py3.5.egg\\quamash\\__init__.py", line 266, in run_forever\n    rslt = self.__app.exec_()\n', '  File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\site-packages\\Quamash-0.6.1-py3.5.egg\\quamash\\__init__.py", line 207, in timerEvent\n    handle._run()\n', '  File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\asyncio\\events.py", line 127, in _run\n    self._callback(*self._args)\n', '  File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\asyncio\\proactor_events.py", line 136, in _call_connection_lost\n    self._sock.close()\n', '  File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\asyncio\\windows_utils.py", line 170, in close\n    traceback.format_stack()))\n']
opened server pipe 1880
new pipe handle: 1880
INFO:applet(Monitor highna_apd2):print:new pipe handle: 1120
QWindowsWindow::setGeometry: Unable to set geometry 391x70+0+0 on QWindow/''. Resulting geometry:  640x480+960+465 (frame: 0, 0, 0, 0, custom margin: 0, 0, 0, 0, minimum size: 0x0, maximum size: 16777215x16777215).
QWindowsWindow::setGeometry: Unable to set geometry 391x70+0+0 on QWindow/''. Resulting geometry:  640x480+960+465 (frame: 0, 0, 0, 0, custom margin: 0, 0, 0, 0, minimum size: 0x0, maximum size: 16777215x16777215).
QWindowsWindow::setGeometry: Unable to set geometry 391x70+0+0 on QWindow/''. Resulting geometry:  640x480+960+465 (frame: 0, 0, 0, 0, custom margin: 0, 0, 0, 0, minimum size: 0x0, maximum size: 16777215x16777215).
QWindowsWindow::setGeometry: Unable to set geometry 391x70+0+0 on QWindow/''. Resulting geometry:  640x480+960+465 (frame: 0, 0, 0, 0, custom margin: 0, 0, 0, 0, minimum size: 0x0, maximum size: 16777215x16777215).
class PipeHandle:
    """Wrapper for an overlapped pipe handle which is vaguely file-object like.

    The IOCP event loop can use these instead of socket objects.
    """
    def __init__(self, handle):
        print("new pipe handle: {}".format(handle))
        self._handle = handle

    def __repr__(self):
        if self._handle is not None:
            handle = 'handle=%r' % self._handle
        else:
            handle = 'closed'
        return '<%s %s>' % (self.__class__.__name__, handle)

    @property
    def handle(self):
        return self._handle

    def fileno(self):
        if self._handle is None:
            raise ValueError("I/O operatioon on closed pipe")
        return self._handle

    def close(self, *, CloseHandle=_winapi.CloseHandle):
        if self._handle is not None:
            print("close pipe handle {}".format(self._handle))
            try:
                CloseHandle(self._handle)
            except Exception as e:
                print("exception closing handle {} in thread {}"
                      .format(self._handle, threading.current_thread().ident))
                print(e)
                raise
            print("closed handle {} in thread {}, trace: {}"
                  .format(self._handle, threading.current_thread().ident,
                          traceback.format_stack()))
            self._handle = None

    def __del__(self):
        if self._handle is not None:
            warnings.warn("unclosed %r" % self, ResourceWarning)
            self.close()

    def __enter__(self):
        return self

    def __exit__(self, t, v, tb):
        self.close()

So, the issue is that the pipe is being closed from proactor_events._call_connection_lost which is called in a different thread to the main thread of execution.

hartytp avatar May 15 '19 08:05 hartytp

I suspect the issue is that _force_close is being called, which schedules a close event in a separate thread.

https://github.com/python/asyncio/blob/a791d884b9110b59bada18c2dddd399e6adc40fd/asyncio/proactor_events.py#L111-L124

hartytp avatar May 15 '19 08:05 hartytp

__del__ being called outside of the main thread of execution:

closed handle 2928 in thread 2516, trace: 
- File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\site-packages\\Quamash-0.6.1-py3.5.egg\\quamash\\_windows.py", line 159, in run\n    events = self.__proactor.select(0.01)\n
- File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\site-packages\\Quamash-0.6.1-py3.5.egg\\quamash\\_windows.py", line 65, in select\n    self._poll(timeout)\n
- File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\asyncio\\windows_utils.py", line 176, in __del__\n    self.close()\n
- File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\asyncio\\windows_utils.py", line 170, in close\n    traceback.format_stack()))\n

This seems to be the most common race path.

hartytp avatar May 15 '19 10:05 hartytp

So, it looks like a race where __del__ is in the process of executing in a different thread (python gc?) while our _auto_close method runs. I don't fully understand python weak refs, but I would have expected the use of a weakref here to prevent this

    def close(self):
        if self._accept_pipe_future is not None:
            self._accept_pipe_future.cancel()
            self._accept_pipe_future = None
        # Close all instances which have not been connected to by a client.
        if self._address is not None:
            for pipe in self._free_instances:
                pipe.close()
            self._pipe = None
            self._address = None
            self._free_instances.clear()

hartytp avatar May 15 '19 10:05 hartytp

@dnadlinger I'm out of time to look into this. I can reproduce it pretty reliably if I restart the dashboard with some applets already open from the last time I ran it, then close the applets promptly after startup.

I still don't understand exactly where __del__ is being called, and why that doesn't invalidate the weak ref. I suspect it's probably something to do with quamash, and maybe this code isn't as threadsafe as it claims...

https://github.com/OxfordIonTrapGroup/quamash/blob/b006c9a163f55aba044a9ad8532c65c13f35121f/quamash/_windows.py#L84

hartytp avatar May 15 '19 11:05 hartytp

@sbourdeauducq: Any ideas/memories from when you wrote the ARTIQ side of things originally?

dnadlinger avatar May 15 '19 11:05 dnadlinger

I guess a reference to the pipe is being held in some other thread, and released at some point, triggering the garbage collection race. But, I don't know enough about python gc to debug that.

hartytp avatar May 15 '19 11:05 hartytp

Does anyone actually understand python garbage collection and weak references? When the last (strong) reference to an object goes out of scope, when are the weak references to it invalidated? When the gc calls __del__? If so, which thread does that occur in? and, how is one supposed to avoid races?

hartytp avatar May 15 '19 11:05 hartytp

The handle is also close()d by its del, but that shouldn't be invoked while pipe_ipc still has a reference

@dnadlinger FWIW, __del__ is invoced regularly when self._handle is not None. Asyncio has some warning code there, but it doesn't make it to the console. If you add a one-line print you'll see lots of cases of that, so no surprise we see occasional exceptions due to a race.

The traceback from __del__ always seems to look like https://github.com/m-labs/artiq/issues/1216#issuecomment-492602261 however, I don't understand that execution flow. Is calling self__.poll triggering the python gc? If so, how does one go about debugging this kind of issue? I can't see any decent way of tracking the references to the pipe and figuring out what the actual problem is.

hartytp avatar May 15 '19 16:05 hartytp

Regarding the warning not being shown, ResourceWarning are filtered out by default (https://docs.python.org/3/library/warnings.html#default-warning-filter).

dnadlinger avatar May 15 '19 20:05 dnadlinger

@dnadlinger yes.

FWIW, if this is a gc issue then it would have to be a cyclic reference since it's not being cleaned promptly. So, we might be able to find it using something like: http://code.activestate.com/recipes/523004-find-cyclical-references/ If it's not a cyclic reference thing then I'm at a loss to explain the stack trace I posted where we jump from the poll to the PipeHandle destructor.

Edit: or simpler, turn garbage collection off (gc.set_debug(gc.DEBUG_SAVEALL))and each time we close pipe handles check whether the pipe is in the garbage pile

gc.collect()
for item in gc.garbage:
    if id(item) == id(self):
        raise Exception("Circular reference to PipeHandle")

hartytp avatar May 15 '19 21:05 hartytp

Nope, not garbage collection at all. What's happening is this: quamash processes events in a separate thread. Those events can store a reference to the handle. When they are released, for example here https://github.com/OxfordIonTrapGroup/quamash/blob/b006c9a163f55aba044a9ad8532c65c13f35121f/quamash/init.py#L210 __del__ gets called from that thread, leading to a possible race with the main program thread.

hartytp avatar May 16 '19 11:05 hartytp