artiq
artiq copied to clipboard
Dashboard occasionally throws "Task exception was never retrieved" errors on Windows
Bug Report
One-Line Summary
The dashboard occasionally throws "Task exception was never retrieved" errors on Windows, even with no user activity.
Issue Details
These errors seem to occur occasionally (~10 per day). They occur even without any user activity. For example, on a dashboard which has some applets open plotting actively changing datasets, but without any user button presses, or any applets being opened or closed.
Steps to Reproduce
No known reproduction
Actual (undesired) Behavior
The traceback is always like this:
quamash.QEventLoop:Task exception was never retrieved
future: <Task finished coro=<AsyncioParentComm._autoclose() done, defined at C:\Users\ballance\AppData\Local\Continuum\anaconda3\envs\artiq\lib\site-packages\artiq\protocols\pipe_ipc.py:118> exception=OSError(9, 'The handle is invalid', None, 6, None)>
Traceback (most recent call last):
File "C:\Users\ballance\AppData\Local\Continuum\anaconda3\envs\artiq\lib\asyncio\tasks.py", line 240, in _step
result = coro.send(None)
File "C:\Users\ballance\AppData\Local\Continuum\anaconda3\envs\artiq\lib\site-packages\artiq\protocols\pipe_ipc.py", line 120, in _autoclose
self.server[0].close()
File "C:\Users\ballance\AppData\Local\Continuum\anaconda3\envs\artiq\lib\asyncio\windows_events.py", line 283, in close
pipe.close()
File "C:\Users\ballance\AppData\Local\Continuum\anaconda3\envs\artiq\lib\asyncio\windows_utils.py", line 157, in close
CloseHandle(self._handle)
OSError: [WinError 6] The handle is invalid
Your System
Using Artiq from master
I haven't seen this. Also on windows.
Sometimes after an experiment terminates with an exception I see
asyncio:Task exception was never retrieved
future: <Task finished coro=<DatasetNamespaces.finish_rid.<locals>.gc() done, defined at /home/ion/artiq/artiq/artiq/master/databases.py:125> exception=KeyError('datasets_rid_11065',)>
Traceback (most recent call last):
File "/home/ion/anaconda3/envs/artiq-main/lib/python3.5/asyncio/tasks.py", line 240, in _step
result = coro.send(None)
File "/home/ion/artiq/artiq/artiq/master/databases.py", line 127, in gc
self._publisher.remove_notifier(_rid_namespace_name(rid))
File "/home/ion/artiq/artiq/artiq/protocols/sync_struct.py", line 306, in remove_notifier
notifier = self.notifiers[name]
KeyError: 'datasets_rid_11065
Different issue? Windows vs Linux, pipe_rpc vs sync_struct, different errors, different code area, different tools and components.
Yes, it is (and I'll break it out in a second). Just posting here as a reminder that the frequency of the occurance of these events is due to the fact that it has more than one cause...
The latter is due to a feature not merged upstream (yet), although I thought I had fixed that particular issue a while back.
I think this is a race condition inside asyncio....
I hacked my local copy of asyncio as follows
class PipeHandle:
"""Wrapper for an overlapped pipe handle which is vaguely file-object like.
The IOCP event loop can use these instead of socket objects.
"""
def __init__(self, handle):
print("new pipe handle: {}".format(handle))
self._handle = handle
self._closed = False
def __repr__(self):
if self._handle is not None:
handle = 'handle=%r' % self._handle
else:
handle = 'closed'
return '<%s %s>' % (self.__class__.__name__, handle)
@property
def handle(self):
return self._handle
def fileno(self):
if self._handle is None:
raise ValueError("I/O operatioon on closed pipe")
return self._handle
def close(self, *, CloseHandle=_winapi.CloseHandle):
if self._handle is not None:
print("close pipe handle: {} {}".format(self._handle, self._closed))
try:
print(CloseHandle)
CloseHandle(self._handle)
except Exception as e:
print("wtf {} {} {}".format(e, self._handle, self._closed))
raise
print("closed")
self._closed = True
self._handle = None
def __del__(self):
if self._handle is not None:
warnings.warn("unclosed %r" % self, ResourceWarning)
self.close()
def __enter__(self):
return self
def __exit__(self, t, v, tb):
self.close()
and I see...
opened server pipe 2476
new pipe handle: 2476
new pipe handle: 2484
new pipe handle: 2488
opened server pipe 2536
new pipe handle: 2536
opened server pipe 2544
new pipe handle: 2544
INFO:applet(422 frequency):print:new pipe handle: 1176
QWindowsWindow::setGeometry: Unable to set geometry 200x130+0+0 on QWindow/''. Resulting geometry: 640x480+960+465 (frame: 0, 0, 0, 0, custom margin: 0, 0, 0, 0, minimum size: 0x0, maximum size: 16777215x16777215).
QWindowsWindow::setGeometry: Unable to set geometry 200x130+0+0 on QWindow/''. Resulting geometry: 640x480+960+465 (frame: 0, 0, 0, 0, custom margin: 0, 0, 0, 0, minimum size: 0x0, maximum size: 16777215x16777215).
QWindowsWindow::setGeometry: Unable to set geometry 200x130+0+0 on QWindow/''. Resulting geometry: 640x480+960+465 (frame: 0, 0, 0, 0, custom margin: 0, 0, 0, 0, minimum size: 0x0, maximum size: 16777215x16777215).
QWindowsWindow::setGeometry: Unable to set geometry 200x130+0+0 on QWindow/''. Resulting geometry: 640x480+960+465 (frame: 0, 0, 0, 0, custom margin: 0, 0, 0, 0, minimum size: 0x0, maximum size: 16777215x16777215).
INFO:applet(422 frequency):print:close pipe handle: 1176 False
INFO:applet(422 frequency):print:<built-in function CloseHandle>
INFO:applet(422 frequency):print:closed
close pipe handle: 2484 False
<built-in function CloseHandle>
closed
close pipe handle: 2488 False
<built-in function CloseHandle>
closed
close pipe handle: 2544 False
<built-in function CloseHandle>
close pipe handle: 2536 False
<built-in function CloseHandle>
closed
closed
close pipe handle: 2536 False
<built-in function CloseHandle>
wtf CloseHandle() argument must be int, not None None True
ERROR:dashboard:quamash._QEventLoop:Task exception was never retrieved
future: <Task finished coro=<AsyncioParentComm._autoclose() done, defined at C:\Users\LabUser\Miniconda3\envs\artiq-env\lib\site-packages\artiq\protocols\pipe_ipc.py:118> exception=TypeError('CloseHandle() argument must be int, not None',)>
Traceback (most recent call last):
File "C:\Users\LabUser\Miniconda3\envs\artiq-env\lib\asyncio\tasks.py", line 240, in _step
result = coro.send(None)
File "C:\Users\LabUser\Miniconda3\envs\artiq-env\lib\site-packages\artiq\protocols\pipe_ipc.py", line 120, in _autoclose
self.server[0].close()
File "C:\Users\LabUser\Miniconda3\envs\artiq-env\lib\asyncio\windows_events.py", line 284, in close
pipe.close()
File "C:\Users\LabUser\Miniconda3\envs\artiq-env\lib\asyncio\windows_utils.py", line 162, in close
CloseHandle(self._handle)
T
I guess this is a bug to do with python's lazy evaluation or something like that? It looks like by the time self._handle
is captured for the call to CloseHandle
it's already been set to None
.
@cjbe can you try patching your local asyncio here https://github.com/python/cpython/blob/a2fedd8c910cb5f5b9bd568d6fd44d63f8f5cfa5/Lib/asyncio/windows_utils.py#L105-L108
to be something like:
def close(self, *, CloseHandle=_winapi.CloseHandle):
if self._handle is not None:
handle = self._handle
self._handle = None
CloseHandle(handle)
@cjbe @dnadlinger can you apply that patch locally and confirm that it fixes the error? If so, I'll close this issue and move the discussion to asyncio.
@hartytp: Fixed the RID namespacing-related issue in our local fork.
As for the handle closing issue, doesn't this look more like a race condition between (OS) threads? If that is the case, your patch would only paper over the actual problem. (There shouldn't be any lazy evaluation of that kind in Python.)
Did you stumble over a good way of reproducing this? Repeatedly opening/closing a bunch of applets maybe?
(Maybe it's actually our, i.e. dashboard, code calling close()
from different threads after all – an awkwardly straightforward way of debugging this would be to, in close()
, save backtraces into a thread-safe global buffer, and fetch/print all of them for a given when the above issue occurs.)
Did you stumble over a good way of reproducing this? Repeatedly opening/closing a bunch of applets maybe?
Yes, that basically does it.
As for the handle closing issue, doesn't this look more like a race condition between (OS) threads? If that is the case, your patch would only paper over the actual problem. (There shouldn't be any lazy evaluation of that kind in Python.)
hmmm...I must have misread the log I posted. The original thing I was looking for was a race where a PipeHandle is closed twice, which is what the log clearly shows is occurring. Somehow I convinced myself that wasn't the case.
So it is a race between threads.
Which two threads would that be, though? Or does CloseHandle
itself cause some callbacks to be invoked? (Or e.g. the object to be deleted indirectly?) It's been too long since I've had to deal with IOCP…
That's a good question. I haven't figured it out.
AFAICT the handle is only closed by the AsyncioParentComm._auto_close
task. https://github.com/m-labs/artiq/blob/84b91ee8bd416bd894497611e4a371f69201042c/artiq/protocols/pipe_ipc.py#L118-L125 That task is created in AsyncioParentComm._auto_close
here https://github.com/m-labs/artiq/blob/84b91ee8bd416bd894497611e4a371f69201042c/artiq/protocols/pipe_ipc.py#L127-L141
So, unless there is some other code path I'm not aware of (when I have time, I'll hack in a stack trace each time we close a handle to rule this out), it looks like create_subprocess
is being called multiple times on the same AsyncioParentComm
object or something like that. If that happened, I can see how a race could occur.
The handle is also close()
d by its __del__
, but that shouldn't be invoked while pipe_ipc still has a reference…
Looking at the trace, I think this race is probably in a single thread (which seems most likely given the original asyncio
code). I'd bet a beer that it's a race in the applet code that's scheduling two autoclose events...
But then (i.e. if it's a single thread) how would self._handle change between time of check and time of use?
sigh...sorry, you're right. It's late.
Well, then I'm back to not understanding this at all. The next step I can think of would be to add a stack trace print to the PipeHandle.close
to verify whether _auto_close
is indeed being called twice, or if it's actually some other code path I haven't spotted yet. I'll do that when I have time.
Improved the diagnostics to show which thread the calls are coming from:
close pipe handle 2872
close pipe handle 2872
closed handle 2872 in thread 6172
exception closing handle 2872 in thread 9636
[WinError 6] The handle is invalid
Exception ignored in: <bound method PipeHandle.__del__ of <PipeHandle closed>>
Traceback (most recent call last):
File "C:\Users\LabUser\Miniconda3\envs\artiq-env\lib\asyncio\windows_utils.py", line 174, in __del__
close pipe handle 2792
closed handle 2792 in thread 6172
self.close()
File "C:\Users\LabUser\Miniconda3\envs\artiq-env\lib\asyncio\windows_utils.py", line 161, in close
CloseHandle(self._handle)
OSError: [WinError 6] The handle is invalid
QWindowsWindow::setGeometry: Unable to set geometry 391x175+2169+1081 on QWindow/''. Resulting geometry: 640x480+960+465 (frame: 0, 0, 0, 0, custom margin: 0, 0, 0, 0, minimum size: 0x0, maximum size: 16777215x16777215).
QWindowsWindow::setGeometry: Unable to set geometry 391x175+2169+1081 on QWindow/''. Resulting geometry: 640x480+960+465 (frame: 0, 0, 0, 0, custom margin: 0, 0, 0, 0, minimum size: 0x0, maximum size: 16777215x16777215).
INFO:applet(Monitor highna_apd0-PMT ratio):print:close pipe handle 1132
INFO:applet(Monitor highna_apd0-PMT ratio):print:closed handle 1132 in thread 10300
close pipe handle 2584
closed handle 2584 in thread 6172
close pipe handle 2592
closed handle 2592 in thread 6172
close pipe handle 2848
closed handle 2848 in thread 6172
close pipe handle 2652
closed handle 2652 in thread 6172
close pipe handle 2580
closed handle 2580 in thread 6172
QWindowsWindow::setGeometry: Unable to set geometry 391x175+2169+1081 on QWindow/''. Resulting geometry: 640x480+960+465 (frame: 0, 0, 0, 0, custom margin: 0, 0, 0, 0, minimum size: 0x0, maximum size: 16777215x16777215).
QWindowsWindow::setGeometry: Unable to set geometry 391x175+2169+1081 on QWindow/''. Resulting geometry: 640x480+960+465 (frame: 0, 0, 0, 0, custom margin: 0, 0, 0, 0, minimum size: 0x0, maximum size: 16777215x16777215).
INFO:applet(Monitor highna_apd1):print:close pipe handle 1136
INFO:applet(Monitor highna_apd1):print:closed handle 1136 in thread 11440
close pipe handle 2620
closed handle 2620 in thread 6172
close pipe handle 2608
closed handle 2608 in thread 6172
close pipe handle 2800
closed handle 2800 in thread 6172
close pipe handle 2520
closed handle 2520 in thread 6172
close pipe handle 2588
closed handle 2588 in thread 6172
QWindowsWindow::setGeometry: Unable to set geometry 391x175+2169+1081 on QWindow/''. Resulting geometry: 640x480+960+465 (frame: 0, 0, 0, 0, custom margin: 0, 0, 0, 0, minimum size: 0x0, maximum size: 16777215x16777215).
QWindowsWindow::setGeometry: Unable to set geometry 391x175+2169+1081 on QWindow/''. Resulting geometry: 640x480+960+465 (frame: 0, 0, 0, 0, custom margin: 0, 0, 0, 0, minimum size: 0x0, maximum size: 16777215x16777215).
INFO:applet(Monitor highna_apd3-PMT ratio):print:close pipe handle 1196
INFO:applet(Monitor highna_apd3-PMT ratio):print:closed handle 1196 in thread 604
close pipe handle 2736
closed handle 2736 in thread 6172
close pipe handle 2808
closed handle 2808 in thread 6172
close pipe handle 2876
closed handle 2876 in thread 6172
close pipe handle 2896
closed handle 2896 in thread 6172
close pipe handle 2820
closed handle 2820 in thread 6172
QWindowsWindow::setGeometry: Unable to set geometry 391x175+2169+1081 on QWindow/''. Resulting geometry: 640x480+960+465 (frame: 0, 0, 0, 0, custom margin: 0, 0, 0, 0, minimum size: 0x0, maximum size: 16777215x16777215).
QWindowsWindow::setGeometry: Unable to set geometry 391x175+2169+1081 on QWindow/''. Resulting geometry: 640x480+960+465 (frame: 0, 0, 0, 0, custom margin: 0, 0, 0, 0, minimum size: 0x0, maximum size: 16777215x16777215).
INFO:applet(Monitor highna_apd2-PMT ratio):print:close pipe handle 1200
INFO:applet(Monitor highna_apd2-PMT ratio):print:closed handle 1200 in thread 2684
close pipe handle 2748
closed handle 2748 in thread 6172
close pipe handle 2756
closed handle 2756 in thread 6172
close pipe handle 2888
close pipe handle 2740
closed handle 2888 in thread 6172
close pipe handle 2740
closed handle 2740 in thread 9636
exception closing handle 2740 in thread 6172
[WinError 6] The handle is invalid
ERROR:dashboard:quamash._QEventLoop:Task exception was never retrieved
future: <Task finished coro=<AsyncioParentComm._autoclose() done, defined at C:\Users\LabUser\Miniconda3\envs\artiq-env\lib\site-packages\artiq\protocols\pipe_ipc.py:118> exception=OSError(9, 'The handle is invalid', None, 6, None)>
Traceback (most recent call last):
File "C:\Users\LabUser\Miniconda3\envs\artiq-env\lib\asyncio\tasks.py", line 240, in _step
result = coro.send(None)
File "C:\Users\LabUser\Miniconda3\envs\artiq-env\lib\site-packages\artiq\protocols\pipe_ipc.py", line 120, in _autoclose
self.server[0].close()
File "C:\Users\LabUser\Miniconda3\envs\artiq-env\lib\asyncio\windows_events.py", line 284, in close
pipe.close()
File "C:\Users\LabUser\Miniconda3\envs\artiq-env\lib\asyncio\windows_utils.py", line 161, in close
CloseHandle(self._handle)
OSError: [WinError 6] The handle is invalid
So, we definitely see the PipeHandles being closed from two threads...
class PipeHandle:
"""Wrapper for an overlapped pipe handle which is vaguely file-object like.
The IOCP event loop can use these instead of socket objects.
"""
def __init__(self, handle):
print("new pipe handle: {}".format(handle))
self._handle = handle
def __repr__(self):
if self._handle is not None:
handle = 'handle=%r' % self._handle
else:
handle = 'closed'
return '<%s %s>' % (self.__class__.__name__, handle)
@property
def handle(self):
return self._handle
def fileno(self):
if self._handle is None:
raise ValueError("I/O operatioon on closed pipe")
return self._handle
def close(self, *, CloseHandle=_winapi.CloseHandle):
if self._handle is not None:
print("close pipe handle {}".format(self._handle))
try:
CloseHandle(self._handle)
except Exception as e:
print("exception closing handle {} in thread {}"
.format(self._handle, threading.current_thread().ident))
print(e)
raise
print("closed handle {} in thread {}, trace: {}"
.format(self._handle, threading.current_thread().ident,
traceback.format_stack()))
self._handle = None
def __del__(self):
if self._handle is not None:
warnings.warn("unclosed %r" % self, ResourceWarning)
self.close()
def __enter__(self):
return self
def __exit__(self, t, v, tb):
self.close()
With stack traces:
INFO:dashboard:root:ARTIQ dashboard 5.0.dev+1259.ge86e2bbe connected to Alice (10.255.6.191)
opened server pipe 2472
new pipe handle: 2472
new pipe handle: 2480
new pipe handle: 2484
opened server pipe 2520
new pipe handle: 2520
new pipe handle: 2516
new pipe handle: 2536
opened server pipe 2560
new pipe handle: 2560
opened server pipe 2556
new pipe handle: 2556
close pipe handle 2480
closed handle 2480 in thread 3132, trace: [' File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\runpy.py", line 193, in _run_module_as_main\n "__main__", mod_spec)\n', ' File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\runpy.py", line 85, in _run_code\n exec(code, run_globals)\n', ' File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\site-packages\\artiq\\frontend\\artiq_dashboard.py", line 245, in <module>\n main()\n', ' File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\site-packages\\artiq\\frontend\\artiq_dashboard.py", line 242, in main\n loop.run_until_complete(main_window.exit_request.wait())\n', ' File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\site-packages\\Quamash-0.6.1-py3.5.egg\\quamash\\__init__.py", line 281, in run_until_complete\n self.run_forever()\n', ' File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\site-packages\\Quamash-0.6.1-py3.5.egg\\quamash\\__init__.py", line 266, in run_forever\n rslt = self.__app.exec_()\n', ' File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\site-packages\\Quamash-0.6.1-py3.5.egg\\quamash\\__init__.py", line 207, in timerEvent\n handle._run()\n', ' File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\asyncio\\events.py", line 127, in _run\n self._callback(*self._args)\n', ' File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\asyncio\\proactor_events.py", line 136, in _call_connection_lost\n self._sock.close()\n', ' File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\asyncio\\windows_utils.py", line 170, in close\n traceback.format_stack()))\n']
close pipe handle 2560
close pipe handle 2472
closed handle 2560 in thread 3132, trace: [' File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\runpy.py", line 193, in _run_module_as_main\n "__main__", mod_spec)\n', ' File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\runpy.py", line 85, in _run_code\n exec(code, run_globals)\n', ' File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\site-packages\\artiq\\frontend\\artiq_dashboard.py", line 245, in <module>\n main()\n', ' File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\site-packages\\artiq\\frontend\\artiq_dashboard.py", line 242, in main\n loop.run_until_complete(main_window.exit_request.wait())\n', ' File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\site-packages\\Quamash-0.6.1-py3.5.egg\\quamash\\__init__.py", line 281, in run_until_complete\n self.run_forever()\n', ' File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\site-packages\\Quamash-0.6.1-py3.5.egg\\quamash\\__init__.py", line 266, in run_forever\n rslt = self.__app.exec_()\n', ' File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\site-packages\\Quamash-0.6.1-py3.5.egg\\quamash\\__init__.py", line 207, in timerEvent\n handle._run()\n', ' File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\asyncio\\events.py", line 127, in _run\n self._callback(*self._args)\n', ' File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\asyncio\\tasks.py", line 240, in _step\n result = coro.send(None)\n', ' File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\site-packages\\artiq\\protocols\\pipe_ipc.py", line 120, in _autoclose\n self.server[0].close()\n', ' File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\asyncio\\windows_events.py", line 284, in close\n pipe.close()\n', ' File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\asyncio\\windows_utils.py", line 170, in close\n traceback.format_stack()))\n']
closed handle 2472 in thread 10720, trace: [' File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\site-packages\\Quamash-0.6.1-py3.5.egg\\quamash\\_windows.py", line 159, in run\n events = self.__proactor.select(0.01)\n', ' File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\site-packages\\Quamash-0.6.1-py3.5.egg\\quamash\\_windows.py", line 65, in select\n self._poll(timeout)\n', ' File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\asyncio\\windows_utils.py", line 176, in __del__\n self.close()\n', ' File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\asyncio\\windows_utils.py", line 170, in close\n traceback.format_stack()))\n']
close pipe handle 2472
exception closing handle None in thread 3132
CloseHandle() argument must be int, not None
ERROR:dashboard:quamash._QEventLoop:Task exception was never retrieved
future: <Task finished coro=<AsyncioParentComm._autoclose() done, defined at C:\Users\LabUser\Miniconda3\envs\artiq-env\lib\site-packages\artiq\protocols\pipe_ipc.py:118> exception=TypeError('CloseHandle() argument must be int, not None',)>
Traceback (most recent call last):
File "C:\Users\LabUser\Miniconda3\envs\artiq-env\lib\asyncio\tasks.py", line 240, in _step
result = coro.send(None)
File "C:\Users\LabUser\Miniconda3\envs\artiq-env\lib\site-packages\artiq\protocols\pipe_ipc.py", line 120, in _autoclose
self.server[0].close()
File "C:\Users\LabUser\Miniconda3\envs\artiq-env\lib\asyncio\windows_events.py", line 284, in close
pipe.close()
File "C:\Users\LabUser\Miniconda3\envs\artiq-env\lib\asyncio\windows_utils.py", line 162, in close
CloseHandle(self._handle)
TypeError: CloseHandle() argument must be int, not None
INFO:applet(Camera):print:C:\Users\LabUser\Miniconda3\envs\artiq-env\python.exe: No module named oxart.applets.camera_viewer
close pipe handle 2484
closed handle 2484 in thread 3132, trace: [' File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\runpy.py", line 193, in _run_module_as_main\n "__main__", mod_spec)\n', ' File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\runpy.py", line 85, in _run_code\n exec(code, run_globals)\n', ' File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\site-packages\\artiq\\frontend\\artiq_dashboard.py", line 245, in <module>\n main()\n', ' File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\site-packages\\artiq\\frontend\\artiq_dashboard.py", line 242, in main\n loop.run_until_complete(main_window.exit_request.wait())\n', ' File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\site-packages\\Quamash-0.6.1-py3.5.egg\\quamash\\__init__.py", line 281, in run_until_complete\n self.run_forever()\n', ' File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\site-packages\\Quamash-0.6.1-py3.5.egg\\quamash\\__init__.py", line 266, in run_forever\n rslt = self.__app.exec_()\n', ' File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\site-packages\\Quamash-0.6.1-py3.5.egg\\quamash\\__init__.py", line 207, in timerEvent\n handle._run()\n', ' File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\asyncio\\events.py", line 127, in _run\n self._callback(*self._args)\n', ' File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\asyncio\\proactor_events.py", line 136, in _call_connection_lost\n self._sock.close()\n', ' File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\asyncio\\windows_utils.py", line 170, in close\n traceback.format_stack()))\n']
opened server pipe 1880
new pipe handle: 1880
INFO:applet(Monitor highna_apd2):print:new pipe handle: 1120
QWindowsWindow::setGeometry: Unable to set geometry 391x70+0+0 on QWindow/''. Resulting geometry: 640x480+960+465 (frame: 0, 0, 0, 0, custom margin: 0, 0, 0, 0, minimum size: 0x0, maximum size: 16777215x16777215).
QWindowsWindow::setGeometry: Unable to set geometry 391x70+0+0 on QWindow/''. Resulting geometry: 640x480+960+465 (frame: 0, 0, 0, 0, custom margin: 0, 0, 0, 0, minimum size: 0x0, maximum size: 16777215x16777215).
QWindowsWindow::setGeometry: Unable to set geometry 391x70+0+0 on QWindow/''. Resulting geometry: 640x480+960+465 (frame: 0, 0, 0, 0, custom margin: 0, 0, 0, 0, minimum size: 0x0, maximum size: 16777215x16777215).
QWindowsWindow::setGeometry: Unable to set geometry 391x70+0+0 on QWindow/''. Resulting geometry: 640x480+960+465 (frame: 0, 0, 0, 0, custom margin: 0, 0, 0, 0, minimum size: 0x0, maximum size: 16777215x16777215).
class PipeHandle:
"""Wrapper for an overlapped pipe handle which is vaguely file-object like.
The IOCP event loop can use these instead of socket objects.
"""
def __init__(self, handle):
print("new pipe handle: {}".format(handle))
self._handle = handle
def __repr__(self):
if self._handle is not None:
handle = 'handle=%r' % self._handle
else:
handle = 'closed'
return '<%s %s>' % (self.__class__.__name__, handle)
@property
def handle(self):
return self._handle
def fileno(self):
if self._handle is None:
raise ValueError("I/O operatioon on closed pipe")
return self._handle
def close(self, *, CloseHandle=_winapi.CloseHandle):
if self._handle is not None:
print("close pipe handle {}".format(self._handle))
try:
CloseHandle(self._handle)
except Exception as e:
print("exception closing handle {} in thread {}"
.format(self._handle, threading.current_thread().ident))
print(e)
raise
print("closed handle {} in thread {}, trace: {}"
.format(self._handle, threading.current_thread().ident,
traceback.format_stack()))
self._handle = None
def __del__(self):
if self._handle is not None:
warnings.warn("unclosed %r" % self, ResourceWarning)
self.close()
def __enter__(self):
return self
def __exit__(self, t, v, tb):
self.close()
So, the issue is that the pipe is being closed from proactor_events._call_connection_lost
which is called in a different thread to the main thread of execution.
I suspect the issue is that _force_close
is being called, which schedules a close event in a separate thread.
https://github.com/python/asyncio/blob/a791d884b9110b59bada18c2dddd399e6adc40fd/asyncio/proactor_events.py#L111-L124
__del__
being called outside of the main thread of execution:
closed handle 2928 in thread 2516, trace:
- File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\site-packages\\Quamash-0.6.1-py3.5.egg\\quamash\\_windows.py", line 159, in run\n events = self.__proactor.select(0.01)\n
- File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\site-packages\\Quamash-0.6.1-py3.5.egg\\quamash\\_windows.py", line 65, in select\n self._poll(timeout)\n
- File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\asyncio\\windows_utils.py", line 176, in __del__\n self.close()\n
- File "C:\\Users\\LabUser\\Miniconda3\\envs\\artiq-env\\lib\\asyncio\\windows_utils.py", line 170, in close\n traceback.format_stack()))\n
This seems to be the most common race path.
So, it looks like a race where __del__
is in the process of executing in a different thread (python gc?) while our _auto_close
method runs. I don't fully understand python weak refs, but I would have expected the use of a weakref here to prevent this
def close(self):
if self._accept_pipe_future is not None:
self._accept_pipe_future.cancel()
self._accept_pipe_future = None
# Close all instances which have not been connected to by a client.
if self._address is not None:
for pipe in self._free_instances:
pipe.close()
self._pipe = None
self._address = None
self._free_instances.clear()
@dnadlinger I'm out of time to look into this. I can reproduce it pretty reliably if I restart the dashboard with some applets already open from the last time I ran it, then close the applets promptly after startup.
I still don't understand exactly where __del__
is being called, and why that doesn't invalidate the weak ref. I suspect it's probably something to do with quamash, and maybe this code isn't as threadsafe as it claims...
https://github.com/OxfordIonTrapGroup/quamash/blob/b006c9a163f55aba044a9ad8532c65c13f35121f/quamash/_windows.py#L84
@sbourdeauducq: Any ideas/memories from when you wrote the ARTIQ side of things originally?
I guess a reference to the pipe is being held in some other thread, and released at some point, triggering the garbage collection race. But, I don't know enough about python gc to debug that.
Does anyone actually understand python garbage collection and weak references? When the last (strong) reference to an object goes out of scope, when are the weak references to it invalidated? When the gc calls __del__
? If so, which thread does that occur in? and, how is one supposed to avoid races?
The handle is also close()d by its del, but that shouldn't be invoked while pipe_ipc still has a reference
@dnadlinger FWIW, __del__
is invoced regularly when self._handle is not None
. Asyncio has some warning code there, but it doesn't make it to the console. If you add a one-line print you'll see lots of cases of that, so no surprise we see occasional exceptions due to a race.
The traceback from __del__
always seems to look like https://github.com/m-labs/artiq/issues/1216#issuecomment-492602261 however, I don't understand that execution flow. Is calling self__.poll
triggering the python gc? If so, how does one go about debugging this kind of issue? I can't see any decent way of tracking the references to the pipe and figuring out what the actual problem is.
Regarding the warning not being shown, ResourceWarning
are filtered out by default (https://docs.python.org/3/library/warnings.html#default-warning-filter).
@dnadlinger yes.
FWIW, if this is a gc issue then it would have to be a cyclic reference since it's not being cleaned promptly. So, we might be able to find it using something like: http://code.activestate.com/recipes/523004-find-cyclical-references/ If it's not a cyclic reference thing then I'm at a loss to explain the stack trace I posted where we jump from the poll to the PipeHandle destructor.
Edit: or simpler, turn garbage collection off (gc.set_debug(gc.DEBUG_SAVEALL)
)and each time we close pipe handles check whether the pipe is in the garbage pile
gc.collect()
for item in gc.garbage:
if id(item) == id(self):
raise Exception("Circular reference to PipeHandle")
Nope, not garbage collection at all. What's happening is this: quamash processes events in a separate thread. Those events can store a reference to the handle. When they are released, for example here https://github.com/OxfordIonTrapGroup/quamash/blob/b006c9a163f55aba044a9ad8532c65c13f35121f/quamash/init.py#L210 __del__
gets called from that thread, leading to a possible race with the main program thread.