cpython
cpython copied to clipboard
`IsolatedAsyncioTestCase` does not call `asyncio.set_event_loop` before `setUp` anymore, `asyncio.Runner`+`PidFdChildWatcher` leaves zombie processes
demo reproducer:
import unittest
import asyncio
class DemoTestCase1(unittest.IsolatedAsyncioTestCase):
def setUp(self):
self.loop = asyncio.get_event_loop_policy().get_event_loop()
async def test_demo(self):
print("hello!")
class DemoTestCase2(unittest.IsolatedAsyncioTestCase):
def setUp(self):
self.loop = asyncio.get_event_loop_policy().get_event_loop()
async def test_demo(self):
print("hello!")
if __name__ == "__main__":
unittest.main()
see also https://github.com/python/cpython/issues/93896
graingert@conscientious testing310 ~/projects python3.10 foo.py
hello!
.hello!
.
----------------------------------------------------------------------
Ran 2 tests in 0.002s
OK
graingert@conscientious testing310 ~/projects python3.11 foo.py
hello!
.E
======================================================================
ERROR: test_demo (__main__.DemoTestCase2.test_demo)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/usr/lib/python3.11/unittest/async_case.py", line 82, in _callSetUp
self._asyncioTestContext.run(self.setUp)
File "/home/graingert/projects/foo.py", line 14, in setUp
self.loop = asyncio.get_event_loop_policy().get_event_loop()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/asyncio/events.py", line 677, in get_event_loop
raise RuntimeError('There is no current event loop in thread %r.'
RuntimeError: There is no current event loop in thread 'MainThread'.
----------------------------------------------------------------------
Ran 2 tests in 0.002s
FAILED (errors=1)
it looks like this is because asyncio.Runner calls asyncio.set_event_loop( in Runner.run
it looks like asyncio.set_event_loop is also being called an additional time per call to asyncio.Runner.run which will call get_child_watcher().attach_loop(loop) every call which causes the PidfdChildWatcher to blow away its callbacks
yep that's exactly what's happening here:
import asyncio
import sys
asyncio.set_child_watcher(asyncio.PidfdChildWatcher())
tasks = set()
pids = set()
async def subprocess():
proc = await asyncio.create_subprocess_exec(
sys.executable,
"-c",
"import time; print('hello!'); import time; time.sleep(2); print('goodbye')",
)
pids.add(proc.pid)
print(await proc.communicate())
async def main():
task = asyncio.create_task(subprocess())
tasks.add(task)
await asyncio.sleep(1)
async def main2():
try:
async with asyncio.timeout(3):
await asyncio.gather(*tasks)
except TimeoutError:
import psutil
for pid in pids:
print(psutil.Process(pid))
else:
print("success!")
if __name__ == "__main__":
with asyncio.Runner() as runner:
runner.run(main())
runner.run(main2())
output
hello!
goodbye
psutil.Process(pid=406749, name='python', status='zombie', started='10:13:40')
https://github.com/python/cpython/issues/95584 is vaguely related
The issue here is that asyncio.Runner calls set_event_loop too late and too often
In addition child watchers that use attach_loop are prone to unusual behaviour when the main thread switches loop
It looks like IsolatedAsyncioTestCase when the Pidfdchildwatcher is used will break a subprocess started in asyncSetUp
import asyncio
import sys
import unittest
import asyncio
asyncio.set_child_watcher(asyncio.PidfdChildWatcher())
def create_free_port():
return 4 # chosen by a fair dice roll
class TestProc(unittest.IsolatedAsyncioTestCase):
async def asyncSetUp(self):
self.port = create_free_port()
self.proc = await asyncio.create_subprocess_exec(
sys.executable,
"-c", # more realistically this might be an http server or database
f"import time; print('listening on {self.port}'); import time; time.sleep(2); print('goodbye')",
)
async def testProc(self):
print(f"connecting to {self.port}")
async def asyncTearDown(self):
await self.proc.communicate() # hangs forever on 3.11
if __name__ == "__main__":
unittest.main()
on python3.10 this produces:
connecting to 4
listening on 4
goodbye
.
----------------------------------------------------------------------
Ran 1 test in 2.022s
OK
on python3.11 it hangs forever in await self.proc.communicate()
I think IsolatedAsyncioTestCase can be fixed by backing out the changes to use asyncio.Runner for now in 3.11 and then in 3.12, copy out all of unittest's TestCase verbatim, upgrade all the methods to async def, then use asyncio.run(self.run()) rather than trying to go back in and out of the the asyncio eventloop.
this would also allow people to use yield-safe dependent contextmanagers like asyncio.timeout and asyncio.TaskGroup in the enterAsyncContext() method
Please create a separate issue for the PidfdChildWatcher issue to avoid confusion (done https://github.com/python/cpython/issues/95899) Will take a look on this now.
I think IsolatedAsyncioTestCase can be fixed by backing out the changes to use
asyncio.Runnerfor now in 3.11 and then in 3.12, copy out all of unittest's TestCase verbatim, upgrade all the methods to async def, then useasyncio.run(self.run())rather than trying to go back in and out of the the asyncio eventloop.
I'm not sure I follow. Are you saying that IsolatedAsyncioTestCase should just be a textual copy of TestCase with some small changes (def -> async def)? That seems brittle -- if anything gets added to TestCase we'd have to remember to repeat the change in IsolatedAsyncioTestCase.
this would also allow people to use
yield-safedependent contextmanagers likeasyncio.timeoutandasyncio.TaskGroupin theenterAsyncContext()method
What does "yield-safe dependent" mean? I'm guessing that ATM self.enterAsyncContext(timeout(2)) doesn't work? Why not? What goes wrong? (Asking you before I go on a wild goose chase trying to construct a test case.)
self.enterAsyncContext(timeout(2)) enter and exit gets run in two different tasks, and it always suppresses the exception thrown into it.
There's more to yield safety than just being in the same task and propagating exceptions, @njsmith has a draft PEP
That seems brittle -- if anything gets added to TestCase we'd have to remember to repeat the change in IsolatedAsyncioTestCase.
If this was something you wanted me to work on, I'd implement the same testing strategy as AsyncExitStack where a SyncExitStack is built out of AsyncExitStack and run through the same test suite as the real ExitStack see https://github.com/python/cpython/issues/95902 for a plan on how to fix that same issue with asynccontextmanager and contextmanager
Okay let's back out of the speculative projects and focus on this PR. ATM I have no idea how the one line Kumar added fixes the test, and it feels too much like "programming by random modification" to me to just approve because it makes the test pass. :-) I will get back to this.
Speculative projects moved here
https://discuss.python.org/t/an-async-testcase-with-support-for-structured-concurrency-context-managers/18262
The changeset breaks wasm32-emscripten tests. The issue is TestSendfile in test_os. The entire test class is skipped because wasm32-emscripten does not have os.sendfile.
test_flags (test.test_os.TestSendfile.test_flags) ... skipped 'test needs os.sendfile()'
test_headers (test.test_os.TestSendfile.test_headers) ... skipped 'test needs os.sendfile()'
test_headers_overflow_32bits (test.test_os.TestSendfile.test_headers_overflow_32bits) ... skipped 'test needs os.sendfile()'
test_invalid_offset (test.test_os.TestSendfile.test_invalid_offset) ... skipped 'test needs os.sendfile()'
test_keywords (test.test_os.TestSendfile.test_keywords) ... skipped 'test needs os.sendfile()'
test_offset_overflow (test.test_os.TestSendfile.test_offset_overflow) ... skipped 'test needs os.sendfile()'
test_send_at_certain_offset (test.test_os.TestSendfile.test_send_at_certain_offset) ... skipped 'test needs os.sendfile()'
test_send_whole_file (test.test_os.TestSendfile.test_send_whole_file) ... skipped 'test needs os.sendfile()'
test_trailers (test.test_os.TestSendfile.test_trailers) ... skipped 'test needs os.sendfile()'
test_trailers_overflow_32bits (test.test_os.TestSendfile.test_trailers_overflow_32bits) ... skipped 'test needs os.sendfile()'
Despite the class level skip, the test runner runs some setup routine. The recent change caused the async test runner to set up an event loop. This does not work on wasm32-emscripten. There is no socketpair() implementation and you cannot create a listening socket.
Here is a traceback of a wasm32-emscripten run with an extra ``return NULL;insocket.listen()`. As you can see it is trying to set up the event loop although the entire test class is skipped.
test test_os crashed -- Traceback (most recent call last):
File "/home/heimes/dev/python/cpython/Lib/test/libregrtest/runtest.py", line 360, in _runtest_inner
refleak = _runtest_inner2(ns, test_name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/heimes/dev/python/cpython/Lib/test/libregrtest/runtest.py", line 317, in _runtest_inner2
test_runner()
File "/home/heimes/dev/python/cpython/Lib/test/libregrtest/runtest.py", line 281, in _test_module
support.run_unittest(tests)
File "/home/heimes/dev/python/cpython/Lib/test/support/__init__.py", line 1216, in run_unittest
_run_suite(suite)
File "/home/heimes/dev/python/cpython/Lib/test/support/__init__.py", line 1090, in _run_suite
result = runner.run(suite)
^^^^^^^^^^^^^^^^^
File "/home/heimes/dev/python/cpython/Lib/unittest/runner.py", line 208, in run
test(result)
File "/home/heimes/dev/python/cpython/Lib/unittest/suite.py", line 84, in __call__
return self.run(*args, **kwds)
^^^^^^^^^^^^^^^^^^^^^^^
File "/home/heimes/dev/python/cpython/Lib/unittest/suite.py", line 122, in run
test(result)
File "/home/heimes/dev/python/cpython/Lib/unittest/suite.py", line 84, in __call__
return self.run(*args, **kwds)
^^^^^^^^^^^^^^^^^^^^^^^
File "/home/heimes/dev/python/cpython/Lib/unittest/suite.py", line 122, in run
test(result)
File "/home/heimes/dev/python/cpython/Lib/unittest/suite.py", line 84, in __call__
return self.run(*args, **kwds)
^^^^^^^^^^^^^^^^^^^^^^^
File "/home/heimes/dev/python/cpython/Lib/unittest/suite.py", line 122, in run
test(result)
File "/home/heimes/dev/python/cpython/Lib/unittest/case.py", line 678, in __call__
return self.run(*args, **kwds)
^^^^^^^^^^^^^^^^^^^^^^^
File "/home/heimes/dev/python/cpython/Lib/unittest/async_case.py", line 129, in run
self._setupAsyncioRunner()
File "/home/heimes/dev/python/cpython/Lib/unittest/async_case.py", line 122, in _setupAsyncioRunner
runner.get_loop()
File "/home/heimes/dev/python/cpython/Lib/asyncio/runners.py", line 82, in get_loop
self._lazy_init()
File "/home/heimes/dev/python/cpython/Lib/asyncio/runners.py", line 136, in _lazy_init
self._loop = events.new_event_loop()
^^^^^^^^^^^^^^^^^^^^^^^
File "/home/heimes/dev/python/cpython/Lib/asyncio/events.py", line 805, in new_event_loop
return get_event_loop_policy().new_event_loop()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/heimes/dev/python/cpython/Lib/asyncio/events.py", line 695, in new_event_loop
return self._loop_factory()
^^^^^^^^^^^^^^^^^^^^
File "/home/heimes/dev/python/cpython/Lib/asyncio/unix_events.py", line 64, in __init__
super().__init__(selector)
File "/home/heimes/dev/python/cpython/Lib/asyncio/selector_events.py", line 56, in __init__
self._make_self_pipe()
File "/home/heimes/dev/python/cpython/Lib/asyncio/selector_events.py", line 107, in _make_self_pipe
self._ssock, self._csock = socket.socketpair()
^^^^^^^^^^^^^^^^^^^
File "/home/heimes/dev/python/cpython/Lib/socket.py", line 634, in socketpair
lsock.listen()
SystemError: <method 'listen' of '_socket.socket' objects> returned NULL without setting an exception
I propose to roll back the PR (https://github.com/python/cpython/pull/95898) since I have no idea how to deal with this.
FWIW the problem is that _setupAsyncioRunner() is called even when the test is supposed to be skipped.
I suppose a fix would be to test for the skip flags (see TestCase.run()) before calling it.
While the demo no longer fails, we need to keep this open until #96033 lands. Given all the complexity this issue has already experienced I don't think we can expect to backport this to 3.11. I guess we'll have to recommend that affected user tests in 3.11 can call self._asyncioRunner.get_loop() in setUp() -- not ideal because of the protected variable but okay if combined with a check for 3.11. We might backport to 3.11.1, once 3.11.0 has been released.
I guess we'll have to recommend that affected user tests in 3.11 can call self._asyncioRunner.get_loop() in setUp()
The fix I'm thinking of for aiohttp is to move to asyncio.get_running_loop() in the asyncSetUp
https://github.com/aio-libs/aiohttp/pull/6877/files#diff-70599d14cae2351e35e46867bce26e325e84f3b84ce218718239c4bfeac4dcf5R437
I guess we'll have to recommend that affected user tests in 3.11 can call self._asyncioRunner.get_loop() in setUp()
The fix I'm thinking of for aiohttp is to move to asyncio.get_running_loop() in the asyncSetUp
https://github.com/aio-libs/aiohttp/pull/6877/files#diff-70599d14cae2351e35e46867bce26e325e84f3b84ce218718239c4bfeac4dcf5R437
It looks like you're also moving from setUp to asyncSetUp -- doesn't the latter always run after the runner has been initialized?
Yes, I chose to do this because it's in an asyncio async function so it's guaranteed the loop is available. Here aiohttp is effectively passing on the bug to it's downstream users.
Therefore maybe what aiohttp should do is:
- not land this workaround
- wait for the patch in CPython
- deprecate this TestCase.loop property and tell people to always use get_running_loop themselves
I think it was also discovered that this whole aiohttp TestCase class was possibly supposed to have been deprecated years ago so maybe aiohttp should just do that
@graingert I hope you will make aiohttp compatible with 3.11 regardless of whether this fix gets merged into 3.11. As a base dependency of many other packages it would block 3.11 support for those other packages.
@gvanrossum your question has, as always, helped me crystallize the trade off I chose by accident, and some other options that are available to aiohttp and I'm going to try to enumerate them so that aiohttp can make the right choice here
Whoops, reopening since the 3.11 backport didn’t get merged yet.