cpython `IsolatedAsyncioTestCase` does not call `asyncio.set_event_loop` before `setUp` anymore, `asyncio.Runner`+`PidFdChildWatcher` leaves zombie processes

demo reproducer:

import unittest
import asyncio

class DemoTestCase1(unittest.IsolatedAsyncioTestCase):
    def setUp(self):
        self.loop = asyncio.get_event_loop_policy().get_event_loop()

    async def test_demo(self):
        print("hello!")


class DemoTestCase2(unittest.IsolatedAsyncioTestCase):
    def setUp(self):
        self.loop = asyncio.get_event_loop_policy().get_event_loop()

    async def test_demo(self):
        print("hello!")

if __name__ == "__main__":
    unittest.main()

see also https://github.com/python/cpython/issues/93896

 graingert@conscientious  testing310  ~/projects  python3.10 foo.py 
hello!
.hello!
.
----------------------------------------------------------------------
Ran 2 tests in 0.002s

OK

 graingert@conscientious  testing310  ~/projects  python3.11 foo.py 
hello!
.E
======================================================================
ERROR: test_demo (__main__.DemoTestCase2.test_demo)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python3.11/unittest/async_case.py", line 82, in _callSetUp
    self._asyncioTestContext.run(self.setUp)
  File "/home/graingert/projects/foo.py", line 14, in setUp
    self.loop = asyncio.get_event_loop_policy().get_event_loop()
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/events.py", line 677, in get_event_loop
    raise RuntimeError('There is no current event loop in thread %r.'
RuntimeError: There is no current event loop in thread 'MainThread'.

----------------------------------------------------------------------
Ran 2 tests in 0.002s

FAILED (errors=1)

it looks like this is because asyncio.Runner calls asyncio.set_event_loop( in Runner.run

Aug 06 '22 08:08 graingert

it looks like asyncio.set_event_loop is also being called an additional time per call to asyncio.Runner.run which will call get_child_watcher().attach_loop(loop) every call which causes the PidfdChildWatcher to blow away its callbacks

Aug 06 '22 09:08 graingert

yep that's exactly what's happening here:

import asyncio
import sys

asyncio.set_child_watcher(asyncio.PidfdChildWatcher())

tasks = set()
pids = set()


async def subprocess():
    proc = await asyncio.create_subprocess_exec(
        sys.executable,
        "-c",
        "import time; print('hello!'); import time; time.sleep(2); print('goodbye')",
    )
    pids.add(proc.pid)
    print(await proc.communicate())


async def main():
    task = asyncio.create_task(subprocess())
    tasks.add(task)
    await asyncio.sleep(1)


async def main2():
    try:
        async with asyncio.timeout(3):
           await asyncio.gather(*tasks)
    except TimeoutError:
        import psutil
        for pid in pids:
            print(psutil.Process(pid))
    else:
        print("success!")

if __name__ == "__main__":
    with asyncio.Runner() as runner:
        runner.run(main())
        runner.run(main2())

output

hello!
goodbye
psutil.Process(pid=406749, name='python', status='zombie', started='10:13:40')

Aug 06 '22 09:08 graingert

https://github.com/python/cpython/issues/95584 is vaguely related

Aug 06 '22 09:08 graingert

The issue here is that asyncio.Runner calls set_event_loop too late and too often

In addition child watchers that use attach_loop are prone to unusual behaviour when the main thread switches loop

Aug 08 '22 07:08 graingert

It looks like IsolatedAsyncioTestCase when the Pidfdchildwatcher is used will break a subprocess started in asyncSetUp

Aug 10 '22 21:08 graingert

import asyncio
import sys
import unittest

import asyncio

asyncio.set_child_watcher(asyncio.PidfdChildWatcher())


def create_free_port():
    return 4  # chosen by a fair dice roll


class TestProc(unittest.IsolatedAsyncioTestCase):
    async def asyncSetUp(self):
        self.port = create_free_port()
        self.proc = await asyncio.create_subprocess_exec(
            sys.executable,
            "-c",  # more realistically this might be an http server or database
            f"import time; print('listening on {self.port}'); import time; time.sleep(2); print('goodbye')",
        )

    async def testProc(self):
        print(f"connecting to {self.port}")

    async def asyncTearDown(self):
        await self.proc.communicate()  # hangs forever on 3.11


if __name__ == "__main__":
    unittest.main()

on python3.10 this produces:

connecting to 4
listening on 4
goodbye
.
----------------------------------------------------------------------
Ran 1 test in 2.022s

OK

on python3.11 it hangs forever in await self.proc.communicate()

Aug 11 '22 09:08 graingert

I think IsolatedAsyncioTestCase can be fixed by backing out the changes to use asyncio.Runner for now in 3.11 and then in 3.12, copy out all of unittest's TestCase verbatim, upgrade all the methods to async def, then use asyncio.run(self.run()) rather than trying to go back in and out of the the asyncio eventloop.

this would also allow people to use yield-safe dependent contextmanagers like asyncio.timeout and asyncio.TaskGroup in the enterAsyncContext() method

Aug 11 '22 10:08 graingert

Please create a separate issue for the PidfdChildWatcher issue to avoid confusion (done https://github.com/python/cpython/issues/95899) Will take a look on this now.

Aug 11 '22 17:08 kumaraditya303

I think IsolatedAsyncioTestCase can be fixed by backing out the changes to use asyncio.Runner for now in 3.11 and then in 3.12, copy out all of unittest's TestCase verbatim, upgrade all the methods to async def, then use asyncio.run(self.run()) rather than trying to go back in and out of the the asyncio eventloop.

I'm not sure I follow. Are you saying that IsolatedAsyncioTestCase should just be a textual copy of TestCase with some small changes (def -> async def)? That seems brittle -- if anything gets added to TestCase we'd have to remember to repeat the change in IsolatedAsyncioTestCase.

this would also allow people to use yield-safe dependent contextmanagers like asyncio.timeout and asyncio.TaskGroup in the enterAsyncContext() method

What does "yield-safe dependent" mean? I'm guessing that ATM self.enterAsyncContext(timeout(2)) doesn't work? Why not? What goes wrong? (Asking you before I go on a wild goose chase trying to construct a test case.)

Aug 13 '22 00:08 gvanrossum

self.enterAsyncContext(timeout(2)) enter and exit gets run in two different tasks, and it always suppresses the exception thrown into it.

There's more to yield safety than just being in the same task and propagating exceptions, @njsmith has a draft PEP

Aug 13 '22 01:08 graingert

That seems brittle -- if anything gets added to TestCase we'd have to remember to repeat the change in IsolatedAsyncioTestCase.

If this was something you wanted me to work on, I'd implement the same testing strategy as AsyncExitStack where a SyncExitStack is built out of AsyncExitStack and run through the same test suite as the real ExitStack see https://github.com/python/cpython/issues/95902 for a plan on how to fix that same issue with asynccontextmanager and contextmanager

Aug 13 '22 01:08 graingert

Okay let's back out of the speculative projects and focus on this PR. ATM I have no idea how the one line Kumar added fixes the test, and it feels too much like "programming by random modification" to me to just approve because it makes the test pass. :-) I will get back to this.

Aug 13 '22 03:08 gvanrossum

Speculative projects moved here

https://discuss.python.org/t/an-async-testcase-with-support-for-structured-concurrency-context-managers/18262

Aug 16 '22 13:08 graingert

The changeset breaks wasm32-emscripten tests. The issue is TestSendfile in test_os. The entire test class is skipped because wasm32-emscripten does not have os.sendfile.

test_flags (test.test_os.TestSendfile.test_flags) ... skipped 'test needs os.sendfile()'
test_headers (test.test_os.TestSendfile.test_headers) ... skipped 'test needs os.sendfile()'
test_headers_overflow_32bits (test.test_os.TestSendfile.test_headers_overflow_32bits) ... skipped 'test needs os.sendfile()'
test_invalid_offset (test.test_os.TestSendfile.test_invalid_offset) ... skipped 'test needs os.sendfile()'
test_keywords (test.test_os.TestSendfile.test_keywords) ... skipped 'test needs os.sendfile()'
test_offset_overflow (test.test_os.TestSendfile.test_offset_overflow) ... skipped 'test needs os.sendfile()'
test_send_at_certain_offset (test.test_os.TestSendfile.test_send_at_certain_offset) ... skipped 'test needs os.sendfile()'
test_send_whole_file (test.test_os.TestSendfile.test_send_whole_file) ... skipped 'test needs os.sendfile()'
test_trailers (test.test_os.TestSendfile.test_trailers) ... skipped 'test needs os.sendfile()'
test_trailers_overflow_32bits (test.test_os.TestSendfile.test_trailers_overflow_32bits) ... skipped 'test needs os.sendfile()'

Despite the class level skip, the test runner runs some setup routine. The recent change caused the async test runner to set up an event loop. This does not work on wasm32-emscripten. There is no socketpair() implementation and you cannot create a listening socket.

Here is a traceback of a wasm32-emscripten run with an extra ``return NULL;insocket.listen()`. As you can see it is trying to set up the event loop although the entire test class is skipped.

test test_os crashed -- Traceback (most recent call last):
  File "/home/heimes/dev/python/cpython/Lib/test/libregrtest/runtest.py", line 360, in _runtest_inner
    refleak = _runtest_inner2(ns, test_name)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/heimes/dev/python/cpython/Lib/test/libregrtest/runtest.py", line 317, in _runtest_inner2
    test_runner()
  File "/home/heimes/dev/python/cpython/Lib/test/libregrtest/runtest.py", line 281, in _test_module
    support.run_unittest(tests)
  File "/home/heimes/dev/python/cpython/Lib/test/support/__init__.py", line 1216, in run_unittest
    _run_suite(suite)
  File "/home/heimes/dev/python/cpython/Lib/test/support/__init__.py", line 1090, in _run_suite
    result = runner.run(suite)
             ^^^^^^^^^^^^^^^^^
  File "/home/heimes/dev/python/cpython/Lib/unittest/runner.py", line 208, in run
    test(result)
  File "/home/heimes/dev/python/cpython/Lib/unittest/suite.py", line 84, in __call__
    return self.run(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/heimes/dev/python/cpython/Lib/unittest/suite.py", line 122, in run
    test(result)
  File "/home/heimes/dev/python/cpython/Lib/unittest/suite.py", line 84, in __call__
    return self.run(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/heimes/dev/python/cpython/Lib/unittest/suite.py", line 122, in run
    test(result)
  File "/home/heimes/dev/python/cpython/Lib/unittest/suite.py", line 84, in __call__
    return self.run(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/heimes/dev/python/cpython/Lib/unittest/suite.py", line 122, in run
    test(result)
  File "/home/heimes/dev/python/cpython/Lib/unittest/case.py", line 678, in __call__
    return self.run(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/heimes/dev/python/cpython/Lib/unittest/async_case.py", line 129, in run
    self._setupAsyncioRunner()
  File "/home/heimes/dev/python/cpython/Lib/unittest/async_case.py", line 122, in _setupAsyncioRunner
    runner.get_loop()
  File "/home/heimes/dev/python/cpython/Lib/asyncio/runners.py", line 82, in get_loop
    self._lazy_init()
  File "/home/heimes/dev/python/cpython/Lib/asyncio/runners.py", line 136, in _lazy_init
    self._loop = events.new_event_loop()
                 ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/heimes/dev/python/cpython/Lib/asyncio/events.py", line 805, in new_event_loop
    return get_event_loop_policy().new_event_loop()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/heimes/dev/python/cpython/Lib/asyncio/events.py", line 695, in new_event_loop
    return self._loop_factory()
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/heimes/dev/python/cpython/Lib/asyncio/unix_events.py", line 64, in __init__
    super().__init__(selector)
  File "/home/heimes/dev/python/cpython/Lib/asyncio/selector_events.py", line 56, in __init__
    self._make_self_pipe()
  File "/home/heimes/dev/python/cpython/Lib/asyncio/selector_events.py", line 107, in _make_self_pipe
    self._ssock, self._csock = socket.socketpair()
                               ^^^^^^^^^^^^^^^^^^^
  File "/home/heimes/dev/python/cpython/Lib/socket.py", line 634, in socketpair
    lsock.listen()
SystemError: <method 'listen' of '_socket.socket' objects> returned NULL without setting an exception

Aug 16 '22 17:08 tiran

I propose to roll back the PR (https://github.com/python/cpython/pull/95898) since I have no idea how to deal with this.

Aug 16 '22 17:08 gvanrossum

FWIW the problem is that _setupAsyncioRunner() is called even when the test is supposed to be skipped.

I suppose a fix would be to test for the skip flags (see TestCase.run()) before calling it.

Aug 16 '22 17:08 gvanrossum

While the demo no longer fails, we need to keep this open until #96033 lands. Given all the complexity this issue has already experienced I don't think we can expect to backport this to 3.11. I guess we'll have to recommend that affected user tests in 3.11 can call self._asyncioRunner.get_loop() in setUp() -- not ideal because of the protected variable but okay if combined with a check for 3.11. We might backport to 3.11.1, once 3.11.0 has been released.

Aug 17 '22 00:08 gvanrossum

I guess we'll have to recommend that affected user tests in 3.11 can call self._asyncioRunner.get_loop() in setUp()

The fix I'm thinking of for aiohttp is to move to asyncio.get_running_loop() in the asyncSetUp

https://github.com/aio-libs/aiohttp/pull/6877/files#diff-70599d14cae2351e35e46867bce26e325e84f3b84ce218718239c4bfeac4dcf5R437

Aug 17 '22 01:08 graingert

I guess we'll have to recommend that affected user tests in 3.11 can call self._asyncioRunner.get_loop() in setUp()

The fix I'm thinking of for aiohttp is to move to asyncio.get_running_loop() in the asyncSetUp

https://github.com/aio-libs/aiohttp/pull/6877/files#diff-70599d14cae2351e35e46867bce26e325e84f3b84ce218718239c4bfeac4dcf5R437

It looks like you're also moving from setUp to asyncSetUp -- doesn't the latter always run after the runner has been initialized?

Aug 17 '22 01:08 gvanrossum

Yes, I chose to do this because it's in an asyncio async function so it's guaranteed the loop is available. Here aiohttp is effectively passing on the bug to it's downstream users.

Therefore maybe what aiohttp should do is:

not land this workaround
wait for the patch in CPython
deprecate this TestCase.loop property and tell people to always use get_running_loop themselves

I think it was also discovered that this whole aiohttp TestCase class was possibly supposed to have been deprecated years ago so maybe aiohttp should just do that

Aug 17 '22 08:08 graingert

@graingert I hope you will make aiohttp compatible with 3.11 regardless of whether this fix gets merged into 3.11. As a base dependency of many other packages it would block 3.11 support for those other packages.

Aug 17 '22 14:08 gvanrossum

@gvanrossum your question has, as always, helped me crystallize the trade off I chose by accident, and some other options that are available to aiohttp and I'm going to try to enumerate them so that aiohttp can make the right choice here

Aug 17 '22 14:08 graingert

Whoops, reopening since the 3.11 backport didn’t get merged yet.

Aug 18 '22 04:08 gvanrossum