cpython icon indicating copy to clipboard operation
cpython copied to clipboard

Make get_event_loop() an alias of get_running_loop()

Open serhiy-storchaka opened this issue 2 years ago • 36 comments

Since 3.10 asyncio.get_event_loop() emits a deprecation warning if used outside of the event loop (see #83710). It is a time to turn a warning into error and make asyncio.get_event_loop() an alias of asyncio.get_running_loop().

But maybe we should first deprecate set_event_loop()? It will be a no-op now.

serhiy-storchaka avatar Jun 03 '22 07:06 serhiy-storchaka

I wonder a bit how much existing code will be broken by this change in Python 3.12?

If the deprecation warning in Python 3.10-11 remains unnoticed (like for me, until I ran Python with the -Werror flag), any existing code calling aysncio.get_event_loop() outside a running event loop will break.

A simple workaround that worked in my case is to call asyncio.get_event_loop_policy().get_event_loop() instead. This will return the loop already attached to the current thread, or create a new one if not existing yet. (This seems better than just calling asyncio.new_event_loop() because that creates an extra loop next to already existing loops.)

Will this workaround of using asyncio.get_event_loop_policy().get_event_loop() remain available in Python 3.12+?

lschoe avatar Jun 04 '22 13:06 lschoe

I'm interesting in knowing the answer to @lschoe's question too.

A user of one of my libraries reported this deprecation warning. Is using asyncio.get_event_loop_policy().get_event_loop() a valid workaround?

dantownsend avatar Jun 08 '22 20:06 dantownsend

Ask @asvetlov. I do not know plans about asyncio.get_event_loop_policy().get_event_loop(). It may be deprecated too: you either use the existing running loop, or explicitly create a new one.

asyncio.get_event_loop() was initially proposed to remove, but since it is use so much in the code that predates asyncio.get_running_loop(), it was decided to make it just an alias of asyncio.get_running_loop().

serhiy-storchaka avatar Jun 09 '22 06:06 serhiy-storchaka

Thanks @serhiy-storchaka

dantownsend avatar Jun 09 '22 06:06 dantownsend

Thanks as well. It's good to have something next to asyncio.get_running_loop() because that can be only be called from a running loop;) For lower-level usage of asyncio's event loops, accessing a loop even when it's not running seems still very useful to me. (For higher-level usage stricter, disciplined calls using asyncio.run() etc., not handling e.g. any bare asyncio.Futures is indeed advised.)

For example, I want to call loop.set_exception_handler(handler) to set my own exception handler for the event loop. To do this once and for all, it's convenient to do this before the loop is running. Not sure what the intended way to do this would be, if one can only access the loop after it started running?

lschoe avatar Jun 13 '22 16:06 lschoe

@lschoe I think the intended pattern is

def loop_factory():
    loop = asyncio.new_event_loop()
    loop.set_exception_handler(...)

with asyncio.Runner(loop_factory=loop_factory) as runner:
    runner.run(amain())

or:

with asyncio.Runner() as runner:
    runner.get_loop().set_exception_handler(...)
    runner.run(amain())

But calling asyncio.get_running_loop().set_exception_handler(...) as the first thing in your async def amain(): seems to be the best choice

graingert avatar Jun 16 '22 08:06 graingert

Thanks, these are interesting options, indeed for Python 3.11+ which will have this new Runner class.

Using a loop_factory() does not seem preferable because asyncio.new_event_loop() is still called, which may cause problems. If a loop is already attached to the current thread, this will create a new loop next to the existing one. However, only one of these two loops can run at the same time, so this gives a Runtime error if one tries to run the other one.

With asyncio.get_event_loop() you can join the existing loop, if any. Whether that loop is running or not doesn't matter either.

For example, one can run some code in Python's asyncio REPL (via python -m asyncio) or in Jupyter notebooks, where there is already a loop running providing top-level await, or one can run the same code from a Python script where there is no loop running yet.

I've also tried the second way you propose, calling runner.get_loop(), but this also gives a Runtime error when called from the asyncio REPL, for example.

And your third option requires a running loop again (which will not be there yet, if this code is executed upon importing and initializing a module).

To deal with such varying circumstances, global access to the event loop attached to the current thread via asyncio.get_event_loop() is very useful.

lschoe avatar Jun 16 '22 13:06 lschoe

But maybe we should first deprecate set_event_loop()? It will be a no-op now.

@serhiy-storchaka set_event_loop is not a no-op it sets up the child watcher system https://github.com/python/cpython/issues/93896

graingert avatar Jun 24 '22 08:06 graingert

This is a subtle issue and I would like to go slowly here (but not so slowly to miss the 3.12 feature cut-off!).

I wasn't aware of this issue and need some time to think about the consequences.

The growing asymmetry between get_event_loop() and set_event_loop() is definitely bothering me.

Should we perhaps also deprecate the behavior of Policy.get_event_loop() to sometimes create a new event loop?

gvanrossum avatar Oct 05 '22 15:10 gvanrossum

See also GH-94597.

gvanrossum avatar Oct 06 '22 23:10 gvanrossum

In response to @lschoe's comments (e.g. https://github.com/python/cpython/issues/93453#issuecomment-1157637725), this is indeed a reason to go slowly.

I wonder what the remaining use cases are for accessing the current thread's loop regardless of whether it's running or not -- is it just loop.set_exception_handler(), or are there other configurations as well that some folks prefer to do before the loop is running?

If so, maybe we could recommend loop = asyncio.get_event_loop_policy().get_event_loop(), which has the advantage that it works in older versions too. Maybe we can keep this and policy.new_event_loop() but deprecate set_event_loop_policy() right away? That way eventually the policy is just a singleton that holds some global state in a backwards compatible fashion.

The awkward thing is that the use case requires that we still call set_event_loop(), for the rare cases where someone wants to configure something for the current thread's loop before it is running. How can we convince users not to do that but instead configure the loop once it is running? It does seem a bit clumsy if you want to run the same code in an asyncio-enabled REPL and when using a runner.

I would very much like to believe that this is a very very small minority use case or that it's just a bad practice, but nothing I've read here convinces me of that. @lschoe can you point to real-world code that uses this pattern?

gvanrossum avatar Oct 08 '22 03:10 gvanrossum

the rare cases where someone wants to configure something for the current thread's loop before it is running

This doesn't work when anything uses asyncio.run, because the first thing asyncio.run does is make a brand new event loop and the last thing it does is set the global event loop to None, which makes subsequent get_event_loop calls crash

graingert avatar Oct 08 '22 09:10 graingert

This doesn't work when anything uses asyncio.run, because the first thing asyncio.run does is make a brand new event loop and the last thing it does is set the global event loop to None, which makes subsequent get_event_loop calls crash

Yeah, we need a better understanding of @lschoe's use case.

gvanrossum avatar Oct 09 '22 00:10 gvanrossum

The backdrop for the use case is a bit extensive, so I'll try to extract the relevant points for your discussion as directly as possible.

The perspective is that of a (Python package) developer of code that heavily relies on asyncio and its event loop, using lots of Futures, Tasks, coroutines etc. The use of these asyncio primitives, and Futures in particular, is hidden from the end user, who will experience most code at the application level preferably as non-async code. The actual package MPyC is for secure multiparty computation, a kind of distributed computation with privacy guarantees. The parties are connected by point-to-point TCP/IP links which are used continuously for exchanging secret-shares of (intermediate) values in the computation.

The MPyC code development has a bit of history. It started in Python 2 + twisted, but it's now Python 3 + asyncio since at least five years. The major design decisions pertaining to asyncio's event loop go back to Python 3.6 (currently Python 3.8+ is required).

I'll simply sketch a few reasons for accessing the loop while it's not running.

  1. Cache the loop inside application to save on get_event_loop() calls. The number of calls can run into the millions because Futures are used to handle individual data values, and many micro-Tasks are used as well. The need for caching the loop may be less important since get_event_loop() has been optimized at some point after Python 3.6.

  2. Configure the loop, as part of set-up code that typically runs in non-async mode (e.g., when loading modules). Concretely, setting the loop's exception_handler with an adapted one. By doing this on the cached loop this can be done once and for all.

  3. Use of loop.run_until_complete() from the (non-running) loop. A particular reason is that run_until_complete() accepts a Future as argument and returns the Future's result. This cannot be replaced by a call to asyncio.run(); even if one wraps the given Future in a coroutine, the Future will still be attached to a different loop, and therefore aysncio.run() will fail. Note that this only is an issue if one already created a Future before the loop is running! But see the next reason.

  4. Creating Futures (attached to the not-yet running loop) from non-async code. The Futures are hidden from the user inside lower-level constructs, and serve as placeholders which will be assigned the results of Tasks that have been set out. These Tasks correspond to the evaluation of "MPyC coroutines" of which many will be active at the same time. The non-async code we are calling from include:

    • top-level code in a Python script
    • calls inside unit tests
    • top-level code in REPL (less of an issue because of asyncio REPL for Python 3.8+)
    • top-level code in Jupyter notebooks (no issue anymore since top-level await is standard in notebooks)
  5. Use Futures across multiple loop.run_until_complete() calls. Alternate between non-async code and async code, where Futures carry part of the program's state.

A typical piece of code using mpc.run as a shorthand for loop.run_until_complete looks as follows:

  x = mpc.input(secint(1)) # creates Task that will set Futures inside x (once all values are exchanged)
  s = mpc.sum(x)           # do some computation on x (which is a list with one entry per party)
  y = mpc.output(s)        # creates a Future that will hold the result of the computation
  m = mpc.run(y)           # m becomes a Python int, equal to the number of parties

Only at the last line we need to wait for the result, and if the above code is part of async code we would use m = await y instead.

Summarizing, we are using asyncio's event loop freely, without much concern if the loop happens to be running or not. The main goal being to accommodate mixes of non-async code and async code, and viewing the loop attached to a thread as a unique piece of real estate to do so.

lschoe avatar Oct 10 '22 08:10 lschoe

Hm... If you are okay with delegating creation of the loop to something else, may I recommend using the Runner class added to 3.11? You should be able to do something like

r = asyncio.Runner()
loop = r.get_loop()
loop.set_exception_handler(exc_handler)

Now you can run either futures or coroutines:

loop.run_until_complete(fut)

or

r.run(coro(arg, ...))

The Runner instance allows reusing the loop as many times as you want. When done, use

r.close()

You can hide all this detail inside your own abstraction. There's no need to use with Runner() as r: ....

gvanrossum avatar Oct 10 '22 20:10 gvanrossum

Right, thanks. The Runner class was also suggested by @graingert above. Indeed, this should give enough flexibility to set an exception handler and more generally one gets direct access to the underlying loop.

What I overlooked previously is that--unlike asyncio.run()--the loop is kept alive after r.run() is done. It is not closed until r.close() is called as you point out. This may give just enough flexibility to create and carry Futures between subsequent runs of the event loop.

I'll try and see how things work out.

lschoe avatar Oct 11 '22 07:10 lschoe

... trying this out was effortless. Double checked things, and turns out that replacing

self._loop = asyncio.get_event_loop()  # cache event loop

with

self._loop = asyncio.Runner().get_loop()  # cache event loop

does the job! Everything runs as before, and because the loop was already cached, Futures attached to it persist in between subsequent runs.

One small issue though: if there's a loop already running, asyncio.Runner() will still create a new loop. Would it also be possible that the existing loop is used, if any (because there's supposed to a unique loop per thread anyway)?

lschoe avatar Oct 11 '22 07:10 lschoe

You probably want to copy the asyncio.mixins._LoopBoundMixin class instead

graingert avatar Oct 11 '22 09:10 graingert

Nice, thx, that makes it work combined like this:

self._loop = asyncio.mixins._LoopBoundMixin()._get_loop()
if self._loop is None:
    self._loop = asyncio.Runner().get_loop()

Now it also works when a loop is already running (like in the asyncio REPL, and probably in a Jupyter notebook as well but I didn't test that for 3.11rc2).

lschoe avatar Oct 11 '22 10:10 lschoe

No not like that, you need to copy the _LoopBoundMixin code and subclass it and call self._get_loop().create_future() only when the loop is running

graingert avatar Oct 11 '22 10:10 graingert

But I actually want to create Futures attached to a (the?) not-yet running loop. At least that's what I've been doing so.

lschoe avatar Oct 11 '22 10:10 lschoe

@lschoe in that case you probably want to copy the Future code and make a version that attaches the loop with the LoopBoundMixin only when a result is set. Asyncio supports duck type futures, you're not limited to asyncio.Future

graingert avatar Oct 11 '22 11:10 graingert

I would prefer to avoid that kind of approach. But currently in 3.11 things like loop = asyncio.Runner().get_loop(); asyncio.Future(loop=loop), where loop is not running work without (deprecation) warning. So, isn't it simply allowed to attach Futures to a non-running loop?

lschoe avatar Oct 11 '22 16:10 lschoe

Here's a possible solution if you want to use a Runner with a pre-existing loop. You can call Runner(loop_factory=self.get_running_loop) in that case -- this returns a Runner that gets the existing loop. However when you close that Runner it will close the loop. So just don't call close() (and don't use it as a context manager).

It would however be better if you could refactor your code so that it only uses the Runner when there is no pre-existing loop. Since you seem to have your own run() function anyway that shouldn't be hard. Just distinguish the two cases somehow when you first initialize.

gvanrossum avatar Oct 11 '22 17:10 gvanrossum

An issue with refactoring along these lines is that there's also the other case that run() is not called at all. Namely, if top-level await is supported like in the asyncio REPL, ipython, and Jupyter notebooks. In that case we want to attach to a preexisting loop as well. A call like asyncio.get_event_loop() currently does the job for this case as well, so the goal is to find alternatives for this solution.

lschoe avatar Oct 12 '22 07:10 lschoe

Can you please enumerate and explain your different use cases again, with examples? I feel we're going in circles trying to understand the requirements here.

gvanrossum avatar Oct 12 '22 17:10 gvanrossum

There's a question in an old Discourse thread that seems related: https://discuss.python.org/t/supporting-asyncio-get-event-loop-run-until-complete-in-repls/5573/10

I realize I don't actually know enough about Jupyter's asyncio loop to be helpful there. Maybe one of the participants here can help?

gvanrossum avatar Oct 12 '22 23:10 gvanrossum

Yeah, thx. I've gone through a lot of such pages in the past, looking for ways and scenarios combining async and non-async code; with or without having a running or non-running loop around (in python asyncio REPL, Jupyter, ipython).

To maneuver successfully towards solutions, it helps a lot to have access to the loop (if any) and see if it's running or not. That's the question at the top of this issue, whether such access to the loop will remain.

For the case that one wants to call loop.run_until_complete() from a running loop, like in the old Discourse thread above, I'm using a solution that works for a limited type of coroutines, namely where all awaits are used in combination with Futures. But that does not cover the latest case in the Discourse thread because that has a call asyncio.sleep(0.1).

lschoe avatar Oct 13 '22 16:10 lschoe

I have a small breakthrough to report. Please read https://discuss.python.org/t/supporting-asyncio-get-event-loop-run-until-complete-in-repls/5573/13.

TL;DR: I now understand why calling run_until_complete() recursively on the same loop is a disaster, and I am proposing to allow running events in a new loop by passing a flag. ~@1st1 has already approved of such a design.~ (UPDATE: It appears @1st1 was actually thinking of the "same loop" scenario, see GH-77704.)

We could go further and fix _run_once() so that it doesn't crash when it is called recursively, and then we could even allow run_until_complete() recursively on the same loop, but I am less sure of that -- I'm not certain that there aren't other consequences, for example on the I/O multiplexing code.

Meanwhile for Python versions through 3.11, the only solution is to run the new loop in a new thread. Not terrible, you can almost do it in one line of code:

from concurrent.futures import ThreadPoolExecutor
executor = ThreadPoolExecutor()

result = executor.submit(asyncio.run, coro()).result()

In the future we ought to be able to write this instead (PR welcome):

result = asyncio.run(coro(), allow_recursive_loop=True)

gvanrossum avatar Oct 14 '22 14:10 gvanrossum

FWIW a fix for the _run_once() issue was previously proposed by Instagram: https://bugs.python.org/issue33523. It has a PR.

gvanrossum avatar Oct 14 '22 18:10 gvanrossum