Add eager evaluation API to TaskGroups
Feature or enhancement
We propose adding “eager” coroutine execution support to asyncio.TaskGroup via a new method enqueue() [1].
TaskGroup.enqueue() would have the same signature as TaskGroup.create_task() but eagerly perform the first step of the passed coroutine’s execution immediately. If the coroutine completes without yielding, the result of enqueue() would be an object which behaves like a completed asyncio.Task. Otherwise, enqueue() behaves the same as TaskGroup.create_task(), returning a pending asyncio.Task.
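To make the intended behavior concrete, here is a hypothetical usage sketch (enqueue() exists only in this proposal; cached_fetch, fetch, and cache are made-up illustrations):

import asyncio

cache = {"user:42": "cached-result"}  # illustrative local cache

async def cached_fetch(key):
    if key in cache:
        return cache[key]    # returns without yielding: the eager-completion case
    return await fetch(key)  # suspends on real I/O: the normal pending-Task case

async def main():
    async with asyncio.TaskGroup() as tg:
        t = tg.enqueue(cached_fetch("user:42"))  # proposed API
        # On a cache hit the coroutine runs to completion inside the
        # enqueue() call, so t already behaves like a finished Task with
        # no event loop round-trip. On a miss, t is a pending Task,
        # exactly as with tg.create_task().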
The reason for a new method, rather than changing the implementation of TaskGroup.create_task(), is that this new method introduces a small semantic difference. For example, in:
async def coro(): ...

async with TaskGroup() as tg:
    tg.enqueue(coro())
    raise Exception
The exception will cancel everything scheduled in tg, but if some or all of coro() completes eagerly, any side-effects of this will be observable in further execution. If tg.create_task() is used instead, no part of coro() will be executed.
Pitch
At Instagram we’ve observed that ~70% of coroutine instances passed to asyncio.gather() can run fully synchronously, i.e. without performing any I/O which would suspend execution. This typically happens when there is a local cache which can elide actual I/O. We exploit this in Cinder with a modified asyncio.gather() that eagerly executes coroutine args and skips scheduling an asyncio.Task object to an event loop if no yield occurs. Overall this optimization saved ~4% CPU on our Django webservers.
In a prototype implementation of this proposed feature [2] the overhead when scheduling TaskGroups with all fully-synchronous coroutines was decreased by ~8x. When scheduling a mixture of synchronous and asynchronous coroutines, performance is improved by ~1.4x, and when no coroutines can complete synchronously there is still a small improvement.
We anticipate code relying on any semantics which change between TaskGroup.create_task() and TaskGroup.enqueue() will be rare. So, as the TaskGroup interface is new in 3.11, we hope enqueue() and its performance benefits can be promoted as the preferred method for scheduling coroutines in 3.12+.
Previous discussion
This new API was discussed informally at PyCon 2022, with at least some of the discussion involving @gvanrossum, @DinoV, @markshannon, and/or @jbower-fb.
[1] The name "enqueue" came out of a discussion between @gvanrossum and @DinoV.
[2] Prototype implementation (some features missing, e.g. specifying Context), and benchmark.
If cond is True and some or all of coro() completes eagerly, any side-effects of this will be observable in further execution. If tg.create_task() is used instead, no part of coro() will be executed if cond is True.
If I didn't already understand the changes to async functions you are proposing I wouldn't understand what this means (or the code fragment above it). What does cond refer to? I think what you are trying to say here is that, if the body of the with TaskGroup() block raises before the first await, it will cancel all created tasks before they have started executing. Right?
Separately -- Can you remind me of the reason to make this a TaskGroup method instead of a new flag on create_task()? And the enqueue() name feels odd, especially since it may immediately execute some of the coroutine, which seems the opposite of putting something in a queue.
To be clear, I have no objection to the feature, and I think making this an opt-in part of task creation avoids worries of changing semantics for something as simple as await coro().
Separately -- Can you remind me of the reason to make this a TaskGroup method instead of a new flag on create_task()? And the enqueue() name feels odd, especially since it may immediately execute some of the coroutine, which seems the opposite of putting something in a queue.
This came up in the conversation we had with @DinoV during the Language Summit. We started out with a flag on create_task (e.g. eager=True), but we didn't like the fact that when setting this flag, the name of the method may now be misleading (because, in fact, it may not create a task).
enqueue was the alternative (I don't remember who suggested it and why), but we can definitely bikeshed the naming :)
If we take inspiration from Trio's nurseries, we can use start_soon.
Other options are execute or run (although these names may be misleading too, since users may assume these APIs never create a task).
If I didn't already understand the changes to async functions you are proposing I wouldn't understand what this means (or the code fragment above it). What does cond refer to? I think what you are trying to say here is that, if the body of the with TaskGroup() block raises before the first await, it will cancel all created tasks before they have started executing. Right?
Right. Yeah, this is more contrived than needed, I'll fix up the text.
This came up in the conversation we had with @DinoV during the Language Summit. We started out with a flag on create_task (e.g. eager=True), but we didn't like the fact that when setting this flag, the name of the method may now be misleading (because, in fact, it may not create a task).
I wasn't in this conversation but when writing this issue it occurred to me that even with eager execution we could still return something Task-like. This would mitigate any issues with knowing the interface of the returned object (e.g. is cancel() available). This would also mean the name create_task() would still make sense even with a keyword-argument to alter the behavior. Unless I'm missing something that makes this non-viable?
Having said this, I think eager execution should be the default behavior users reach for going forward. So it might still be better to have a new method.
Thanks for fixing up the description, it's clear now.
I wasn't in this conversation but when writing this issue it occurred to me that even with eager execution we could still return something Task-like. This would mitigate any issues with knowing the interface of the returned object (e.g. is cancel() available). This would also mean the name create_task() would still make sense even with a keyword-argument to alter the behavior. Unless I'm missing something that makes this non-viable?
It could return a Future, which has many of the same methods. If the coroutine returns an immediate result that could be the Future's result.
However, if the result is generally not used, the need to return something with appropriate methods would just reduce the performance benefit.
Having said this, I think eager execution should be the default behavior users reach for going forward. So it might still be better to have a new method.
Yeah, that is definitely an advantage of a new method name. I find enqueue() rather easy to mistype though.
Have the performance results been repeated with TaskGroup.enqueue(), or are the quoted measurements based on the Cinder implementation using gather()? There could be surprises here.
Have the performance results been repeated with TaskGroup.enqueue(), or are the quoted measurements based on the Cinder implementation using gather()? There could be surprises here.
The benchmark showing an 8x improvement (etc.) is with Dino's prototype of TaskGroup.enqueue() against some revision of 3.11. This is functional, but still needs some work. Notably it does not yet implement management of the current Context when eagerly executing. There will be some extra overhead from that but my hope is it'll be negligible.
It could return a Future, which has many of the same methods. If the coroutine returns an immediate result that could be the Future's result.
However, if the result is generally not used, the need to return something with appropriate methods would just reduce the performance benefit.
Dino's prototype currently returns a newly constructed Future if the eager execution fully completes, but returns a Task otherwise. While I can imagine it'll be less common to use the extra fields on Task, it might be annoying to have to think about whether a Future or a Task is returned. Again, unless I missed something, I don't think there should be extra overhead from returning some kind of eagerly-completed-Task value vs. a completed Future.
I find enqueue() rather easy to mistype though.
Indeed, I have misspelled it a number of times already. I like start_soon or start_immediate, which to me imply that the execution start time may not be bound by the async with scope.
For a new API, returning either a Future or a Task is totally fine -- Task is a subclass of Future, so we can just document it as returning a Future.
I hadn't realized the prototype doesn't even need C changes -- I had assumed it depended on the changes to vectorcall to pass through a bit indicating it's being awaited immediately. Is that change even needed once we have this? (Probably we resolved that already during discussions at PyCon, it's just so long ago I can't remember anything.)
I think we need to bikeshed some more on the name, but you're right about the impression the name ought to give. Maybe start_task()?
We can just document it as returning a Future.
I think it'd be better to document it as returning a Task? That way it's clear cancellation is still generally an option.
I had assumed it depended on the changes to vectorcall to pass through a bit indicating it's being awaited immediately. Is that change even needed once we have this? (Probably we resolved that already during discussions at PyCon, it's just so long ago I can't remember anything.)
That is an interesting question, and again I don't think I was around if that was discussed in detail. For this feature, that flag is not needed. If we want to add an optimization for the large body of existing code that uses asyncio.gather(), then it would help significantly. We are probably going to carry this feature forward in Cinder for a while at least, as it'll likely take quite some time (years?) before we can fully migrate to the new TaskGroup API. I'd definitely like to know what you, or other members of the community, feel about adding an optimization to help with the older/existing API.
Maybe start_task()?
Fine with me. Maybe we should discuss further once a PR is up.
I think it'd be better to document it as returning a Task? That way it's clear cancellation is still generally an option.
From a static type POV it's always a Future, not always a Task. If you want to cancel it you take your chances -- cancel() will return a bool telling you whether it worked. The docs should just explain the situation without lying.
I will wait for the PR (that you're hopefully working on?). I recommend using start_task() as the method name until we come up with something better.
From a static type POV it's always a Future, not always a Task.
Whoops, I just realized Future has a cancel() method. I mistakenly thought that was a feature of Task. Sorry for the confusion.
I will wait for the PR (that you're hopefully working on?)
We should have something up soon. Really wanted to get an issue open first for early feedback on the plan.
We had a fruitful discussion on this topic at the core dev sprint. @1st1 and @carljm were present (and others).
Yury recommends not adding a new API like proposed here, but instead solving this globally (per event loop). We can have an optional task factory that people can install using loop.set_task_factory(), and which does effectively what is being proposed here for all create_task() calls (not just for the TaskGroup method).
The new create_task() would do what you propose here (call coro.send(None) and analyze the outcome), and either create a Future and set its result (or exception), or create a Task from a coroutine and a yield_result. The latter operation could be done by an extension to the create_task() API and the Task constructor API -- if an optional keyword parameter yield_result=result is passed, the task constructor doesn't end by calling loop.call_soon(self.__step, context=self._context), but instead treats result as is done by __step, interpreting it as a future to be blocked on.
This requires an extension of the create_task() API, but that should be okay -- if you are using a loop whose create_task() doesn't yet support the yield_result keyword parameter, you just can't benefit from the new functionality yet. (I'm guessing the loop should advertise whether it supports this somehow.)
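For illustration, a minimal sketch of such a factory, assuming the name eager_task_factory and ignoring contexts and the special-casing of KeyboardInterrupt/CancelledError:

import asyncio

def eager_task_factory(loop, coro):
    # Run the first step of the coroutine synchronously.
    try:
        yielded = coro.send(None)
    except StopIteration as exc:
        fut = loop.create_future()
        fut.set_result(exc.value)   # completed without suspending
        return fut
    except BaseException as exc:
        fut = loop.create_future()
        fut.set_exception(exc)      # failed without suspending
        return fut
    # The coroutine suspended on `yielded`. Handing the partially-run
    # coroutine to a stock Task would resume it without first waiting on
    # `yielded`, so this line is illustrative only; the proposed
    # yield_result= extension to the Task constructor is what would make
    # the handoff correct, i.e. Task(coro, loop=loop, yield_result=yielded).
    return asyncio.Task(coro, loop=loop)

# Installed per event loop, as described:
# loop.set_task_factory(eager_task_factory)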
I'm working on a prototype implementation in #98137.
I'm hoping that one of the Instagram folks can work on this contribution? I don't have the bandwidth to finish up that PR -- it was just meant as a quick proof of concept of what we discussed at the sprint.
Yep, we'll pick that up! Just need to find a moment to look in more detail.
@jbower-fb Have you ever found the time to look into this more?
@gvanrossum I have not but I think @itamaro is going to pick this up. Sorry for the confusion, it was a bit unclear last year who was going to have time to do what.
Cool. Do I need to keep this PR open?
Cool. Do I need to keep this PR open?
already pulled that PR locally, so you can close it!
probably going to take me a bit to learn enough asyncio internals to get this working and tested
Let me know (e.g. by posting questions here) if you need any help!
GH-101613 has a working prototype. still much left to do. TODO items and additional discussion items on the draft PR (let me know if you prefer to keep the discussion on the issue).
isn't this going to cause issues with structured concurrency objects (those that use asyncio.current_task(), eg timeout and TaskGroup)?
eg in
async def some_task():
    async with asyncio.timeout(1):
        await asyncio.Event().wait() # never gets cancelled
here asyncio.timeout grabs the asyncio.current_task(), which would actually be some previous task
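For context, asyncio.timeout() captures the current task on entry so it can cancel it later. Roughly (simplified from 3.11's asyncio/timeouts.py):

# Simplified from asyncio.timeouts.Timeout.__aenter__:
async def __aenter__(self):
    # Under naive eager execution, current_task() returns the caller's
    # task (or None) rather than a task for the eager coroutine, so the
    # timeout would end up cancelling the wrong task.
    self._task = asyncio.current_task()
    if self._task is None:
        raise RuntimeError("Timeout should be used inside a task")
    self.reschedule(self._when)
    return self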
We anticipate code relying on any semantics which change between TaskGroup.create_task() and TaskGroup.enqueue() will be rare
opening a timeout or TaskGroup in the first step of an async function doesn't seem like a rare occurrence
isn't this going to cause issues with structured concurrency objects (those that use asyncio.current_task(), eg timeout and TaskGroup)?
yes, this is a real problem.
one possible solution is to modify asyncio.current_task to materialize a task on-demand for the current coroutine if it detects it's being called in the context of an eagerly executing coroutine (would require the eager task factory to set something on the loop to communicate this to current_task).
would that resolve the issue and be an acceptable solution?
@itamaro Is that how you resolved it?
Hello, I'm back and have been assisting @itamaro with this issue. In the latest PR update I believe we have this addressed with minimal extra overhead.
The crux of the change is to always create a Task, even when eagerly executing. This means there's a Task available to swap in as the "current" task during eager execution.
In theory this shouldn't be much more expensive, as eager execution was always creating either a Task or a Future anyway. In practice, a big difference was that Tasks were registered in the _all_tasks weak-set, which is quite expensive. I've mitigated this by splitting _all_tasks into _scheduled_tasks, which is still a weak-set, and a much cheaper regular set, _eager_tasks, containing only eager tasks. We don't need a weak-set for eager tasks as we fully control their lifetime during eager execution, and if an await does happen we can graduate to a "scheduled task". all_tasks() now pulls from both sets so users still have a unified view.
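A minimal sketch of that split (module-level names match the description above; the real code filters more carefully and details may differ from the PR):

import weakref

_scheduled_tasks = weakref.WeakSet()  # tasks handed off to the event loop
_eager_tasks = set()                  # strong refs are fine: lifetime is fully
                                      # controlled during eager execution

def all_tasks(loop):
    # Unified view across both sets, with completed tasks filtered out.
    tasks = set(_scheduled_tasks) | _eager_tasks
    return {t for t in tasks if t.get_loop() is loop and not t.done()}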
To make the above work I've pushed most of the functionality that was in create_eager_task_factory.factory() into Task. Firstly, this allows us to set the result/exception of a completed eager execution, so there's no longer a need to make a Future. We couldn't easily do this from outside the class as set_result()/set_exception() are disallowed on Tasks. I think this also makes more sense, as a Task now holds the logic both to schedule itself to the loop and to eagerly execute itself.
As well as the issue above, I fixed some other issues too:
- all_tasks() had a similar issue to current_task() and is now implicitly fixed.
- When eagerly executing we weren't entering the Task's context, meaning context-var changes would incorrectly affect the parent task.
- Certain exceptions during task execution, like KeyboardInterrupt and CancelledError, need special handling. This is true even for eager execution and was missing.
I expect/hope the full Pyperformance results to be about the same as before, but we're still getting that data.
I also want to think some more about testing for some of the other issues resolved here, but wanted to get the new implementation up now for public comment.
@jbower-fb provided the details :) I'm running pyperformance - should have results tomorrow
One other curiosity worth mentioning: with the async_tree benchmark I found eager execution led to a lot more GC overhead (according to Linux perf). I cut this down by setting Task._coro to None if eager execution completes the task. I don't deeply understand what's going on here though.
Clearing the coroutine on any Task's completion may help a bit anyway. It is a semantic change though: if you queried a Task for its coroutine after it completed, you'd go from getting one to getting None. I guess nothing should really be relying on this? But maybe I'm missing something, so I've restricted this change to eager Tasks only.
I commented on the PR - benchmarks results are the same as the original implementation!
There are two PRs referenced by the issue; which one is the most up to date? Should the other one be closed now?
There are two PRs referenced by the issue; which one is the most up to date? Should the other one be closed now?
I closed the draft PR, thanks for noticing!
See my comments on the PR, reopening.
One other curiosity worth mentioning: with the async_tree benchmark I found eager execution led to a lot more GC overhead (according to Linux perf). I cut this down by setting Task._coro to None if eager execution completes the task. I don't deeply understand what's going on here though. Clearing the coroutine on any Task's completion may help a bit anyway. It is a semantic change though: if you queried a Task for its coroutine after it completed, you'd go from getting one to getting None. I guess nothing should really be relying on this? But maybe I'm missing something, so I've restricted this change to eager Tasks only.
I don't see this change documented anywhere and I am not sure anyone has agreed that it is correct. I am -1 on this change; it breaks introspection. If there is a performance bottleneck, it needs to be investigated rather than masked by an unrelated change in semantics.