Feature: new asyncio executor
Motivation
While rclpy is task-based and built with asynchronous support, its custom implementation imposes significant limitations and lacks integration with Python's asyncio ecosystem. Integrating rclpy nodes with modern asyncio-based Python libraries like FastAPI and pyserial-asyncio is difficult and often forces developers into complex multi-threaded solutions.
Inspired by @sloretz's PR #971, this PR introduces an asyncio-based executor that runs nodes entirely on the asyncio event loop, which has become the de facto standard for IO programming in the Python community.
Design considerations
C++ vs. Python Implementation
- The existing EventsExecutor is fully implemented in C++, duplicating a large percentage of the existing executor logic. Almost every function ends up calling back into Python objects, and core functionality like logging and exception handling is done by running dynamic Python commands with `py::exec`. In addition, casting the EventsExecutor type to Executor feels to me like unhealthy practice.
- Since these methods (aside from the queue and timer manager) only run a few times over the executor’s lifetime, the performance benefit of C++ is minimal.
- A pure Python implementation greatly simplifies integration with asyncio and lets us share code with the standard WaitSet executor, avoiding the duplicated logic that lives in C++ today.
- I did evaluate a hybrid approach (wrapping logic in Python, core loop in C++), but it introduced complex multiple inheritance and cross-language calls that made the code far harder to read and maintain.
Callback Handling
- Both rclcpp’s and rclpy’s EventsExecutors try to keep the in-middleware callback as atomic as possible (acquire a lock and push to a queue) to support real-time determinism in C++.
- In rclpy, however, callbacks already run in Python, so they’re never truly real-time, and the interpreter itself is the main bottleneck.
- After weighing options, I chose to allow the RCL callback to acquire the GIL and invoke an atomic Python function: asyncio's `loop.call_soon_threadsafe`. Since most of asyncio’s core is in C, this amounts to grabbing a lock, enqueueing the task, and writing to a wake-fd, which is an extremely lightweight operation.
- I considered introducing a middle-man thread or a C++ queue with a custom wake-fd for asyncio, but these approaches either added unnecessary threads that had little performance benefit or weren’t fully cross-platform (each OS needs its own socket approach).
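To illustrate the hand-off described above, here is a minimal sketch that uses only the standard library. A plain Python thread stands in for the middleware thread, and `middleware_callback` is a hypothetical name, not part of rclpy or of this PR.

```python
import asyncio
import threading


async def demo():
    loop = asyncio.get_running_loop()
    received = asyncio.Queue()

    def middleware_callback(sequence_number):
        # Runs on a foreign thread. call_soon_threadsafe is the one loop
        # method that is safe to call here: it enqueues the callback and
        # wakes the loop's selector.
        loop.call_soon_threadsafe(received.put_nowait, sequence_number)

    def fake_middleware():
        # Stand-in for the thread that delivers "new message" events.
        for i in range(3):
            middleware_callback(i)

    thread = threading.Thread(target=fake_middleware)
    thread.start()

    items = []
    for _ in range(3):
        # The loop thread consumes the events in the order they were queued.
        items.append(await received.get())
    thread.join()
    return items
```

Running `asyncio.run(demo())` collects the three events on the loop thread, without any extra intermediary thread or custom wake-fd.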
Futures Compatibility
- `rclpy.Future` cannot be awaited by asyncio tasks because it lacks the `get_loop()` and `_asyncio_future_blocking` API.
- In asyncio, unlike rclpy, a cancelled Future is also considered “done”.
- Asyncio futures must belong to the running loop, and can only be created by an existing loop using `loop.create_future()`. Asyncio even enforces at runtime that the future belongs to the running loop. In contrast, `rclpy` lets you call `client.call_async()` without an executor, which is only set when the response arrives.
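The asyncio side of these points can be checked with the standard library alone. This is only a sketch of the asyncio behavior; it does not touch `rclpy.Future`.

```python
import asyncio


async def demo():
    loop = asyncio.get_running_loop()
    # Futures are created by, and bound to, a specific loop.
    future = loop.create_future()
    bound_to_running_loop = future.get_loop() is loop

    future.cancel()
    # In asyncio, a cancelled future also reports done().
    return bound_to_running_loop, future.cancelled(), future.done()
```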
Spin Behavior
- The WaitSet executor’s `spin_once()` executes only one callback per invocation. asyncio’s `loop.run_forever()` repeatedly calls `_run_once()` until stopped, executing all ready callbacks each cycle.
- To match the behavior of `spin_once`, an `asyncio.Queue` is used to queue entity callbacks and user-created tasks. Users can create a task using either the `executor.create_task` API or the `executor.loop.create_task` API. The first matches the behavior of `spin_once`, while the second is more efficient.
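A rough sketch of the queue-based idea, under the assumption that each ready callback is pushed onto an `asyncio.Queue` and `spin_once` pops exactly one. `MiniQueueExecutor` and `enqueue` are hypothetical names for illustration and are not the PR's `AsyncioExecutor`.

```python
import asyncio


class MiniQueueExecutor:
    """Hypothetical stand-in showing one-callback-per-spin_once semantics."""

    def __init__(self):
        self._ready = asyncio.Queue()

    def enqueue(self, callback):
        # In a real executor, entity events and user tasks would land here.
        self._ready.put_nowait(callback)

    async def spin_once(self):
        # Wait for one ready callback and run only that one.
        callback = await self._ready.get()
        callback()


async def demo():
    executor = MiniQueueExecutor()
    calls = []
    executor.enqueue(lambda: calls.append("a"))
    executor.enqueue(lambda: calls.append("b"))

    await executor.spin_once()
    after_first_spin = list(calls)  # only the first callback has run
    await executor.spin_once()
    return after_first_spin, calls
```

Work scheduled directly on the loop bypasses such a queue, which is why it would be cheaper but would not follow the one-per-call behavior.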
Changes
- Added an experimental `AsyncioExecutor` class that runs entity events as asyncio tasks on the event loop.
- Exposed the `set_on_new_<message,request,response>_callback` API in Python Subscription, Service, and Client.
- Exposed the `set_on_reset_callback` API in Python Timer.
- Added an AbstractExecutor class to support typing of non wait-set executors.
- Added an `ExecutorBase` class to share code like `_take_subscription` with WaitSet executors.
- Added an API for `executor.create_future()`, encouraging users to create futures bound to the executor.
- Added a new `AsyncioClock` enabling `sleep_until_async` and `sleep_for_async`.
- Raise CancelledError inside the coroutine of a cancelled task.
- Enhanced `rclpy.Future` to `yield self`:
  - Allows more efficient management of blocked tasks in both SingleThreadedExecutor and AsyncioExecutor using a new `executor._resume_task` method.
  - Causes an explicit crash when asyncio awaits an rclpy.Future rather than a silent busy loop.
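The `yield self` change can be illustrated with a toy future. `MiniFuture`, `step`, and `add_one` below are hypothetical names and a deliberately simplified model, not the real `rclpy.Future` or executor code. The sketch shows how a driver learns which future a coroutine is blocked on, and that asyncio raises an error when one of its tasks awaits such a foreign future.

```python
import asyncio


class MiniFuture:
    """Hypothetical, simplified stand-in for rclpy.Future."""

    def __init__(self):
        self._done = False
        self._result = None

    def set_result(self, result):
        self._done = True
        self._result = result

    def __await__(self):
        if not self._done:
            # Hand this future to whoever drives the coroutine, so the
            # driver knows exactly what the task is blocked on.
            yield self
        # Simplification: assumes the driver resumes only after completion.
        return self._result


def step(coro):
    """Advance a coroutine once; report what blocks it, or its result."""
    try:
        blocked_on = coro.send(None)
    except StopIteration as stop:
        return "done", stop.value
    return "blocked", blocked_on


async def add_one(future):
    return (await future) + 1


def asyncio_rejects_foreign_future():
    """Return True if asyncio raises when a task awaits a MiniFuture."""
    try:
        asyncio.run(add_one(MiniFuture()))
    except RuntimeError:
        return True
    return False
```

With a bare `yield`, the driver would receive `None` and could only keep re-polling the task. Receiving the future itself lets it park the task until that future completes.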
Supported & Unsupported Entities
Supported
- Subscriptions (and publishers)
- Services and clients
- Timers
Not Supported
- Guard conditions
- Waitables
  - The existing EventsExecutor "extracts" the inner entities of a waitable by adding it to a WaitSet.
  - I think we should skip waitables for this first PR, and add proper support in the future based on the neater `set_on_ready_callback` approach of rclcpp.
- Callback groups
Updates
- Running the `test_rclpy_performance.py` script from the EventsExecutor PR on the asyncio executor yielded fantastic results!
I would love to see this make its way into main. My main frustration with rclpy is that it seems to ignore the "Pythonic" way of doing things in favor of its own way. Asyncio integration would make ROS2 much more pleasant to work with in Python.
Mentioning #1461
Update
I worked through a couple of design iterations, and I believe I settled on one that almost exactly matches the behavior of SingleThreadedExecutor. I had to make some minor changes in the codebase that also affect SingleThreadedExecutor, but I believe I did not break any API or change existing behavior. Still, let me know if something might prevent us from backporting this PR to Jazzy in the future.
@sloretz you came up in the last working group meeting as the most qualified maintainer to review this PR. Would you be willing to take a look?
Thank you for the PR!
> @sloretz you came up in the last working group meeting as the most qualified maintainer to review this PR. Would you be willing to take a look?
I'm giving it a skim now, but it might take me a while to review it fully. I think we could take a couple changes right away. Would you be willing to create two PRs and ping me as a reviewer?
- A PR adding the `get_logger_name` API (I'd subjectively suggest the user-facing Python API be a `logger` property that returns the class member `_logger` so that `get_logger()` is only called once when the entity is created)
- A PR adding `AbstractExecutor` and `BaseExecutor`
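As a rough illustration of the suggested `logger` property, here is a sketch in which the standard `logging` module stands in for rclpy's logging. `Entity`, `_create_logger`, and the call counter are hypothetical and exist only to show that the logger is looked up once.

```python
import logging


class Entity:
    """Hypothetical entity; logging.getLogger stands in for get_logger()."""

    def __init__(self, logger_name):
        self.logger_lookups = 0
        # Resolve the logger once, when the entity is created.
        self._logger = self._create_logger(logger_name)

    def _create_logger(self, logger_name):
        self.logger_lookups += 1
        return logging.getLogger(logger_name)

    @property
    def logger(self):
        # Every access returns the cached member; no further lookups happen.
        return self._logger
```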
One of the reasons rclpy has its own executor instead of using concurrent.futures or asyncio is to support both coroutines and multithreading at the same time. I would assume any async methods on the asyncio executor would need to be called in the same thread as the loop. How about the non-async methods on the executor? Are there any that can't be called from a different thread?
Guard condition and waitable support will be important (Actions are implemented using waitables). Are there any technical blockers to implementing them in the asyncio executor?
Callback groups support isn't necessary. Callback groups were created for C++, but the problem they solve in C++ is solved in Python by using coroutines. I don't think we need callback groups in rclpy at all.
> One of the reasons `rclpy` has its own executor instead of using `concurrent.futures` or `asyncio` is to support both coroutines and multithreading at the same time.
Coroutines and multithreading are two different ways to achieve concurrency. Unlike in C++, multithreading offers no performance advantage in Python. In fact, because of the GIL, it usually decreases performance, even more so in our common case of running a large number of short callbacks.
Before asyncio, the only way to use many libraries (serial, HTTP, DB drivers) concurrently was multithreading, but this is no longer the case.
Most libraries now provide async APIs, and asyncio itself gives us useful features like `run_in_executor` and `to_thread` to handle the few calls that still block.
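For example, a call that still blocks can be pushed to a worker thread while the loop keeps servicing other coroutines. In this sketch, `blocking_read` and `ticker` are made-up placeholders for a blocking driver call and for other work on the loop.

```python
import asyncio
import time


def blocking_read():
    # Placeholder for a library call that has no async API.
    time.sleep(0.2)
    return "payload"


async def ticker(ticks):
    # Other work that should keep running while the blocking call is busy.
    for _ in range(3):
        await asyncio.sleep(0.05)
        ticks.append(time.monotonic())


async def demo():
    loop = asyncio.get_running_loop()
    ticks = []
    # run_in_executor(None, ...) uses the loop's default thread pool;
    # asyncio.to_thread(blocking_read) is the shorthand on Python 3.9+.
    result, _ = await asyncio.gather(
        loop.run_in_executor(None, blocking_read),
        ticker(ticks),
    )
    return result, len(ticks)
```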
> I would assume any `async` methods on the asyncio executor would need to be called in the same thread as the loop. How about the non-async methods on the executor? Are there any that can't be called from a different thread?
asyncio is not thread-safe, so the AsyncioExecutor isn't either.
Because of that, every AsyncioExecutor method call must originate from the loop thread.
Calling executor methods from other threads is definitely possible, but it must be done through `executor.loop.call_soon_threadsafe` (which writes to a loopback wake-fd; otherwise the event loop might not wake up from the selector).
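A sketch of that pattern, with a hypothetical `LoopBoundCounter` standing in for the executor (the real `AsyncioExecutor` methods are not used here): the worker thread never touches the object directly and only schedules calls onto the loop thread.

```python
import asyncio
import threading


class LoopBoundCounter:
    """Hypothetical loop-bound object standing in for an executor."""

    def __init__(self, loop):
        self.loop = loop
        self.value = 0
        self.caller_threads = set()

    def increment(self):
        # Not thread-safe by design: must only run on the loop thread.
        self.caller_threads.add(threading.get_ident())
        self.value += 1


def worker(counter, finished):
    for _ in range(5):
        # Schedule the method on the loop thread instead of calling it here.
        counter.loop.call_soon_threadsafe(counter.increment)
    # Callbacks run in FIFO order, so this fires after all increments.
    counter.loop.call_soon_threadsafe(finished.set)


async def demo():
    counter = LoopBoundCounter(asyncio.get_running_loop())
    finished = asyncio.Event()
    thread = threading.Thread(target=worker, args=(counter, finished))
    thread.start()
    await finished.wait()
    thread.join()
    ran_only_on_loop_thread = counter.caller_threads == {threading.get_ident()}
    return counter.value, ran_only_on_loop_thread
```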
Guard condition
Regarding guard conditions, the classic “wake the wait-set from a different thread” use case is already covered by asyncio's `loop.call_soon_threadsafe`.
The only remaining use case for guard conditions is if we ever need to interrupt `await queue.get()` inside `spin_once()`.
Do you have any concrete use case for this?
waitable support
As discussed in one of the working group meetings, EventsExecutor in rclcpp uses a new API for waitables: `set_on_ready_callback`.
The Python EventsExecutor implementation did not follow suit, and instead used a trick to extract the internal entities of the waitable by adding it to a wait set object.
I do not like this trick and would prefer to implement a proper events API for waitables, as in rclcpp.
Since the API must be implemented in each waitable object, it seems like a lot of work, and I plan to do that later in a separate PR.
> - A PR adding the `get_logger_name` API (I'd subjectively suggest the user-facing Python API be a `logger` property that returns the class member `_logger` so that `get_logger()` is only called once when the entity is created)
> - A PR adding `AbstractExecutor` and `BaseExecutor`
Sounds good. I'll let you know when they're ready.
> Would you be willing to create two PRs and ping me as a reviewer?
> - A PR adding the `get_logger_name` API (I'd subjectively suggest the user-facing Python API be a `logger` property that returns the class member `_logger` so that `get_logger()` is only called once when the entity is created)
> - A PR adding `AbstractExecutor` and `BaseExecutor`
@sloretz #1470 #1471
@sloretz friendly ping