rclpy
rclpy copied to clipboard
Tasks that raise an exception crash the entire executor, even when the exception is caught
Bug report
Required Info:
- Operating System:
- Ubuntu 20.04
- Installation type:
- Binary
- Version or commit hash:
- Foxy
- DDS implementation:
- Cyclone
- Client library (if applicable):
- rclpy
Steps to reproduce issue
import rclpy
async def throwing_task():
raise FileNotFoundError()
async def test_1():
try:
await throwing_task()
except FileNotFoundError:
pass
return True
async def test_2(executor):
task = executor.create_task(throwing_task)
try:
await task
except FileNotFoundError:
print("Error but should keep going")
return True
rclpy.init()
node = rclpy.create_node("test_wait")
executor = rclpy.get_global_executor()
executor.add_node(node)
task = executor.create_task(test_1)
executor.spin_until_future_complete(task)
assert task.result()
task = executor.create_task(test_2, executor)
executor.spin_until_future_complete(task)
assert task.result()
rclpy.shutdown()
Expected behavior
The above code should complete and pass since the FileNotFoundError is caught in both cases
Actual behavior
The executor itself raises an exception on test_2 which is uncaught and brings down the entire test...
Additional information
I was trying to run a number of async callbacks in the rclpy executor, and I was raising exceptions in some cases. For some reason, those would bring down the entire node. I tracked it down to these lines:
https://github.com/ros2/rclpy/blob/f6d652fa2d5d3a8164cc69fb543ae0693e958262/rclpy/rclpy/executors.py#L724
I am not sure why the executor is raising the exception of the task :thinking: It might be due to the fact that subscribers, timers etc... are tasks that are never explicitly awaited? I don't have much knowledge about the internals of rclpy but this is very surprising coming from asyncio... It is also very surprising to see that the exception is only improperly raised when packaged as an asynchronous task, and not when awaited directly.
Thank you!
I think task = executor.create_task(throwing_task) is called with another handler in the executor, which does not have try&catch statement at all.
The stacktrace is fairly clear:
Traceback (most recent call last):
File "/tmp/test_rclpy.py", line 40, in <module>
executor.spin_until_future_complete(task)
File "/opt/ros/foxy/lib/python3.8/site-packages/rclpy/executors.py", line 299, in spin_until_future_complete
self.spin_once_until_future_complete(future, timeout_sec)
File "/opt/ros/foxy/lib/python3.8/site-packages/rclpy/executors.py", line 722, in spin_once_until_future_complete
self.spin_once(timeout_sec)
File "/opt/ros/foxy/lib/python3.8/site-packages/rclpy/executors.py", line 719, in spin_once
raise handler.exception()
File "/opt/ros/foxy/lib/python3.8/site-packages/rclpy/task.py", line 239, in __call__
self._handler.send(None)
File "/tmp/test_rclpy.py", line 5, in throwing_task
raise FileNotFoundError()
FileNotFoundError
The exception is first raised in the throwing task (as expected!), then re-raised inside the executor. I'm wondering why we do this? I have the feeling that exceptions raised by asynchronous tasks should only be raised in an await call :thinking:
There is some possibility that this would be improved by https://github.com/ros2/rclpy/pull/971 (though it needs a rebase). It might be worthwhile to test something like that.
I encountered the same problem and created a workaround:
def schedule_task(coro: Coroutine[None, None, Any], executor: Executor) -> Future:
"""Create a task which can be canceled and does not crash the executor."""
future = Future(executor=executor)
# interrupt any pending spins
future.add_done_callback(lambda _: executor.wake())
executor.create_task(condom_task(future, coro))
return future
@types.coroutine
def coroutine_yield() -> Generator[None, None, None]:
"""Yield control to the executor."""
yield
async def condom_task(future: Future, coro: Coroutine[None, None, Any]) -> None:
"""Redirect exceptions to a future to prevent the executor from seeing them."""
while not future.cancelled():
try:
coro.send(None)
except StopIteration as result:
future.set_result(result.value)
return
except Exception as error:
future.set_exception(error)
return
await coroutine_yield()
# run any cleanup code
try:
coro.close()
except Exception as error:
if isinstance(error.__context__, GeneratorExit):
future.set_exception(error)
else:
raise
The workaround involves wrapping the Coroutine with a second one which acts as shield to redirect exceptions from the coroutine away from the executor to a future.
Maybe we could copy the behavior of asyncio by adding an api to set an exception handler on the executor, where the default is to log and move on?