rclpy icon indicating copy to clipboard operation
rclpy copied to clipboard

Tasks that raise an exception crash the entire executor, even when the exception is caught

Open haudren-woven opened this issue 2 years ago • 8 comments
trafficstars

Bug report

Required Info:

  • Operating System:
    • Ubuntu 20.04
  • Installation type:
    • Binary
  • Version or commit hash:
    • Foxy
  • DDS implementation:
    • Cyclone
  • Client library (if applicable):
    • rclpy

Steps to reproduce issue

import rclpy


async def throwing_task():
    raise FileNotFoundError()


async def test_1():
    try:
        await throwing_task()
    except FileNotFoundError:
        pass
    return True


async def test_2(executor):
    task = executor.create_task(throwing_task)
    try:
        await task
    except FileNotFoundError:
        print("Error but should keep going")
    return True


rclpy.init()

node = rclpy.create_node("test_wait")

executor = rclpy.get_global_executor()
executor.add_node(node)

task = executor.create_task(test_1)

executor.spin_until_future_complete(task)

assert task.result()

task = executor.create_task(test_2, executor)

executor.spin_until_future_complete(task)

assert task.result()

rclpy.shutdown()

Expected behavior

The above code should complete and pass since the FileNotFoundError is caught in both cases

Actual behavior

The executor itself raises an exception on test_2 which is uncaught and brings down the entire test...

Additional information

I was trying to run a number of async callbacks in the rclpy executor, and I was raising exceptions in some cases. For some reason, those would bring down the entire node. I tracked it down to these lines: https://github.com/ros2/rclpy/blob/f6d652fa2d5d3a8164cc69fb543ae0693e958262/rclpy/rclpy/executors.py#L724

I am not sure why the executor is raising the exception of the task :thinking: It might be due to the fact that subscribers, timers etc... are tasks that are never explicitly awaited? I don't have much knowledge about the internals of rclpy but this is very surprising coming from asyncio... It is also very surprising to see that the exception is only improperly raised when packaged as an asynchronous task, and not when awaited directly.

Thank you!

haudren-woven avatar Mar 28 '23 06:03 haudren-woven

I think task = executor.create_task(throwing_task) is called with another handler in the executor, which does not have try&catch statement at all.

fujitatomoya avatar Mar 29 '23 20:03 fujitatomoya

The stacktrace is fairly clear:

Traceback (most recent call last):
  File "/tmp/test_rclpy.py", line 40, in <module>
    executor.spin_until_future_complete(task)
  File "/opt/ros/foxy/lib/python3.8/site-packages/rclpy/executors.py", line 299, in spin_until_future_complete
    self.spin_once_until_future_complete(future, timeout_sec)
  File "/opt/ros/foxy/lib/python3.8/site-packages/rclpy/executors.py", line 722, in spin_once_until_future_complete
    self.spin_once(timeout_sec)
  File "/opt/ros/foxy/lib/python3.8/site-packages/rclpy/executors.py", line 719, in spin_once
    raise handler.exception()
  File "/opt/ros/foxy/lib/python3.8/site-packages/rclpy/task.py", line 239, in __call__
    self._handler.send(None)
  File "/tmp/test_rclpy.py", line 5, in throwing_task
    raise FileNotFoundError()
FileNotFoundError

The exception is first raised in the throwing task (as expected!), then re-raised inside the executor. I'm wondering why we do this? I have the feeling that exceptions raised by asynchronous tasks should only be raised in an await call :thinking:

haudren-woven avatar Mar 30 '23 00:03 haudren-woven

There is some possibility that this would be improved by https://github.com/ros2/rclpy/pull/971 (though it needs a rebase). It might be worthwhile to test something like that.

clalancette avatar Apr 06 '23 17:04 clalancette

I encountered the same problem and created a workaround:

def schedule_task(coro: Coroutine[None, None, Any], executor: Executor) -> Future:
    """Create a task which can be canceled and does not crash the executor."""
    future = Future(executor=executor)
    # interrupt any pending spins
    future.add_done_callback(lambda _: executor.wake())
    executor.create_task(condom_task(future, coro))
    return future


@types.coroutine
def coroutine_yield() -> Generator[None, None, None]:
    """Yield control to the executor."""
    yield


async def condom_task(future: Future, coro: Coroutine[None, None, Any]) -> None:
    """Redirect exceptions to a future to prevent the executor from seeing them."""
    while not future.cancelled():
        try:
            coro.send(None)
        except StopIteration as result:
            future.set_result(result.value)
            return
        except Exception as error:
            future.set_exception(error)
            return
        await coroutine_yield()
    # run any cleanup code
    try:
        coro.close()
    except Exception as error:
        if isinstance(error.__context__, GeneratorExit):
            future.set_exception(error)
        else:
            raise

The workaround involves wrapping the Coroutine with a second one which acts as shield to redirect exceptions from the coroutine away from the executor to a future.

Deric-W avatar Nov 29 '23 09:11 Deric-W

Maybe we could copy the behavior of asyncio by adding an api to set an exception handler on the executor, where the default is to log and move on?

sloretz avatar Feb 16 '24 19:02 sloretz