Deployment errors do not immediately stop execution

Open zagy opened this issue 1 year ago • 0 comments

When a component raises an error during verify/update, we occasionally see further components being deployed despite the error. This happens even with jobs=1.

I think what happens is the following:

  • All components which have no unsatisfied dependencies are scheduled.
  • Execution stops only when all scheduled components are done.
  • In case of error, no new components are scheduled.

There is the following code:

https://github.com/flyingcircusio/batou/blob/f87a00acc3ce256014d49bef2eac581e91009275/src/batou/deploy.py#L315-L318

From the asyncio.gather docs:

If return_exceptions is False (default), the first raised exception is immediately propagated to the task that awaits on gather(). Other awaitables in the aws sequence won’t be cancelled and will continue to run.
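
To illustrate the quoted behaviour, here is a minimal sketch, not batou's actual code. The `deploy` coroutine and the component names are hypothetical stand-ins for a component deployment. It shows that after `gather()` has propagated the first exception, the other awaitable keeps running and still finishes:

```python
import asyncio


async def deploy(name, delay, log, fail=False):
    # Hypothetical stand-in for deploying one component.
    await asyncio.sleep(delay)
    if fail:
        raise RuntimeError(f"{name} failed")
    log.append(name)


async def run_with_gather():
    log = []
    tasks = [
        asyncio.ensure_future(deploy("a", 0.01, log, fail=True)),
        asyncio.ensure_future(deploy("b", 0.05, log)),
    ]
    try:
        await asyncio.gather(*tasks)
    except RuntimeError:
        pass  # the error from "a" arrives here first
    # gather() has already raised, but "b" was not cancelled.
    still_running = not tasks[1].done()
    await asyncio.wait(tasks)  # let "b" run to completion
    return still_running, log


if __name__ == "__main__":
    print(asyncio.run(run_with_gather()))
```

In this sketch, "b" is deployed even though "a" has already failed, which matches the symptom described above.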

I suppose this could be updated to the more modern TaskGroup:

The first time any of the tasks belonging to the group fails with an exception other than asyncio.CancelledError, the remaining tasks in the group are cancelled.

zagy · Aug 31 '23 07:08