tower-abci: handle errors more gracefully

Open erwanor opened this issue 2 years ago • 3 comments

For each individual connection, we spawn a tokio task that is responsible for driving the state and handling I/O. A variety of failures can occur in this context, ranging from codec errors to connection failures. Right now, if such a failure occurs, the task simply crashes: the error is not propagated in any way, and the failure is not contextualized in a log (besides a Rust backtrace).

erwanor avatar Jun 14 '23 18:06 erwanor

Related to https://github.com/penumbra-zone/penumbra/issues/689

erwanor avatar Aug 01 '23 14:08 erwanor

@erwanor I'm interested in tackling this issue. First I want to clarify that there is no way to propagate non-application errors back to Comet. In fact, for any such error the ABCI app is expected to exit, and both processes are meant to restart to initiate crash recovery. Unless there has been a change in Comet within the last couple of months, that is the expected behaviour.

xla avatar Oct 27 '23 04:10 xla

@xla great point! i have amended the issue. do you have something specific in mind to address it? if you have the appetite for it, we could track the connection handles and propagate the error to the application when a worker fails. otherwise, logging would already be a good first step.

erwanor avatar Oct 27 '23 15:10 erwanor
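
A minimal sketch of the handle-tracking idea discussed above: keep a handle per connection worker, log each failure with the connection it belongs to, and hand the collected errors back to the application. This is not tower-abci's actual API. `WorkerError`, `run_connection`, and `supervise` are hypothetical names, and `std::thread` stands in for tokio tasks so the example builds without dependencies (with tokio, `tokio::task::JoinSet` could play the same role).

```rust
use std::fmt;
use std::thread;

/// Hypothetical error type for a connection worker (not tower-abci's own type).
#[derive(Debug, PartialEq)]
enum WorkerError {
    Codec(String),
    ConnectionClosed,
    Panicked,
}

impl fmt::Display for WorkerError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            WorkerError::Codec(msg) => write!(f, "codec error: {}", msg),
            WorkerError::ConnectionClosed => write!(f, "connection closed"),
            WorkerError::Panicked => write!(f, "worker panicked"),
        }
    }
}

/// Stand-in for the per-connection loop. The failures are hard-coded
/// purely for illustration: connection 0 succeeds, the others fail.
fn run_connection(id: usize) -> Result<(), WorkerError> {
    match id {
        0 => Ok(()),
        1 => Err(WorkerError::Codec("invalid length prefix".to_string())),
        _ => Err(WorkerError::ConnectionClosed),
    }
}

/// Spawns one worker per connection, keeps every handle, and returns the
/// failures (tagged with the connection id) instead of dropping them.
fn supervise(connections: usize) -> Vec<(usize, WorkerError)> {
    let handles: Vec<_> = (0..connections)
        .map(|id| (id, thread::spawn(move || run_connection(id))))
        .collect();

    let mut failures = Vec::new();
    for (id, handle) in handles {
        match handle.join() {
            Ok(Ok(())) => {}
            Ok(Err(err)) => {
                // Contextualize the failure before passing it on.
                eprintln!("connection {} failed: {}", id, err);
                failures.push((id, err));
            }
            Err(_) => {
                eprintln!("connection {} panicked", id);
                failures.push((id, WorkerError::Panicked));
            }
        }
    }
    failures
}
```

Calling `supervise(3)` returns the failures for connections 1 and 2. Since non-application errors cannot be reported back to Comet, the application would presumably log these and exit rather than try to recover in place.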