tower-abci
tower-abci copied to clipboard
tower-abci: handle errors more gracefully
For each individual connection, we spawn a tokio task that is responsible for driving the state and handling I/O. In this context, a variety of failures can occur, ranging from codec errors to connection failures etc. Right now, if such a failure occurs, we simply crash the task without propagating the error in any way, or contextualizing the failure in a log (beside a rust backtrace).
Related to https://github.com/penumbra-zone/penumbra/issues/689
@erwanor I'm interested in tackling this issue. First I want to clarify that there is no way to propagate non-application errors back to Comet, in fact it is expected that for any such error the ABCI app exits and both processes are meant to restart to initiate the Crash Recovery. Unless there has been a change in Comet within the last couple of months that is the expected behaviour.
@xla great point! i have amended the issue. do you have something specific in mind to address it? if you have the appetite for it, we could track the connection handles and propagate the error to the application when a worker fails. otherwise, logging would already be a good first step.