go-libp2p
Catching panics
(moving a discussion from a private conversation to somewhere more public)
Libp2p performs quite a bit of complex parsing, which has occasionally led to panics at runtime. When uncaught, these panics crash the entire node.
Proposal: Catch panics at "failure boundaries". E.g.:
- If we have some form of "connection" worker, catch panics in the worker and kill the entire connection if the worker panics. Same for streams.
- Catch panics in per-peer stream handlers, cleaning up all state related to the peer.
- Catch panics in low-level parsing logic. Parsing tends to be pretty self-contained but also pretty error prone.
- Stretch: Where possible, catch service-level panics, cancel all current requests, close all resources, and restart. But we do need to be a bit careful to not continue running in a corrupted state.
- https://github.com/libp2p/go-libp2p/pull/1376
- https://github.com/libp2p/go-libp2p-transport-upgrader/pull/107
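To make the "failure boundary" idea concrete, here is a minimal sketch of a per-connection boundary. The `conn` type and the `runGuarded` helper are hypothetical names made up for illustration; they are not go-libp2p APIs, and the real implementation may look quite different. The sketch recovers a panic from the worker, closes only the affected connection, and hands the panic back to the caller as an error.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// conn is a hypothetical stand-in for a connection; it is not a go-libp2p type.
type conn struct {
	id     string
	closed bool
}

// Close marks the connection as closed. A real connection would also
// release sockets, streams, and any per-connection state.
func (c *conn) Close() { c.closed = true }

// runGuarded (hypothetical helper) runs worker inside a failure boundary.
// If worker panics, the panic is recovered, the connection is closed, and
// the panic value plus stack trace are returned as an error.
func runGuarded(c *conn, worker func(*conn)) (err error) {
	defer func() {
		if r := recover(); r != nil {
			c.Close() // kill the whole connection, not the whole node
			err = fmt.Errorf("panic in worker for conn %s: %v\n%s", c.id, r, debug.Stack())
		}
	}()
	worker(c)
	return nil
}

func main() {
	c := &conn{id: "c1"}
	err := runGuarded(c, func(*conn) {
		data := []byte{0x01}
		n := 4
		_ = data[n] // simulated parsing bug: index out of range
	})
	fmt.Println(err != nil, c.closed) // prints: true true
}
```

The boundary is deliberately placed around a unit whose state can be thrown away as a whole (here, one connection). Recovering at a wider scope would need the extra care mentioned in the stretch goal, since shared state may already be corrupted when the panic fires.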