Build profiles are too aggressive
Yes, this is a bit harsh. On the other hand we should have a way to catch panics. Right not as far as I understand, we catch panics only in cases we are testing for a `JoinError`:
https://docs.rs/tokio/1.23.0/tokio/task/struct.JoinError.html
Originally posted by @elmattic in https://github.com/ChainSafe/forest/pull/2336#discussion_r1060617959
If we remove panic = "abort" now, how do we know if a tokio task did something really wrong (like out of bound access)?
Preferably we should check the JoinError after awaiting each JoinHandle and abort cleanly in case of panics, but the refactor to get there is not trivial.
This is my guess as to what's happened, please correct me if I'm wrong
- We were hitting errors where spawned tokio tasks were panicking, presumably throwing up backtraces that no-one would see
- We decided that aborting the process (print a backtrace, but no unwinding) was a "loud failure", so at least we'd get a bug report
And it would be more idiomatic to bubble up panics in spawned tasks, which is the refactor you're suggesting?
@aatifsyed Kind of, one issue we had in the past was with some tasks that were deadlocking and also other that were not joined properly. Those sticking tasks would hinder proper release of ref count to some value (like db). An then when shutting down forest, rocksdb was segfaulting (internal C++ threads trying to access destroyed structures, something like this).
See https://github.com/ChainSafe/forest/issues/1587.
We were hitting errors where spawned tokio tasks were panicking, presumably throwing up backtraces that no-one would see
No that did not happen, but I choose defensive code and turn this "loud failure" mode, just in case.
And it would be more idiomatic to bubble up panics in spawned tasks, which is the refactor you're suggesting?
Yes, but that's probably a quite involving task.
@aatifsyed what do we do with this? Is it actionable?
We no longer use panic = "abort" since the FVM requires unwinding panics for correctness.