forest Build profiles are too aggressive

          Yes, this is a bit harsh. On the other hand we should have a way to catch panics. Right not as far as I understand, we catch panics only in cases we are testing for a `JoinError`:

https://docs.rs/tokio/1.23.0/tokio/task/struct.JoinError.html

Originally posted by @elmattic in https://github.com/ChainSafe/forest/pull/2336#discussion_r1060617959

May 16 '23 01:05 aatifsyed

If we remove panic = "abort" now, how do we know if a tokio task did something really wrong (like out of bound access)?

May 16 '23 07:05 elmattic

Preferably we should check the JoinError after awaiting each JoinHandle and abort cleanly in case of panics, but the refactor to get there is not trivial.

May 16 '23 10:05 elmattic

This is my guess as to what's happened, please correct me if I'm wrong

We were hitting errors where spawned tokio tasks were panicking, presumably throwing up backtraces that no-one would see
We decided that aborting the process (print a backtrace, but no unwinding) was a "loud failure", so at least we'd get a bug report

And it would be more idiomatic to bubble up panics in spawned tasks, which is the refactor you're suggesting?

May 16 '23 10:05 aatifsyed

@aatifsyed Kind of, one issue we had in the past was with some tasks that were deadlocking and also other that were not joined properly. Those sticking tasks would hinder proper release of ref count to some value (like db). An then when shutting down forest, rocksdb was segfaulting (internal C++ threads trying to access destroyed structures, something like this).

See https://github.com/ChainSafe/forest/issues/1587.

May 16 '23 11:05 elmattic

We were hitting errors where spawned tokio tasks were panicking, presumably throwing up backtraces that no-one would see

No that did not happen, but I choose defensive code and turn this "loud failure" mode, just in case.

May 16 '23 11:05 elmattic

And it would be more idiomatic to bubble up panics in spawned tasks, which is the refactor you're suggesting?

Yes, but that's probably a quite involving task.

May 16 '23 11:05 elmattic

@aatifsyed what do we do with this? Is it actionable?

Apr 17 '24 09:04 LesnyRumcajs

We no longer use panic = "abort" since the FVM requires unwinding panics for correctness.

Aug 02 '24 11:08 lemmih