forest icon indicating copy to clipboard operation
forest copied to clipboard

Build profiles are too aggressive

Open aatifsyed opened this issue 2 years ago • 7 comments

          Yes, this is a bit harsh. On the other hand we should have a way to catch panics. Right not as far as I understand, we catch panics only in cases we are testing for a `JoinError`:

https://docs.rs/tokio/1.23.0/tokio/task/struct.JoinError.html

Originally posted by @elmattic in https://github.com/ChainSafe/forest/pull/2336#discussion_r1060617959

aatifsyed avatar May 16 '23 01:05 aatifsyed

If we remove panic = "abort" now, how do we know if a tokio task did something really wrong (like out of bound access)?

elmattic avatar May 16 '23 07:05 elmattic

Preferably we should check the JoinError after awaiting each JoinHandle and abort cleanly in case of panics, but the refactor to get there is not trivial.

elmattic avatar May 16 '23 10:05 elmattic

This is my guess as to what's happened, please correct me if I'm wrong

  • We were hitting errors where spawned tokio tasks were panicking, presumably throwing up backtraces that no-one would see
  • We decided that aborting the process (print a backtrace, but no unwinding) was a "loud failure", so at least we'd get a bug report

And it would be more idiomatic to bubble up panics in spawned tasks, which is the refactor you're suggesting?

aatifsyed avatar May 16 '23 10:05 aatifsyed

@aatifsyed Kind of, one issue we had in the past was with some tasks that were deadlocking and also other that were not joined properly. Those sticking tasks would hinder proper release of ref count to some value (like db). An then when shutting down forest, rocksdb was segfaulting (internal C++ threads trying to access destroyed structures, something like this).

See https://github.com/ChainSafe/forest/issues/1587.

elmattic avatar May 16 '23 11:05 elmattic

We were hitting errors where spawned tokio tasks were panicking, presumably throwing up backtraces that no-one would see

No that did not happen, but I choose defensive code and turn this "loud failure" mode, just in case.

elmattic avatar May 16 '23 11:05 elmattic

And it would be more idiomatic to bubble up panics in spawned tasks, which is the refactor you're suggesting?

Yes, but that's probably a quite involving task.

elmattic avatar May 16 '23 11:05 elmattic

@aatifsyed what do we do with this? Is it actionable?

LesnyRumcajs avatar Apr 17 '24 09:04 LesnyRumcajs

We no longer use panic = "abort" since the FVM requires unwinding panics for correctness.

lemmih avatar Aug 02 '24 11:08 lemmih