Guillaume Potier

121 comments by Guillaume Potier

> I can do it today. Should I use the latest `main`?

Yes, just run `forest-cli send` a couple of times in a loop and see if you can reproduce....

This is a great finding. Unfortunately, it's not the same error as the one in the CI, where the message was successfully added to the pool and its CID was...

@LesnyRumcajs I had another try reproducing this issue. The issue is still there but not easy to reproduce. Using only one node, I managed to reproduce the issue two times...

Not 100% sure but they do have some code that can react to HEAD changes and republish previously mined messages. See https://github.com/filecoin-project/lotus/blob/3fd57ff7d90a3f4c51975b89dfc98135116d52de/chain/messagepool/messagepool.go#L1262

@LesnyRumcajs I will check our CI. But, AFAIK, the pesky bug is still present.

> @elmattic do you recall what's the status of this? Is sending FIL still failing every now and then?

https://github.com/ChainSafe/forest/actions/runs/9204884823/job/25319573657 We can now update the status to "still broken".

Result of my initial investigation: consider the following log of a Forest node on calibnet.
```
2023-05-04T11:32:30.067295Z INFO forest_chain::store::chain_store: New heaviest tipset! [bafy2bzacedtn7fcy2fwy2oibegccu5bq6try5gzbtjjgwe6wrg4ir7mrk5mxi] (EPOCH = 529119)
2023-05-04T11:32:30.067707Z INFO forest_chain::store::chain_store: New heaviest...
```

Lotus doesn't have this kind of "single tipset" failure. I think we should understand and fix this issue first.

If we remove `panic = "abort"` now, how do we know if a tokio task did something really wrong (like an out-of-bounds access)?

Preferably we should check the `JoinError` after awaiting each `JoinHandle` and abort cleanly in case of panics, but the refactor to get there is not trivial.
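The idea of inspecting the join result and reacting to a panic can be sketched with the standard library alone. This is only an analogue, not Forest code: it uses `std::thread`, whose `join()` returns `Err` when the thread panicked, in roughly the way that awaiting a tokio `JoinHandle` yields a `JoinError` for which `is_panic()` is true. The names `run_checked` and `TaskOutcome` are hypothetical, and the sketch assumes the default unwinding panic strategy (with `panic = "abort"` the process would terminate before `join()` returns).

```rust
use std::thread;

/// Outcome of a worker, with any panic converted into a readable message.
/// (Hypothetical type, introduced only for this sketch.)
#[derive(Debug, PartialEq)]
enum TaskOutcome {
    Finished(u64),
    Panicked(String),
}

/// Hypothetical helper: run `work` on its own thread and report whether it panicked.
fn run_checked<F>(work: F) -> TaskOutcome
where
    F: FnOnce() -> u64 + Send + 'static,
{
    match thread::spawn(work).join() {
        Ok(value) => TaskOutcome::Finished(value),
        Err(payload) => {
            // Panic payloads are usually a `&'static str` or a `String`.
            let msg = if let Some(s) = payload.downcast_ref::<&str>() {
                (*s).to_string()
            } else if let Some(s) = payload.downcast_ref::<String>() {
                s.clone()
            } else {
                "unknown panic payload".to_string()
            };
            TaskOutcome::Panicked(msg)
        }
    }
}

fn main() {
    // A well-behaved task.
    let ok = run_checked(|| 40 + 2);
    println!("{ok:?}");

    // A task with an out-of-bounds access, the kind of bug discussed above.
    let bad = run_checked(|| {
        let data = vec![1u64, 2, 3];
        let idx = data.len(); // one past the end
        data[idx]
    });
    println!("{bad:?}");

    if let TaskOutcome::Panicked(msg) = &bad {
        // A real node would start a clean shutdown here instead of just logging.
        eprintln!("worker panicked, shutting down: {msg}");
    }
}
```

The point of the design is that the caller, not the panic handler, decides what a panic means: every join site has to look at the result, which is why the refactor touches every place a task is spawned.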