core-rs-albatross
Macro block production can stop when validators fail to send or receive messages.
Handel needs network activity to generate further network activity. An initial LevelUpdate is sent, which is supposed to trigger the remaining peers to create new aggregates and thus new messages.
In rare circumstances, however, these messages can fail to be sent or received. If that happens for all peers, the aggregation can stop, because no further network messages are generated when none are received.
This effect is easiest to observe with just 2 validators, but in theory it also applies to larger validator sets, with lower probability. It is also not the desired design in general and thus needs to be changed.
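The stall can be illustrated with a small toy model. This is only a sketch under simplifying assumptions, not the actual Handel implementation in core-rs-albatross: the names `LevelUpdate` (as a struct), `run_reactive` and `is_complete` are hypothetical, signatures are modeled as bits in a `u64` (so at most 63 peers), and Handel's level structure is ignored. Peers only ever send a message in reaction to a received message that improved their aggregate.

```rust
use std::collections::VecDeque;

/// Hypothetical, simplified update: a bitset of the validators whose
/// contributions are contained in the sender's aggregate.
#[derive(Clone, Copy)]
struct LevelUpdate {
    to: usize,
    signers: u64,
}

/// Runs a purely reactive aggregation among `n` peers (1 <= n <= 63).
/// If `initial_delivered` is false, every initial update is lost.
/// Returns the final signer bitset of each peer.
fn run_reactive(n: usize, initial_delivered: bool) -> Vec<u64> {
    // Every peer starts with only its own contribution.
    let mut aggregates: Vec<u64> = (0..n).map(|i| 1u64 << i).collect();
    let mut in_flight = VecDeque::new();

    // The initial updates are the only messages sent without a trigger.
    for from in 0..n {
        for to in 0..n {
            if to != from && initial_delivered {
                in_flight.push_back(LevelUpdate { to, signers: aggregates[from] });
            }
        }
    }

    // From here on, a peer sends only after receiving something new.
    while let Some(msg) = in_flight.pop_front() {
        let merged = aggregates[msg.to] | msg.signers;
        if merged != aggregates[msg.to] {
            aggregates[msg.to] = merged;
            for to in 0..n {
                if to != msg.to {
                    in_flight.push_back(LevelUpdate { to, signers: merged });
                }
            }
        }
    }
    aggregates
}

/// True if every peer holds the contributions of all peers.
fn is_complete(aggregates: &[u64]) -> bool {
    let full = (1u64 << aggregates.len()) - 1;
    aggregates.iter().all(|&a| a == full)
}
```

In this model, `is_complete(&run_reactive(2, true))` holds, while `run_reactive(2, false)` leaves both peers with only their own contribution: once the initial updates are lost, the message queue is empty and nothing ever refills it.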
This behavior can be observed in several CI executions, where the 4 validators scenario fails because blocks stopped being produced after some timeout. Some potential examples of this issue:
- https://github.com/nimiq/core-rs-albatross/runs/4212024885?check_suite_focus=true
- https://github.com/nimiq/core-rs-albatross/runs/4210729994?check_suite_focus=true
- https://github.com/nimiq/core-rs-albatross/runs/4204373089?check_suite_focus=true