Improve handling of `runAggregator` stopping
Currently, runAggregator can stop here:
https://github.com/drand/drand/blob/93f30865436a91ca0c4e75e82aeec2ff08197878/internal/chain/beacon/chainstore.go#L175-L184
but it is never restarted afterwards, so I think any partials that come in after that will fail silently.
It is currently launched only once: https://github.com/drand/drand/blob/93f30865436a91ca0c4e75e82aeec2ff08197878/internal/chain/beacon/chainstore.go#L115-L117
Hi, I'd like to work on improving the handling of runAggregator stopping. Could you please share any guidelines or suggestions for how you'd like the restart logic to work, or any potential edge cases I should be aware of before I begin?
some points to consider:
- restart the aggregator if it goes down
- aggregator shouldn't be running for beacons that aren't initialised (e.g. don't have a dist key yet)
- right now we have an aggregator goroutine per beacon, but it could be refactored to a single global checker that indexes by beaconID (though not necessary, just an idea)
- some errors should not trigger a restart, e.g. if the beacon process is closing gracefully
Thanks for picking this up, looking forward to reviewing!