drand Improve handling of `runAggregator` stopping

Currently, runAggregator can stop here:

https://github.com/drand/drand/blob/93f30865436a91ca0c4e75e82aeec2ff08197878/internal/chain/beacon/chainstore.go#L175-L184

but then it's never restarted, and subsequent incoming partials will fail silently, I think.

It is currently launched only once: https://github.com/drand/drand/blob/93f30865436a91ca0c4e75e82aeec2ff08197878/internal/chain/beacon/chainstore.go#L115-L117

May 17 '24 08:05 AnomalRoil

Hi, I'd like to work on improving the handling of runAggregator stopping. Could you please share any guidelines or suggestions for how you'd like the restart logic to work, or any potential edge cases I should be aware of before I begin?

Aug 17 '25 19:08 yhoungdev

some points to consider:

- restart the aggregator if it goes down
- the aggregator shouldn't be running for beacons that aren't initialised (e.g. don't have a dist key yet)
- right now we have an aggregator goroutine per beacon, but it could be refactored to a single global checker that indexes by beaconID (though not necessary, just an idea)
- some errors may not warrant a restart, e.g. if the beacon process is gracefully closing

Thanks for picking this up, looking forward to reviewing!

Aug 18 '25 07:08 CluEleSsUK