Improve handling of `runAggregator` stopping

Open AnomalRoil opened this issue 1 year ago • 2 comments

Currently, runAggregator can stop here:

https://github.com/drand/drand/blob/93f30865436a91ca0c4e75e82aeec2ff08197878/internal/chain/beacon/chainstore.go#L175-L184

but then it's never restarted, and subsequent incoming partials will fail silently, I think.

It is currently launched only once: https://github.com/drand/drand/blob/93f30865436a91ca0c4e75e82aeec2ff08197878/internal/chain/beacon/chainstore.go#L115-L117

AnomalRoil avatar May 17 '24 08:05 AnomalRoil

Hi, I'd like to work on improving the handling of runAggregator stopping. Could you please share any guidelines or suggestions for how you'd like the restart logic to work, or any potential edge cases I should be aware of before I begin?

yhoungdev avatar Aug 17 '25 19:08 yhoungdev

some points to consider:

  • restart the aggregator if it goes down
  • aggregator shouldn't be running for beacons that aren't initialised (e.g. don't have a dist key yet)
  • right now we have an aggregator goroutine per beacon, but it could be refactored to a single global checker that indexes by beaconID (though not necessary, just an idea)
  • some errors may not want to restart; e.g. if the beaconprocess is gracefully closing
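
To make the restart and "don't restart on graceful close" points concrete, here is a rough, self-contained sketch of a supervising loop. It is not drand code: `supervise`, `errGracefulStop`, and `demo` are hypothetical names made up for illustration, and the real `runAggregator` signature and shutdown signalling may look quite different.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errGracefulStop is a hypothetical sentinel: a worker returns it when it is
// shutting down on purpose and must not be restarted.
var errGracefulStop = errors.New("graceful stop")

// supervise runs worker until ctx is cancelled or the worker reports a
// graceful stop. Any other return triggers a restart after backoff.
// It returns how many times the worker was started.
func supervise(ctx context.Context, backoff time.Duration, worker func(context.Context) error) int {
	starts := 0
	for {
		starts++
		err := worker(ctx)
		if errors.Is(err, errGracefulStop) || ctx.Err() != nil {
			return starts
		}
		fmt.Printf("worker stopped (%v), restarting\n", err)
		// Wait before restarting, but give up immediately on cancellation.
		select {
		case <-ctx.Done():
			return starts
		case <-time.After(backoff):
		}
	}
}

// demo simulates a worker that fails twice and then stops gracefully.
func demo() int {
	calls := 0
	worker := func(ctx context.Context) error {
		calls++
		if calls < 3 {
			return errors.New("simulated failure")
		}
		return errGracefulStop
	}
	return supervise(context.Background(), time.Millisecond, worker)
}

func main() {
	fmt.Println("worker starts:", demo())
}
```

The sketch does not cover the "beacon not initialised" case. One option, again only an assumption, would be to gate the call to `supervise` on a readiness check so the loop is not started until the dist key exists.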

Thanks for picking this up, looking forward to reviewing!

CluEleSsUK avatar Aug 18 '25 07:08 CluEleSsUK