Optimize ChannelMonitor persistence on block connections.
Currently, every block connection triggers persistence of all ChannelMonitors with an updated `best_block`. This poses challenges for large node operators managing thousands of channels. It also leads to a thundering herd problem (https://en.wikipedia.org/wiki/Thundering_herd_problem), overwhelming the storage backend with simultaneous write requests.
To address this issue, we now persist ChannelMonitors at a regular cadence, spreading their persistence across blocks to mitigate spikes in write operations.
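As a rough illustration of the partitioning idea (not the actual implementation in this PR): each monitor can be assigned a stable offset derived from its funding outpoint, so that only about `1/partition_factor` of the monitors are persisted on any given block. The names `PARTITION_FACTOR`, `FundingOutpoint`, and `should_persist_monitor` below are hypothetical stand-ins.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical partition factor: each monitor is persisted for routine
/// best-block updates roughly once every `PARTITION_FACTOR` blocks instead
/// of on every block connection.
const PARTITION_FACTOR: u64 = 32;

/// Stand-in for a monitor's funding outpoint (txid bytes + output index).
#[derive(Hash)]
struct FundingOutpoint {
    txid: [u8; 32],
    index: u16,
}

/// Returns true if this monitor's turn to be persisted falls on this block.
/// Hashing the funding outpoint gives each monitor a stable offset, so the
/// per-block write load is spread evenly instead of spiking all at once.
fn should_persist_monitor(outpoint: &FundingOutpoint, block_height: u64) -> bool {
    let mut hasher = DefaultHasher::new();
    outpoint.hash(&mut hasher);
    let offset = hasher.finish() % PARTITION_FACTOR;
    block_height % PARTITION_FACTOR == offset
}

fn main() {
    let outpoint = FundingOutpoint { txid: [7u8; 32], index: 0 };
    // Exactly one height in any PARTITION_FACTOR-block window triggers a
    // routine write for a given monitor.
    let writes = (0..PARTITION_FACTOR)
        .filter(|h| should_persist_monitor(&outpoint, *h))
        .count();
    assert_eq!(writes, 1);
}
```

Note that urgent persistence (e.g. after a `ChannelMonitorUpdate`) would still happen immediately; only the routine best-block refresh is deferred to the partition cadence.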
Tasks:
- [ ] Wait for "Don't pause events for chainsync persistence" (#2957) to land, and base this PR on it.
- [ ] Concept/Approach Ack
- [ ] Decide on a good default for `partition_factor`.
- [ ] Consider making `partition_factor` user-configurable. (This can also be done later; it depends on the default we choose.)
- [ ] Write more tests for persistence with `partition_factor`.
Closes #2647
Is this something we might want to consider not doing on mobile? If the fee estimator is broken and we're not persisting the most recent feerate we tried within the `OnchainTxHandler`, we won't be able to RBF onchain claims properly.
I guess we should/could consider always persisting if there are pending claims (e.g. the channel has been closed but still has balances to claim)? Alternatively, we could always persist if we have fewer than 5 channels.
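To make the suggested overrides concrete, here is a minimal sketch combining them: always persist when there are pending claims or when the node is small, and otherwise defer to the partitioned cadence. The threshold of 5 channels comes from the comment above; `should_persist_on_block` and its parameters are hypothetical names, not APIs from this PR.

```rust
/// Hypothetical threshold below which a node always persists on every block
/// (matches the "< 5 channels" idea suggested in the discussion).
const SMALL_NODE_CHANNEL_THRESHOLD: usize = 5;

/// Combined persistence check, assuming the caller already knows whether
/// this monitor has pending onchain claims and whether the partitioned
/// cadence selected it for this block.
fn should_persist_on_block(
    has_pending_claims: bool,
    total_channels: usize,
    is_partition_turn: bool,
) -> bool {
    has_pending_claims
        || total_channels < SMALL_NODE_CHANNEL_THRESHOLD
        || is_partition_turn
}

fn main() {
    // A monitor with balances still to claim is always persisted.
    assert!(should_persist_on_block(true, 1000, false));
    // A large node with nothing pending only writes on its partition turn.
    assert!(!should_persist_on_block(false, 1000, false));
    assert!(should_persist_on_block(false, 1000, true));
    // A small (e.g. mobile) node persists every block regardless.
    assert!(should_persist_on_block(false, 3, false));
}
```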
What's the status here @G8XSU?
Yes makes sense, we can always persist if there are pending claims.
I am looking for a concept/approach ack here before I proceed with the rest of the changes. Is this the right direction for how to distribute persistence across blocks?
I think wpaulino raised a good point and we should do something to ensure we regularly persist monitors on mobile (like what I suggested above), but otherwise concept ACK.
Marking this PR ready for review.
Squashed the fixup commit.