flow-go
Disk usage spikes on Consensus, Collection, and Verification nodes
🐞 Bug Report
Large spikes in disk usage have been observed on Consensus, Collection, and Verification nodes. These may be caused by Badger compaction.
Testnet has consistently experienced these spikes, and they have led to crashes and database issues that have to be resolved by truncating the database. During a spike, the node temporarily runs out of storage, which causes the failure. Given their size, the spikes appear disproportionate to the amount of data that would be compacted. Spikes like these seem to have always existed, but their size has increased drastically.
We need to either dig deeper to determine whether the spikes in usage can be reduced, or ensure that node operators (including FlowFoundation) aggressively increase their disk sizes.
What is the severity of this bug?
Critical: We can't do anything if this isn't actioned immediately (product doesn't function without this, it's blocking us or users, or it resolves a high severity security issue). One person should look at this right now.
Screenshots
Grafana dashboard for FlowFoundation nodes