
Issue with Bootstrappers Lagging Behind During Badger Compaction

Open smuu opened this issue 1 year ago • 5 comments

Celestia Node version

v0.13.2

OS

docker/kubernetes

Install tools

No response

Others

No response

Steps to reproduce it

  1. Set up monitoring and alerting systems to detect when bootstrappers lag.
  2. Observed warnings/alerts indicating that some bootstrappers were lagging behind by more than two blocks for periods of up to ~5 minutes.
  3. Checked for lag incidents and found one specific instance occurring in the morning.
  4. Reviewed logs around the time of the incident and identified badger compaction processes running concurrently with the lag.
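
For anyone who wants to reproduce the detection in steps 1 and 2 without our alerting stack, here is a minimal sketch of the check. It is an assumption about how such a check could look, not the rule we actually run: `Sample`, `lag_intervals`, and `max_lag` are hypothetical names, and the heights are expected to come from whatever metrics source you already have.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Sample:
    """One observation of a bootstrapper (hypothetical structure)."""
    ts: float          # Unix timestamp of the observation
    network_head: int  # highest block height seen on the network
    node_height: int   # block height reported by the bootstrapper


def lag_intervals(samples: Iterable[Sample], max_lag: int = 2) -> List[Tuple[float, float]]:
    """Return (start, end) timestamps during which the lag exceeded max_lag blocks."""
    intervals: List[Tuple[float, float]] = []
    start = None
    last_ts = None
    for s in sorted(samples, key=lambda s: s.ts):
        lagging = (s.network_head - s.node_height) > max_lag
        if lagging and start is None:
            # First sample above the threshold opens an interval.
            start = s.ts
        elif not lagging and start is not None:
            # First sample back within the threshold closes it.
            intervals.append((start, s.ts))
            start = None
        last_ts = s.ts
    if start is not None:
        # Still lagging at the end of the data: close at the last sample.
        intervals.append((start, last_ts))
    return intervals
```

The threshold defaults to two blocks to match the alerts described above. A lag of exactly two blocks is not reported.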

Expected result

Bootstrappers should remain within a close range of the current block height, not lagging behind by more than one or two blocks, even during periods of high load or maintenance activities such as badger compaction.

Actual result

Multiple bootstrappers experienced significant lag, falling behind by more than two blocks for several minutes. This issue was observed numerous times per hour for different bootstrappers, but it seems that only one bootstrapper is affected at a time. The lag coincided with periods when the badger database was undergoing compaction processes.
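
To check the correlation more systematically than by eyeballing logs, a simple interval-overlap test could be used. This is only a sketch under assumptions: the function name `overlapping` is hypothetical, and both the lag windows and the compaction windows are assumed to have already been extracted as `(start, end)` timestamp pairs. It does not parse the badger log output.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end) as Unix timestamps


def overlapping(lags: List[Interval], compactions: List[Interval]) -> List[Tuple[Interval, Interval]]:
    """Return every (lag, compaction) pair whose time windows intersect."""
    pairs: List[Tuple[Interval, Interval]] = []
    for lag in lags:
        for comp in compactions:
            # Two closed intervals intersect when each starts before the other ends.
            if lag[0] <= comp[1] and comp[0] <= lag[1]:
                pairs.append((lag, comp))
    return pairs
```

If most lag windows show up in the result, that would support the compaction hypothesis. An overlap alone does not prove causation.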

Please take a look at the attached screenshot and logs.

Relevant log output

https://pastebin.com/kgCELKfC

Notes

Screenshot from 2024-04-08 11-43-06

smuu, Apr 08 '24 10:04