Replacing Badger with Pebble DB
Why
- Badger DB is a project that is not maintained anymore. At some point it will become incompatible with a new version of Go and will block upgrading flow-go to newer Go versions.
- Badger has caused memory spikes and performance issues in the past (so far mitigated with a more aggressive GC GOMEMLIMIT setting), and as the load on the DB grows these issues are likely to recur.
- Badger DB does not support data pruning, so on a long-running network removing unused data requires node downtime.
We have identified the following project milestones, which together complete the work of removing the flow-go dependency on Badger DB.
✅ Milestone 1 - Refactor data access & prune chunk data packs
- [x] https://github.com/onflow/flow-go/issues/6516
Milestone 2 - DB access refactoring for low-risk data on EN, VN and AN
- [x] https://github.com/onflow/flow-go/issues/6527
Milestone 3 - unblock pruning of Execution, Access and Verification data
- [ ] https://github.com/onflow/flow-go/issues/7242
Milestone 4 - DB access refactoring - remove dependency on Badger DB completely from ENs and ANs
- [ ] https://github.com/onflow/flow-go/issues/7265
Milestone 5 - DB access refactoring - remove dependency on Badger DB completely from Collection and Consensus nodes
OKR placeholder: #6528
Consensus
- [x] Protocol Data (Consensus Builder)
- [x] Deployment - Switching from Badger to Pebble (Dynamic bootstrap (preferred), spork, or Migration)
- [x] Protocol Data (Consensus Follower)
Collection
- [ ] Collection Consensus
- [ ] Cluster State
- [ ] [Guarantee creation](https://github.com/onflow/flow-go/blob/master/storage/badger/procedure/cluster.go#L162)
- [ ] Collections Provider
- [ ] Deployment - Switching from Badger to Pebble (Dynamic bootstrap (preferred), spork, or Migration)
- Tools TBD
Milestone 6 - pruning of Execution, Access and Verification data
Task breakdown TBD
- [x] https://github.com/onflow/flow-go/issues/7126
Badger DB is a project that is not maintained anymore.
@j1010001 did you mean that BadgerDB v2 and v3 are not maintained anymore (rather than the entire BadgerDB project)?
BadgerDB released v4.0 in Feb 2023 and v4.3 in Aug 2024.
- v4.3.0 (Aug 28, 2024)
- v3.2103.5 (Dec 15, 2022) is the last v3.
- v2.2007.4 (Aug 25, 2021) is the last v2 and version currently used by flow-go (go.mod).
More details and other releases at BadgerDB releases.
Issues for this epic:
- https://github.com/onflow/flow-go/issues/6516
- https://github.com/onflow/flow-go/issues/6518
- https://github.com/onflow/flow-go/issues/6519
- https://github.com/onflow/flow-go/issues/6520
- https://github.com/onflow/flow-go/issues/6521
- https://github.com/onflow/flow-go/issues/6522
- https://github.com/onflow/flow-go/issues/6523
Issues for this epic:
- Chunk Data pack Pruner #6516
- Replace Badger Transaction with Batch updates - Access Node-related data #6518
- Replace Badger Transaction with Batch updates - Verification Node-related data #6519
- Replace Badger Transaction with Batch updates - Execution Node-related data #6520
- Replace Badger Transaction with Batch updates - Collection Node-related data #6521
- Replace Badger Transaction with Batch updates - Protocol-related data #6522
- Database Operation Transition: From Badger to Pebble #6523
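
The "Replace Badger Transaction with Batch updates" issues above describe moving storage writes from read-write transactions to write batches. A minimal sketch of that pattern, using only the standard library, might look like the following. The `Writer` interface, the in-memory `memStore`, and `insertApproval` are hypothetical names made up for this illustration; they are not the flow-go storage API.

```go
package main

import (
	"fmt"
	"sync"
)

// Writer is a hypothetical backend-agnostic write interface. Storage
// operations written against it do not depend on Badger or Pebble types.
type Writer interface {
	Set(key, value []byte)
	Delete(key []byte)
}

// memStore is an in-memory stand-in for a key-value database.
type memStore struct {
	mu   sync.RWMutex
	data map[string][]byte
}

func newMemStore() *memStore {
	return &memStore{data: make(map[string][]byte)}
}

// Get returns a copy of the committed value for key, if any.
func (s *memStore) Get(key []byte) ([]byte, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	v, ok := s.data[string(key)]
	if !ok {
		return nil, false
	}
	return append([]byte(nil), v...), true
}

// pendingOp is one buffered write; del marks a deletion.
type pendingOp struct {
	key   string
	value []byte
	del   bool
}

// batch buffers writes and applies them together on Commit.
type batch struct {
	store *memStore
	ops   []pendingOp
}

func (s *memStore) NewBatch() *batch { return &batch{store: s} }

func (b *batch) Set(key, value []byte) {
	b.ops = append(b.ops, pendingOp{key: string(key), value: append([]byte(nil), value...)})
}

func (b *batch) Delete(key []byte) {
	b.ops = append(b.ops, pendingOp{key: string(key), del: true})
}

// Commit applies all buffered writes in order under one lock, so readers
// see either none of them or all of them.
func (b *batch) Commit() {
	b.store.mu.Lock()
	defer b.store.mu.Unlock()
	for _, op := range b.ops {
		if op.del {
			delete(b.store.data, op.key)
		} else {
			b.store.data[op.key] = op.value
		}
	}
	b.ops = nil
}

// insertApproval is a hypothetical storage operation that only needs Writer.
func insertApproval(w Writer, id string, payload []byte) {
	w.Set([]byte("approval/"+id), payload)
}

func main() {
	store := newMemStore()
	b := store.NewBatch()
	insertApproval(b, "a1", []byte("payload"))

	_, visible := store.Get([]byte("approval/a1"))
	fmt.Println("visible before commit:", visible)

	b.Commit()
	v, visible := store.Get([]byte("approval/a1"))
	fmt.Println("visible after commit:", visible, string(v))
}
```

Unlike a read-write transaction, this batch cannot read its own pending writes, so any check that depends on existing data has to happen before the batch is built or be protected by a separate lock.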
All the issues above have been linked to the relevant phases
Did you get any preliminary results from running nodes with Pebble instead of Badger? I am using Pebble and am pretty happy so far. I was thinking of trying Badger for faster writes, though I am a bit worried about memory usage etc.
Unfortunately, we didn’t gather metrics for the proof-of-concept benchmark, as our focus was on ensuring execution correctness. Once this issue is completed, I will collect metrics for comparison.
Trying to list all the data, and their locations, that need to be refactored:
- Execution
  - Execution Result https://github.com/onflow/flow-go/pull/6906
  - Collections https://github.com/onflow/flow-go/pull/7059
  - StopControl (VersionBeacon) https://github.com/onflow/flow-go/pull/7085
  - Execution Data (Bitswap)
  - Result GCP Uploader https://github.com/onflow/flow-go/pull/7084
  - Switching from Badger to Pebble
    - Migrate the last executed result to Pebble in order to make the next block executable https://github.com/onflow/flow-go/pull/7117
  - Follower
- Verification
  - Approvals https://github.com/onflow/flow-go/pull/6868
  - ChunkQueue https://github.com/onflow/flow-go/pull/6947
  - Switching from Badger to Pebble https://github.com/onflow/flow-go/pull/6948
  - Follower
- Access
  - Execution Data (Bitswap)
  - Execution Result https://github.com/onflow/flow-go/pull/6906
  - Collections https://github.com/onflow/flow-go/pull/7093
  - StopControl (VersionBeacon) https://github.com/onflow/flow-go/pull/7085
  - Tx/Block Status RPC API
  - Switching from Badger to Pebble
  - Follower
- Follower
  - Protocol Data (Consensus Follower)
  - Libp2p Blocklist
- Consensus
- Collection
  - Collection Consensus
  - Collections Provider
  - Switching from Badger to Pebble
  - Follower
- Tools
  - Utility
    - Add flags to read from Pebble https://github.com/onflow/flow-go/pull/7092
    - Test if Pebble can read data without stopping the process by creating a checkpoint of the DB on the fly. https://github.com/onflow/flow-go/pull/7092
  - AdminTool
    - If Utility can read Pebble without stopping the process, we can get rid of the admin tool that reads data from Pebble. https://github.com/onflow/flow-go/pull/7092
  - Utility
Hi @j1010001, @zhangchiqing - is my understanding of the outcomes of the milestones above correct?
- Milestone 1 - chunk data pack pruning on EN ✅
- Milestone 2 - low-risk data on EN, AN, VN moved to Pebble DB (in progress)
- Milestone 3 -
  - Follower engine migrated to Pebble DB.
  - Verification node can run exclusively on Pebble DB.
- Milestone 4 - EN and AN can run exclusively on Pebble DB
- Milestone 5 - SN and LN can run exclusively on Pebble DB
- Milestone 6 - Data can be automatically pruned on EN, AN, VN.
- Milestone 7 - Upgrade to Pebble 2.x
Scope update:
- Portions of M5 Consensus are moving to M3:
  - Consensus Builder, Finalizer, Persister
  - Deployment - Switching from Badger to Pebble (Dynamic bootstrap (preferred), spork, or Migration)
  - Protocol Data (Consensus Follower)
- Removed M7 - upgrade to Pebble 2.x (moved as a task to M4)
- One node of each type is now running on Pebble DB on mainnet (the Consensus node still has some DKG data stored in Badger DB, see issue).
- Alex is back and is working through Leo’s PRs. There are 4 of these on the critical path.
- The next big milestone for this OKR is merging Malleability into PebbleDB, resolving conflicts, and then testing the combined branch. The plan is to merge Malleability to master this week, then work through merging Malleability into PebbleDB over the next two weeks. Rough ETA is end of August.