
Replacing Badger with Pebble DB

Open j1010001 opened this issue 1 year ago • 9 comments

Why

  1. Badger DB is a project that is not maintained anymore. At some point it will not be compatible with a new version of Go, which will block upgrading flow-go to newer Go versions.
  2. Badger has caused memory spikes and performance issues in the past (so far mitigated with a more aggressive GC GOMEMLIMIT setting), and as the load on the DB grows these issues are likely to recur.
  3. Badger DB does not support data pruning, so on a long-running network, removing unused data requires node downtime.
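
For context on the mitigation mentioned in point 2: the Go runtime's soft memory limit can be set through the `GOMEMLIMIT` environment variable or programmatically with `runtime/debug.SetMemoryLimit` (Go 1.19+). The sketch below shows the programmatic form. The helper names and the 8 GiB value are illustrative assumptions, not the settings flow-go actually uses.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// applyMemoryLimit sets the Go runtime's soft memory limit and returns the
// previous limit. It has the same effect as starting the process with
// GOMEMLIMIT set. (Hypothetical helper for illustration.)
func applyMemoryLimit(limitBytes int64) int64 {
	return debug.SetMemoryLimit(limitBytes)
}

// currentMemoryLimit reads the limit without changing it: a negative input
// to SetMemoryLimit leaves the limit untouched and returns the current value.
func currentMemoryLimit() int64 {
	return debug.SetMemoryLimit(-1)
}

func main() {
	const limit = 8 << 30 // 8 GiB, an illustrative value only
	prev := applyMemoryLimit(limit)
	fmt.Printf("previous limit: %d bytes, current limit: %d bytes\n",
		prev, currentMemoryLimit())
}
```

A lower limit makes the garbage collector run more aggressively as the heap approaches it, which trades CPU time for a smaller peak memory footprint. It is a soft limit, so it dampens spikes rather than preventing them.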

We have identified the following project milestones, which together complete the removal of flow-go's dependency on Badger DB.

✅ Milestone 1 - Refactor data access & prune chunk data packs

  • [x] https://github.com/onflow/flow-go/issues/6516

Milestone 2 - DB access refactoring for low-risk data on EN, VN and AN

  • [x] https://github.com/onflow/flow-go/issues/6527

Milestone 3 - unblock pruning of Execution, Access and Verification data

  • [ ] https://github.com/onflow/flow-go/issues/7242

Milestone 4 - DB access refactoring - remove dependency on Badger DB completely from ENs and ANs

  • [ ] https://github.com/onflow/flow-go/issues/7265

Milestone 5 - DB access refactoring - remove dependency on Badger DB completely from Collection and Consensus nodes

OKR placeholder: #6528

Consensus

Collection

  • [ ] Collection Consensus

  • [ ] Deployment - Switching from Badger to Pebble (Dynamic bootstrap (preferred), spork, or Migration)

  • Tools TBD

Milestone 6 - pruning of Execution, access and verification data

Task breakdown TBD

  • [x] https://github.com/onflow/flow-go/issues/7126

j1010001 avatar Oct 01 '24 23:10 j1010001

Badger DB is a project that is not maintained anymore.

@j1010001 did you mean that BadgerDB v2 and v3 are not maintained anymore (instead of entire BadgerDB project)?

BadgerDB released v4.0 in Feb 2023 and v4.3 in Aug 2024.

  • v4.3.0 (Aug 28, 2024)
  • v3.2103.5 (Dec 15, 2022) is the last v3.
  • v2.2007.4 (Aug 25, 2021) is the last v2 and version currently used by flow-go (go.mod).

More details and other releases at BadgerDB releases.

fxamacker avatar Oct 02 '24 01:10 fxamacker

Issues for this epic:

  • https://github.com/onflow/flow-go/issues/6516
  • https://github.com/onflow/flow-go/issues/6518
  • https://github.com/onflow/flow-go/issues/6519
  • https://github.com/onflow/flow-go/issues/6520
  • https://github.com/onflow/flow-go/issues/6521
  • https://github.com/onflow/flow-go/issues/6522
  • https://github.com/onflow/flow-go/issues/6523

zhangchiqing avatar Oct 02 '24 20:10 zhangchiqing

Did you get any preliminary results from running nodes with Pebble instead of Badger? I am using Pebble and am pretty happy so far, but I was thinking of trying Badger for faster writes, though I am a bit scared of its memory usage etc.

bluesign avatar Oct 09 '24 17:10 bluesign

Unfortunately, we didn’t gather metrics for the proof-of-concept benchmark, as our focus was on ensuring execution correctness. Once this issue is completed, I will collect metrics for comparison.

zhangchiqing avatar Oct 10 '24 22:10 zhangchiqing

Trying to list all the data that needs to be refactored, grouped by where it lives.

  • Execution

    • Execution Result https://github.com/onflow/flow-go/pull/6906
    • Collections https://github.com/onflow/flow-go/pull/7059
    • StopControl (VersionBeacon) https://github.com/onflow/flow-go/pull/7085
    • Execution Data (Bitswap)
    • Result GCP Uploader https://github.com/onflow/flow-go/pull/7084
    • Switching from Badger to Pebble
      • Migrate the last executed result to pebble in order to make next block executable https://github.com/onflow/flow-go/pull/7117
    • Follower
  • Verification

    • Approvals https://github.com/onflow/flow-go/pull/6868
    • ChunkQueue https://github.com/onflow/flow-go/pull/6947
    • Switching from Badger to Pebble https://github.com/onflow/flow-go/pull/6948
    • Follower
  • Access

  • Follower

  • Consensus

  • Collection

  • Tools

    • Utility
      • add flags to read from pebble https://github.com/onflow/flow-go/pull/7092
      • Test if pebble can read data without stopping the process by creating a checkpoint of db on the fly. https://github.com/onflow/flow-go/pull/7092
    • AdminTool
      • If Utility can read pebble without stopping the process, we can get rid of the admin tool that reads data from pebble. https://github.com/onflow/flow-go/pull/7092
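
The "Switching from Badger to Pebble" item under Execution (migrating the last executed result so that the next block stays executable) can be pictured as copying one record between two key-value backends. Below is a minimal sketch of that idea, assuming a hypothetical `KVStore` interface and an in-memory stand-in for both databases. None of these names come from flow-go; the actual implementation is in the linked PR.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotFound is returned when a key is absent. (Hypothetical.)
var ErrNotFound = errors.New("key not found")

// KVStore is a hypothetical minimal interface that both the old and the new
// backend would satisfy once data access has been refactored.
type KVStore interface {
	Get(key string) ([]byte, error)
	Set(key string, value []byte) error
}

// memStore is an in-memory stand-in for either backend.
type memStore map[string][]byte

func (m memStore) Get(key string) ([]byte, error) {
	v, ok := m[key]
	if !ok {
		return nil, ErrNotFound
	}
	return v, nil
}

func (m memStore) Set(key string, value []byte) error {
	m[key] = append([]byte(nil), value...) // store a copy of the value
	return nil
}

// migrateKey copies key from src to dst unless dst already holds it, so the
// step is safe to repeat on every start-up. It reports whether a copy was made.
func migrateKey(src, dst KVStore, key string) (bool, error) {
	if _, err := dst.Get(key); err == nil {
		return false, nil // already migrated
	} else if !errors.Is(err, ErrNotFound) {
		return false, err
	}
	v, err := src.Get(key)
	if err != nil {
		return false, fmt.Errorf("read %q from source: %w", key, err)
	}
	if err := dst.Set(key, v); err != nil {
		return false, fmt.Errorf("write %q to destination: %w", key, err)
	}
	return true, nil
}

func main() {
	// The key name and value are illustrative only.
	oldDB := memStore{"last_executed_result": []byte("result-id")}
	newDB := memStore{}
	copied, err := migrateKey(oldDB, newDB, "last_executed_result")
	fmt.Println(copied, err)
}
```

Making the copy idempotent means a node can run it unconditionally at start-up: the first run moves the record and later runs are no-ops.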

zhangchiqing avatar Feb 19 '25 20:02 zhangchiqing

Hi @j1010001, @zhangchiqing - is my understanding of the outcomes of the milestones above correct?

  1. Milestone 1 - chunk data pack pruning on EN ✅

  2. Milestone 2 - low-risk data on EN, AN, VN moved to Pebble DB (in progress)

  3. Milestone 3 -

    • Follower engine migrated to PebbleDB.
    • Verification node can run exclusively on PebbleDB.
  4. Milestone 4 - EN and AN can run exclusively on PebbleDB

  5. Milestone 5 - SN and LN can run exclusively on PebbleDB

  6. Milestone 6 - Data can be automatically pruned on EN, AN, VN.

  7. Milestone 7 - Upgrade to Pebble 2.x

vishalchangrani avatar Mar 18 '25 21:03 vishalchangrani

Scope update:

There are portions of M5 Consensus that are moving to M3:

  • Consensus Builder, Finalizer, Persister

  • ExecForkSuppressor

  • Deployment - Switching from Badger to Pebble (Dynamic bootstrap (preferred), spork, or Migration)

  • Protocol Data (Consensus Follower)

j1010001 avatar Apr 28 '25 17:04 j1010001

Removed M7 - upgrade to Pebble 2.x (moved as task to M4)

j1010001 avatar Jun 10 '25 22:06 j1010001

  • One node of each type is now running on Pebble DB on mainnet (the Consensus node still has some DKG data stored in BadgerDB, see issue).
  • Alex is back and is working through Leo’s PRs. There are 4 of these on the critical path.
  • The next big milestone for this OKR is merging Malleability into the PebbleDB branch, resolving conflicts, and then testing the combined branch. The plan is to merge Malleability to master this week, then work through merging Malleability into PebbleDB over the next two weeks. Rough ETA is end of August.

vishalchangrani avatar Aug 12 '25 17:08 vishalchangrani