
Support FoundationDB for consensus

Open antiguru opened this issue 2 months ago • 6 comments

Implements a consensus backend to talk to FoundationDB. Very much untested and still panics sometimes.

Run: bin/environmentd --optimized --consensus foundationdb:

TODO to make this work everywhere:

  • Make building with FoundationDB optional. Currently, there is no easy way to get a libfdb_c on aarch64 Mac.
  • This change adds the fdb library to the base images, which isn't great if we don't want to use it.
  • The test infrastructure distinguishes metadata stores inconsistently, sometimes with a string and sometimes with a bool, to switch between the built-in postgres and external crdb. It'd be nice to change this to an enum of Internal, External-CRDB, External-FDB to allow easy switching between different implementations and targets.
    • Testdrive passes most tests, but the consistency and tombstone checks don't work because they assume an incorrect store.
  • Initializing FDB requires an fdbcli call to create the database. Before that, we cannot establish a connection. If we just use the provided image, we need to slot that call in somewhere outside of docker-compose itself. See https://github.com/apple/foundationdb/blob/main/packaging/docker/samples/local/start.bash for context.
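
A minimal sketch of the enum idea from the list above, assuming the test infrastructure is Python. The names `MetadataStore` and `from_legacy` are hypothetical and not taken from the codebase, and the mapping of the old bool flag (True meaning external crdb) is an assumption about the current convention, not something confirmed here.

```python
from enum import Enum


class MetadataStore(Enum):
    """Hypothetical enum replacing the mixed string/bool store switches."""

    INTERNAL = "internal"            # built-in postgres
    EXTERNAL_CRDB = "external-crdb"  # external CockroachDB
    EXTERNAL_FDB = "external-fdb"    # external FoundationDB

    @classmethod
    def from_legacy(cls, value):
        """Map an old-style bool or string flag onto the enum."""
        if isinstance(value, cls):
            return value
        if isinstance(value, bool):
            # Assumption: True used to mean "external crdb", False "built-in".
            return cls.EXTERNAL_CRDB if value else cls.INTERNAL
        if isinstance(value, str):
            # Raises ValueError for names that are not a known store.
            return cls(value.strip().lower())
        raise TypeError(f"unsupported metadata store flag: {value!r}")

    @property
    def is_external(self):
        """True for any store that runs outside the Materialize container."""
        return self is not MetadataStore.INTERNAL
```

Checks such as the consistency and tombstone checks could then branch on a single `MetadataStore` value instead of guessing the store from a string or a bool.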

antiguru, Oct 10 '25 15:10

Still losing data somewhere, but it's getting closer:

2025-10-10T15:18:10.342212Z  thread 'persist:001f' panicked at /home/moritz/dev/repos/materialize/src/persist-client/src/internal/state_versions.rs:771:9:
assertion `left == right` failed
  left: Some(SeqNo(254))
 right: Some(SeqNo(257))

antiguru, Oct 10 '25 15:10

Tell me if you want any help with setting up FoundationDB in mzcompose. I'm very interested to see benchmark results, and to run the limits test to find new limits (with the artificial limits removed that it currently has because things became too slow).

def-, Oct 13 '25 09:10

I added a testdrive variant that runs against FoundationDB, but it requires a bunch of changes to make Mz compile in docker. One issue is that the FoundationDB client library needs to be dynamically linked, which is a novel problem for us. At the moment, everything in Materialize is statically linked, so we need to make sure that our base images contain the right library at both compile time and runtime.

antiguru, Oct 14 '25 08:10

  • Feature Benchmark looks pretty similar, ~2% slower on average: https://docs.google.com/spreadsheets/d/1iC-gxHKOgz-kkQDKgsq_aem-Y_8cVMZaeBDqT1sR5JE/edit?gid=2146535294#gid=2146535294 But it mostly doesn't benchmark DDLs.
  • Scalability Benchmark showed a few slight regressions: https://buildkite.com/materialize/nightly/builds/13764
  • Parallel Benchmark has INSERTs being slower: https://buildkite.com/materialize/nightly/builds/13766
  • The limits test has some Pg connections being closed for an unknown reason: https://buildkite.com/materialize/release-qualification/builds/969. Maybe the rest will have some interesting results on whether we can have more objects using FDB.

def-, Oct 16 '25 13:10

The feature benchmark had a few small regressions in SmallInserts and Subscribes: https://buildkite.com/materialize/nightly/builds/13768

def-, Oct 16 '25 22:10

When enabling FoundationDB consensus in Parallel Workload with 10x the number of objects (to stress it a bit more), I'm seeing a novel panic: Parallel Workload (0dt deploy)

parallel-workload-materialized-1     | thread 'tokio:work-2' panicked at /var/lib/buildkite-agent/builds/buildkite-l-builders-x86-64-static-4e3f139-i-0d002e61edf47c1a8-1/materialize/test/src/storage-controller/src/collection_mgmt.rs:1186:21: error truncating metrics history: appending retractions: UpperMismatch { expected: Antichain { elements: [1761082474918] }, current: Antichain { elements: [1761082475918] } } (type=WallclockLagHistory)

Could it be related to FoundationDB? If not I'll open a separate issue.

Edit: I couldn't reproduce it without FoundationDB, tried in https://github.com/MaterializeInc/materialize/pull/33907

Edit2: I have opened an issue: https://github.com/MaterializeInc/database-issues/issues/9824

def-, Oct 22 '25 07:10