Glean icon indicating copy to clipboard operation
Glean copied to clipboard

[lmdb #3] Add LMDB storage backend

Open simonmar opened this issue 1 month ago • 7 comments

This commit adds an LMDB-based storage backend.

Why?

  • It's faster than RocksDB (see below)
  • It has no dependencies (all the code is bundled, just a couple of .c files).
  • No slow compaction stage during DB creation
  • Backup is just copying the DB, so restore can be instantaneous once the DB is downloaded.
  • Size of the on-disk DB may actually be smaller than RocksDB, but needs more experiments

How do you use it?

Adding --lmdb to the glean or glean-server command line tells it to use LMDB for any DBs created. If this option is given, Glean will still be able to work with existing RocksDB DBs, including restoring them from remote storage. (and vice-versa, if --lmdb is not given, then restoring LMDB DBs from remote storage is still supported).

How does it perform?

Testing indexing of Stackage using the Haskell indexer.

** index

lmdb: 189s, 2.6GB
rocksdb: 216s, 0.9GB

** backup:

lmdb: 5.7s, 0.8GB
rocksdb: 1.6s 0.9GB

** restore:

lmdb: 2.6s 2.1GB
rocksdb: 2.15s 0.9GB

Note that backup with LMDB just copies the DB and compresses it, most of the time is spent in compression, which can be tuned to be either faster / less compression or slower / more compression. Currently using zstd level 8 which seems a reasonable compromise.

Also note that backup produces a squashfs which can be mounted directly (instantaneous restore with decompression on the fly). See #669

C++ Indexing

I indexed all of LLVM + Clang. This is the time to write the data to the DB and do deriving (since otherwise the indexing time is dominated by the C++ indexer itself):

  • lmdb: 30.8s 754MB
  • rocksdb: 52.4s 256MB

Backup:

  • lmdb: 1.8s 213MB
  • rocksdb: 0.8s 252MB

Restore:

  • lmdb: 1.14s 213MB (using db_lmdb_restore_unpack = false, see #669)
  • rocksdb: 1.14s 252MB

Benchmarking with query-bench

* RocksDB (9 Jan)

benchmarking complex-parallel
time                 4.722 s    (4.188 s .. 5.182 s)
                     0.998 R²   (0.998 R² .. 1.000 R²)
mean                 5.051 s    (4.859 s .. 5.403 s)
std dev              341.9 ms   (16.30 ms .. 415.4 ms)
variance introduced by outliers: 20% (moderately inflated)

benchmarking simple
time                 35.87 μs   (35.32 μs .. 36.95 μs)
                     0.989 R²   (0.978 R² .. 0.995 R²)
mean                 41.35 μs   (39.91 μs .. 42.80 μs)
std dev              5.186 μs   (4.647 μs .. 6.200 μs)
variance introduced by outliers: 89% (severely inflated)

benchmarking nested
time                 55.91 μs   (55.55 μs .. 56.57 μs)
                     0.993 R²   (0.981 R² .. 1.000 R²)
mean                 57.59 μs   (56.00 μs .. 61.65 μs)
std dev              7.433 μs   (1.047 μs .. 13.81 μs)
variance introduced by outliers: 89% (severely inflated)

benchmarking page
time                 21.88 ms   (20.42 ms .. 23.41 ms)
                     0.952 R²   (0.876 R² .. 0.991 R²)
mean                 22.14 ms   (21.08 ms .. 24.16 ms)
std dev              3.260 ms   (1.508 ms .. 5.239 ms)
variance introduced by outliers: 63% (severely inflated)

benchmarking pagenested
time                 43.15 ms   (41.51 ms .. 45.74 ms)
                     0.991 R²   (0.975 R² .. 1.000 R²)
mean                 42.00 ms   (41.28 ms .. 43.37 ms)
std dev              1.910 ms   (749.0 μs .. 2.976 ms)
variance introduced by outliers: 13% (moderately inflated)

benchmarking compile
time                 328.2 μs   (326.4 μs .. 330.4 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 327.3 μs   (326.6 μs .. 328.4 μs)
std dev              2.961 μs   (2.361 μs .. 4.059 μs)

benchmarking compile2
time                 95.93 μs   (95.46 μs .. 96.41 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 96.06 μs   (95.72 μs .. 96.69 μs)
std dev              1.519 μs   (865.7 ns .. 2.440 μs)

benchmarking compile3
time                 48.67 μs   (48.44 μs .. 48.90 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 48.87 μs   (48.71 μs .. 49.12 μs)
std dev              670.7 ns   (545.1 ns .. 856.4 ns)

benchmarking array_prefix
time                 306.2 ms   (274.9 ms .. 337.6 ms)
                     0.997 R²   (0.987 R² .. 1.000 R²)
mean                 310.7 ms   (306.6 ms .. 321.7 ms)
std dev              8.283 ms   (233.8 μs .. 10.44 ms)
variance introduced by outliers: 16% (moderately inflated)

benchmarking stacked/page
time                 23.02 ms   (21.20 ms .. 24.57 ms)
                     0.977 R²   (0.956 R² .. 0.992 R²)
mean                 21.80 ms   (20.96 ms .. 22.68 ms)
std dev              1.934 ms   (1.618 ms .. 2.536 ms)
variance introduced by outliers: 38% (moderately inflated)

benchmarking stacked/pagenested
time                 41.75 ms   (40.69 ms .. 43.81 ms)
                     0.995 R²   (0.988 R² .. 0.999 R²)
mean                 41.74 ms   (41.10 ms .. 42.64 ms)
std dev              1.536 ms   (1.034 ms .. 1.913 ms)

* LMDB (9 Jan)

benchmarking complex-parallel
time                 3.060 s    (2.493 s .. 3.351 s)
                     0.996 R²   (0.991 R² .. 1.000 R²)
mean                 3.680 s    (3.401 s .. 4.208 s)
std dev              516.3 ms   (6.608 ms .. 613.0 ms)
variance introduced by outliers: 24% (moderately inflated)

benchmarking simple
time                 31.78 μs   (31.26 μs .. 32.59 μs)
                     0.994 R²   (0.992 R² .. 0.996 R²)
mean                 35.66 μs   (34.61 μs .. 36.79 μs)
std dev              3.901 μs   (3.329 μs .. 4.924 μs)
variance introduced by outliers: 86% (severely inflated)

benchmarking nested
time                 49.22 μs   (48.66 μs .. 50.16 μs)
                     0.996 R²   (0.993 R² .. 0.998 R²)
mean                 50.49 μs   (49.64 μs .. 51.86 μs)
std dev              3.597 μs   (2.639 μs .. 4.710 μs)
variance introduced by outliers: 71% (severely inflated)

benchmarking page
time                 15.30 ms   (14.10 ms .. 16.63 ms)
                     0.963 R²   (0.933 R² .. 0.983 R²)
mean                 15.70 ms   (15.11 ms .. 16.45 ms)
std dev              1.653 ms   (1.265 ms .. 2.442 ms)
variance introduced by outliers: 52% (severely inflated)

benchmarking pagenested
time                 24.94 ms   (22.29 ms .. 27.82 ms)
                     0.947 R²   (0.903 R² .. 0.981 R²)
mean                 22.95 ms   (21.91 ms .. 24.36 ms)
std dev              2.825 ms   (2.143 ms .. 3.701 ms)
variance introduced by outliers: 53% (severely inflated)

benchmarking compile
time                 349.6 μs   (339.6 μs .. 363.0 μs)
                     0.991 R²   (0.987 R² .. 0.995 R²)
mean                 350.7 μs   (342.1 μs .. 360.2 μs)
std dev              29.73 μs   (25.58 μs .. 36.64 μs)
variance introduced by outliers: 72% (severely inflated)

benchmarking compile2
time                 97.12 μs   (95.42 μs .. 100.3 μs)
                     0.989 R²   (0.979 R² .. 0.998 R²)
mean                 101.6 μs   (98.86 μs .. 106.6 μs)
std dev              12.01 μs   (7.345 μs .. 21.75 μs)
variance introduced by outliers: 86% (severely inflated)

benchmarking compile3
time                 48.31 μs   (48.11 μs .. 48.53 μs)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 48.79 μs   (48.34 μs .. 50.48 μs)
std dev              2.691 μs   (517.0 ns .. 5.640 μs)
variance introduced by outliers: 60% (severely inflated)

benchmarking array_prefix
time                 311.7 ms   (296.2 ms .. 346.3 ms)
                     0.996 R²   (0.975 R² .. 1.000 R²)
mean                 308.2 ms   (303.7 ms .. 319.8 ms)
std dev              8.794 ms   (1.644 ms .. 11.58 ms)
variance introduced by outliers: 16% (moderately inflated)

benchmarking stacked/page
time                 19.68 ms   (17.41 ms .. 22.57 ms)
                     0.929 R²   (0.884 R² .. 0.970 R²)
mean                 18.42 ms   (17.20 ms .. 19.85 ms)
std dev              3.087 ms   (2.367 ms .. 4.283 ms)
variance introduced by outliers: 71% (severely inflated)

benchmarking stacked/pagenested
time                 21.98 ms   (20.72 ms .. 23.49 ms)
                     0.982 R²   (0.965 R² .. 0.991 R²)
mean                 23.61 ms   (22.42 ms .. 25.52 ms)
std dev              3.453 ms   (2.212 ms .. 5.199 ms)
variance introduced by outliers: 63% (severely inflated)

Glass

glass-democlient list on 1000 files in the Stackage DB.

rocksdb: 39s lmdb: 31s

Caveats

The on-disk size of the DBs is larger than RocksDB by about 2x, although the backed-up size is about the same. This is mainly because LMDB doesn't do compression itself, however backup includes compression and the backup can be mounted directly as a squashfs (although mounting isn't implemented yet - we need to think about what happens if the server is restarted). I ran the Glass benchmark with the DB mounted as a squashfs and it was 32s vs. 31s, so hardly any difference.

There's no way to control the cache size, LMDB just maps pages into memory and relies on the OS to manage memory usage. It remains to be seen how well this works for a production server with many large DBs.

Key storage

LMDB doesn't support large keys, so I've used a strategy in which only the first ~2KB of the key is indexed. Searching for a key using a prefix larger than this may involve a linear search, but I think this is reasonable: we should never be looking up a fact using a large key. We can document this as a performance gotcha, just like sorting of fields. Indeed, we could use this same strategy with RocksDB to avoid duplicating large keys in the DB, which could potentially save a lot of space.

simonmar avatar Jan 08 '26 12:01 simonmar