Implement DB benchmark

Open georgeee opened this issue 3 months ago • 2 comments

This PR implements a benchmark for DB usage.

It independently measures the read and write performance of every DB implementation. This allows informed decisions about the various flows of working with data, keeping the measurement of pure DB performance separate from the performance of other subsystems (including serialization).

See db_benchmark/README.md for more details about the benchmark.
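
As a rough illustration of the separation described above, the following Python sketch times only the store calls and builds the payload beforehand, so data generation and serialization stay outside the measured region. It is not the benchmark from this PR: `DictStore`, `bench`, and the parameter values are hypothetical stand-ins, and a real run would wrap RocksDB, LMDB, or a file-based store behind the same `put`/`get` interface.

```python
import os
import time


class DictStore:
    """Hypothetical in-memory backend used only to make the sketch runnable."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data[key]


def bench(store, keys_per_block=4, value_size=1024):
    # Build the block before starting the clock, so payload generation
    # (and any serialization) is not counted as DB time.
    block = [(f"key-{i}".encode(), os.urandom(value_size))
             for i in range(keys_per_block)]

    # Timed region 1: writes only.
    start = time.perf_counter()
    for key, value in block:
        store.put(key, value)
    write_s = time.perf_counter() - start

    # Timed region 2: reads only.
    start = time.perf_counter()
    values = [store.get(key) for key, _ in block]
    read_s = time.perf_counter() - start

    # Correctness check happens outside both timed regions.
    assert values == [value for _, value in block]
    return {"write_s": write_s, "read_s": read_s}


if __name__ == "__main__":
    print(bench(DictStore()))
```

Reporting the write and read timings separately mirrors the `*_write` and `*_read` rows in the results below.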

Explain how you tested your changes:

  • Executed the benchmark successfully

Checklist:

  • [x] Dependency versions are unchanged
    • Notify Velocity team if dependencies must change in CI
  • [x] Modified the current draft of release notes with details on what is completed or incomplete within this project
  • [x] Document code purpose, how to use it
    • Mention expected invariants, implicit constraints
  • [x] Tests were added for the new behavior
    • Document test purpose, significance of failures
    • Test names should reflect their purpose
  • [x] All tests pass (CI will check this if you didn't)
  • [x] Serialized types are in stable-versioned modules
  • [x] Does this close issues? None

georgeee, Nov 15 '25 10:11

Result of the run with default parameters

| Name | Time/Run | mWd/Run | mjWd/Run | Prom/Run | Percentage |
|---|---|---|---|---|---|
| rocksdb_write | 705_955.63us | 158_922.00w | 482.99w | 482.99w | 26.18% |
| rocksdb_read | 125.97us | 1_935.00w | 16_399.56w | 13.56w | |
| lmdb_write | 2_696_045.27us | 10_047.00w | 12.96w | 12.96w | 100.00% |
| lmdb_read | 96.56us | 1_217.00w | 16_387.19w | 1.19w | |
| single_file_write | 70_465.27us | 19_047.38w | 324.69w | 324.69w | 2.61% |
| single_file_read | 362.82us | 1_273.00w | 73_738.82w | 2.82w | 0.01% |
| multi_file_write | 83_925.40us | 559.00w | 2_048_384.01w | 382.01w | 3.11% |
| multi_file_read | 184.54us | 1_269.00w | 32_772.38w | 0.38w | |

📊 Benchmark Analysis: Write vs Read Performance

Test Configuration:

  • 📦 Keys per block: 125
  • 💾 Value size: 131,072 bytes (128 KB)
  • 🔢 Blocks in DB: 800
  • Total data: ~100,000 keys, ~13.1 GB total (about 12.2 GiB)
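
The totals follow directly from the three configuration values. As a quick check, this small Python sketch multiplies them out; the constant names are illustrative and not taken from the benchmark code. Depending on whether decimal or binary units are used, the total reads as about 13.1 GB or about 12.2 GiB.

```python
KEYS_PER_BLOCK = 125
VALUE_SIZE_BYTES = 131_072  # 128 KB per value
BLOCKS_IN_DB = 800

total_keys = KEYS_PER_BLOCK * BLOCKS_IN_DB
total_bytes = total_keys * VALUE_SIZE_BYTES

print(total_keys)           # 100000
print(total_bytes / 10**9)  # 13.1072 (decimal GB)
print(total_bytes / 2**30)  # 12.20703125 (binary GiB)
```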

✍ïļ Write Performance Comparison

Speed (Time/Run - lower is better):

  1. 🥇 single_file_write: ~70ms - fastest option
  2. 🥈 multi_file_write: ~84ms - very close second
  3. ⚠️ rocksdb_write: ~706ms - 10x slower than single file
  4. 🔴 lmdb_write: ~2,696ms - significantly slower (38x slower than single file)
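
The slowdown factors in this ranking can be reproduced from the Time/Run column. A minimal sketch, using the write timings from the table above; the function name `slowdown_vs_fastest` is made up for illustration:

```python
# Time/Run values (microseconds) copied from the results table.
write_time_us = {
    "rocksdb_write": 705_955.63,
    "lmdb_write": 2_696_045.27,
    "single_file_write": 70_465.27,
    "multi_file_write": 83_925.40,
}


def slowdown_vs_fastest(times):
    """Return each entry's time divided by the fastest time, fastest first."""
    fastest = min(times.values())
    ordered = sorted(times.items(), key=lambda item: item[1])
    return {name: t / fastest for name, t in ordered}


for name, ratio in slowdown_vs_fastest(write_time_us).items():
    print(f"{name}: {ratio:.1f}x")
```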

Memory Allocation:

  • 💚 multi_file_write: 559w - minimal minor heap allocation
  • ⚠ïļ rocksdb_write: 158,922w - high memory allocation
  • ðŸ”ī multi_file_write (mjWd): 2,048,384w - very high major heap pressure from file operations

📖 Read Performance Comparison

Speed (Time/Run - all very fast):

  1. 🥇 lmdb_read: ~97μs - fastest
  2. 🥈 rocksdb_read: ~126μs - about 1.3x slower than LMDB
  3. ✅ multi_file_read: ~185μs - still excellent
  4. ✅ single_file_read: ~363μs - slowest but still sub-millisecond

All read operations are extremely fast (microsecond range vs millisecond writes).

🎯 Key Takeaways

✅ LMDB: Terrible write performance (~2.7s per run) but excellent read speed
✅ Simple file I/O: Best write performance by far - ideal for large value storage
✅ RocksDB: Balanced middle-ground but high memory usage on writes
✅ Large values (128 KB): Simple file approaches dominate for write throughput

💡 Recommendation: For large-value workloads like this (128 KB per value):

  • Write-heavy → single_file_write is the clear winner
  • Read-heavy → LMDB or RocksDB provide faster lookups

Update after optimization of multi-file writing

📊 multi_file_write Benchmark Results

| Name | Time/Run | mWd/Run | mjWd/Run | Prom/Run | Percentage |
|---|---|---|---|---|---|
| multi_file_write | 45.50ms | 553.00w | 24.03w | 24.03w | 100.00% |

📈 Before vs After Comparison

Performance Gains:

  • ⏱ïļ Time: 83,925Ξs → 45,500Ξs (45.5ms)
  • 📉 Speedup: 1.84x faster 🎉
  • ðŸ’ū Memory (mWd): 559w → 553w (essentially unchanged)
  • 🔄 Major heap (mjWd): 2,048,384w → 24.03w
  • ⚡ Major heap reduction: 99.999% reduction! ðŸ”Ĩ
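
For reference, the speedup and reduction figures above come from two simple ratios. A minimal sketch with the before and after values from this comment; the helper names are illustrative:

```python
def speedup(before, after):
    """How many times faster the 'after' measurement is."""
    return before / after


def percent_reduction(before, after):
    """Relative decrease from 'before' to 'after', in percent."""
    return (1 - after / before) * 100


# Time/Run in microseconds, mjWd/Run in words.
print(f"{speedup(83_925.40, 45_500.0):.2f}x")            # 1.84x
print(f"{percent_reduction(2_048_384.01, 24.03):.3f}%")  # 99.999%
```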

🏆 Updated Write Performance Rankings

  1. 🥇 multi_file_write (new): ~45.5ms - NEW CHAMPION
  2. 🥈 single_file_write: ~70ms (1.55x slower)
  3. ⚠️ rocksdb_write: ~706ms (15.5x slower)
  4. 🔴 lmdb_write: ~2,696ms (59x slower)

💡 What Changed?

The massive mjWd reduction (from 2M+ to 24w) suggests you eliminated file system churn or excessive allocations. The optimization improved write time by 1.84x while also making the writes vastly more GC-friendly.

New recommendation: For large-value (128 KB) write workloads, multi_file_write is now the clear winner - fastest write speed AND minimal heap pressure. 🎯

georgeee, Nov 15 '25 10:11

🚀 Smaller Values, Different Story

📊 Full Benchmark Results (New Parameters)

Test Configuration:

  • 📦 Keys per block: 32
  • 💾 Value size: 9,000 bytes (8.8 KB)
  • 🔢 Warmup blocks: 1,000
  • Total warmup: 32,000 keys

| Name | Time/Run | mWd/Run | mjWd/Run | Prom/Run | Percentage |
|---|---|---|---|---|---|
| rocksdb_write | 4,033.82us | 40,708.00w | 20.57w | 20.57w | 0.42% |
| rocksdb_read | 40.49us | 1,924.00w | 1,140.63w | 13.63w | - |
| lmdb_write | 957,662.23us | 2,596.00w | 0.65w | 0.65w | 100.00% |
| lmdb_read | 35.66us | 1,206.00w | 1,127.31w | 0.31w | - |
| single_file_write | 1,957.77us | 4,900.00w | 38.59w | 38.59w | 0.20% |
| single_file_read | 70.29us | 1,262.00w | 9,321.10w | 0.10w | - |
| multi_file_write | 501.43us | 269.00w | 36,014.88w | 12.88w | 0.05% |
| multi_file_read | 55.97us | 1,258.00w | 2,254.03w | - | - |

Before vs After (multi_file_write, original vs optimized):

  • ⏱️ Time: 501.43us → 839.77us (1.67x slower) ⚠️
  • 💾 Memory (mWd): 269w → 274w (essentially unchanged)
  • 🔄 Major heap (mjWd): 36,014.88w → 6.49w
  • ⚡ Major heap reduction: 99.98% reduction! 🔥

🏆 Write Performance Rankings (8.8 KB values)

  1. 🥇 multi_file_write (original): ~501us - fastest
  2. 🥈 multi_file_write (optimized): ~840us - better GC behavior
  3. 🥉 single_file_write: ~1,958us
  4. ⚠️ rocksdb_write: ~4,034us
  5. 🔴 lmdb_write: ~957,662us - still very slow

📖 Read Performance Rankings

  1. 🥇 lmdb_read: ~36us - fastest
  2. 🥈 rocksdb_read: ~40us
  3. ✅ multi_file_read: ~56us
  4. ✅ single_file_read: ~70us

💡 Key Observations

Compared to 128 KB value test:

  • 📉 All operations are significantly faster with smaller values (8.8 KB vs 128 KB); each block also holds fewer keys (32 vs 125)
  • 🔄 Trade-off emerged: Optimization reduced mjWd by 99.98% but slowed writes by 1.67x
  • 🎯 RocksDB is more competitive at this value size: ~4ms per run, about 2x slower than single_file_write, versus ~706ms and 10x slower in the 128 KB test
  • ⚡ LMDB still struggles with writes but dominates reads
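
Because the two configurations differ in both value size and keys per block, raw Time/Run values are not directly comparable across them. One way to normalize is write throughput in bytes per second. The sketch below assumes that one run writes exactly one block (keys per block × value size); that is an assumption about the benchmark, not something stated in this thread, and `write_throughput_mb_s` is a hypothetical helper.

```python
def write_throughput_mb_s(time_us, keys_per_block, value_size_bytes):
    """Decimal MB/s, assuming one run writes one full block (an assumption)."""
    bytes_written = keys_per_block * value_size_bytes
    seconds = time_us / 1_000_000
    return bytes_written / seconds / 1_000_000


# (Time/Run in us for the 128 KB test, Time/Run in us for the 8.8 KB test)
write_runs = {
    "rocksdb_write": (705_955.63, 4_033.82),
    "lmdb_write": (2_696_045.27, 957_662.23),
    "single_file_write": (70_465.27, 1_957.77),
}

for name, (large_us, small_us) in write_runs.items():
    large = write_throughput_mb_s(large_us, 125, 131_072)
    small = write_throughput_mb_s(small_us, 32, 9_000)
    print(f"{name}: {large:.1f} MB/s (128 KB) vs {small:.1f} MB/s (8.8 KB)")
```

Under that assumption, a lower Time/Run with smaller values does not necessarily mean higher throughput, so the per-byte view is worth checking before comparing the two runs.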

Optimization trade-off: The optimized version trades some speed for much better GC behavior. Depending on workload (GC pressure vs raw throughput), either version could be preferable.

georgeee, Nov 15 '25 11:11