
Performance: Tweak badger options

Open Ullaakut opened this issue 4 years ago • 2 comments

Description

// DefaultOptions returns the default Badger options preferred by the DPS for its index database.
func DefaultOptions(dir string) badger.Options {
	return badger.DefaultOptions(dir).
		WithMaxTableSize(256 << 20).
		WithValueLogFileSize(64 << 20).
		WithTableLoadingMode(options.FileIO).
		WithValueLogLoadingMode(options.FileIO).
		WithNumMemtables(1).
		WithKeepL0InMemory(false).
		WithCompactL0OnClose(false).
		WithNumLevelZeroTables(1).
		WithNumLevelZeroTablesStall(2).
		WithLoadBloomsOnOpen(false).
		WithIndexCacheSize(2000 << 20).
		WithBlockCacheSize(0).
		WithLogger(nil)
}

Try tweaking these options so that the DPS ideally uses almost exactly 128GB, and so that performance increases.

Ullaakut avatar Oct 08 '21 08:10 Ullaakut

Options and their performance impact

These tests were run on a localnet dataset of a few gigabytes. A baseline indexing run on my machine takes 1m46s on average. Some of these results might not hold for larger datasets, but unfortunately that is difficult for me to test, since indexing a complete spork takes days.

  • WithTableLoadingMode(options.MemoryMap)/WithValueLogLoadingMode(options.MemoryMap): 136% performance improvement
  • WithSyncWrites(false): Negligible/no impact.
  • WithMaxTableSize(2000 << 20): Negligible/no impact.
  • WithValueLogFileSize(2000 << 20): Negligible/no impact.
  • WithDetectConflicts(false): Negligible/no impact.
  • WithBlockSize(10MB): Negligible/no impact.
  • WithBloomFalsePositive(0): Negligible/no impact.
  • WithNumCompactors(16): Negligible/no impact.
  • WithMaxLevels(10): Performance decrease with more levels, no noticeable improvement with fewer levels.
  • WithNumMemtables(256): Slight, negligible performance decrease.
  • WithKeepL0InMemory(true): Negligible/no impact.
  • WithBypassLockGuard(true): Negligible/no impact.

It seems that the only option with a noticeable positive impact is leaving TableLoadingMode at its default value, options.MemoryMap. However, I still need to check whether this also holds with a real-life data sample: it may be faster on a short run with localnet data but have the opposite effect with real data.
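
For reference, a sketch of what that change could look like, assuming the badger v2 API that the DefaultOptions function in the description appears to use. The import paths, the package name and the function name MemoryMapOptions are assumptions for illustration; every other option is copied unchanged from the description.

```go
package dps

import (
	"github.com/dgraph-io/badger/v2"
	"github.com/dgraph-io/badger/v2/options"
)

// MemoryMapOptions is a hypothetical variant of DefaultOptions that only
// switches the two loading modes from options.FileIO to options.MemoryMap.
func MemoryMapOptions(dir string) badger.Options {
	return badger.DefaultOptions(dir).
		WithMaxTableSize(256 << 20).
		WithValueLogFileSize(64 << 20).
		WithTableLoadingMode(options.MemoryMap).    // was options.FileIO
		WithValueLogLoadingMode(options.MemoryMap). // was options.FileIO
		WithNumMemtables(1).
		WithKeepL0InMemory(false).
		WithCompactL0OnClose(false).
		WithNumLevelZeroTables(1).
		WithNumLevelZeroTablesStall(2).
		WithLoadBloomsOnOpen(false).
		WithIndexCacheSize(2000 << 20).
		WithBlockCacheSize(0).
		WithLogger(nil)
}
```

Memory-mapped files are paged in by the operating system, so the effect on a dataset much larger than available RAM may differ from the localnet result and would need to be measured separately.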

Ullaakut avatar Oct 11 '21 07:10 Ullaakut

Unfortunately I'm unable to test it with real data at the moment since my machine does not have enough RAM to run the live indexer, and the remote machine I have access to has no storage left.

EDIT 25/10: Will be able to test that today or tomorrow.

Ullaakut avatar Oct 14 '21 08:10 Ullaakut