
Improve startup time by reading DB file in larger chunks

Open C-Otto opened this issue 3 years ago • 2 comments

Related to #86 which has been closed (by accident?) and https://github.com/lightningnetwork/lnd/issues/6059.

When using bbolt as part of https://github.com/lightningnetwork/lnd/ to open a ~10 GByte DB file, the file is read at around 15 MByte/sec. If I first read the file using cat (so that bbolt can re-read the same data from RAM), the startup time is reduced by a factor of around 10 (i.e., bbolt is able to process the file at > 100 MByte/sec instead of just ~15 MByte/sec).

My assumption is that bbolt issues many smaller requests ("read 1 byte at position Y"), possibly at random locations (which doesn't matter too much with SSDs). If this assumption is true, it might help to read larger chunks (buffering, read-ahead): "read 10 MByte starting from position Y" followed by the existing code accessing smaller chunks.
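The cat workaround described above can be sketched in Go as a pre-read step that runs before the database is opened. This is only an illustration of the idea, not a change to bbolt itself: the function names `warmPageCache` and `writeSampleFile` and the 10 MiB chunk size are assumptions made for the example, and the benefit depends on the file fitting into the OS page cache.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// warmPageCache (hypothetical helper) reads the file at path sequentially in
// chunkSize-byte reads and discards the data. Like `cat file > /dev/null`,
// its only purpose is to pull the file into the OS page cache so that later
// random page reads are served from RAM. It returns the number of bytes read.
func warmPageCache(path string, chunkSize int) (int64, error) {
	if chunkSize <= 0 {
		return 0, fmt.Errorf("chunkSize must be positive, got %d", chunkSize)
	}
	f, err := os.Open(path)
	if err != nil {
		return 0, err
	}
	defer f.Close()

	buf := make([]byte, chunkSize)
	var total int64
	for {
		n, err := f.Read(buf)
		total += int64(n)
		if err == io.EOF {
			return total, nil
		}
		if err != nil {
			return total, err
		}
	}
}

// writeSampleFile (hypothetical helper) creates a temporary file of the given
// size so the example can run without a real database file.
func writeSampleFile(size int) (string, error) {
	f, err := os.CreateTemp("", "sample-*.db")
	if err != nil {
		return "", err
	}
	defer f.Close()
	if _, err := f.Write(make([]byte, size)); err != nil {
		return "", err
	}
	return f.Name(), nil
}

func main() {
	path, err := writeSampleFile(32 << 20) // stand-in for the real DB file
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	defer os.Remove(path)

	n, err := warmPageCache(path, 10<<20) // 10 MiB chunks, an arbitrary choice
	if err != nil {
		fmt.Println("pre-read failed:", err)
		return
	}
	fmt.Printf("pre-read %d bytes\n", n)
}
```

In a real setup the pre-read would target the actual DB file just before it is opened. Whether this beats the default behaviour would need to be measured on the storage in question.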

C-Otto • Dec 07 '21 17:12

bbolt does a lot of random reads, but it fetches whole pages of 4 KB to 64 KB (depending on OS configuration and architecture).

As most cloud infrastructures and even local drives work with bigger read-ahead buffers, the cost of jumping might be significant.

I would explore the code around 'mmap': on some architectures there might be flags that let mmap proactively load the file. On the other hand, vectorized IO might help (similarly to https://github.com/etcd-io/bbolt/pull/339).

ptabor • Dec 30 '22 13:12

@C-Otto If you're on Linux, you can pass this flag to have the mmapped region populated by the OS beforehand.

bolt.Options{
	MmapFlags: syscall.MAP_POPULATE,
}
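To show what the flag does at the system-call level, here is a minimal Linux-only sketch that maps a file with `MAP_POPULATE` using only the standard `syscall` package. It assumes that bbolt combines `MmapFlags` with its own flags when it maps the DB file; the helpers `mapPopulated` and `writeTempFile` are hypothetical names for this example and are not part of bbolt.

```go
//go:build linux

package main

import (
	"errors"
	"fmt"
	"os"
	"syscall"
)

// mapPopulated (hypothetical helper) maps the whole file read-only.
// MAP_POPULATE asks the kernel to pre-fault the mapping, so the file is read
// ahead at mmap time instead of page by page on first access.
func mapPopulated(path string) ([]byte, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close() // the mapping stays valid after the descriptor is closed

	info, err := f.Stat()
	if err != nil {
		return nil, err
	}
	if info.Size() == 0 {
		return nil, errors.New("cannot map an empty file")
	}
	return syscall.Mmap(int(f.Fd()), 0, int(info.Size()),
		syscall.PROT_READ, syscall.MAP_SHARED|syscall.MAP_POPULATE)
}

// writeTempFile (hypothetical helper) writes data to a temporary file so the
// example can run without a real database file.
func writeTempFile(data []byte) (string, error) {
	f, err := os.CreateTemp("", "mmap-*.db")
	if err != nil {
		return "", err
	}
	defer f.Close()
	if _, err := f.Write(data); err != nil {
		return "", err
	}
	return f.Name(), nil
}

func main() {
	path, err := writeTempFile(make([]byte, 1<<20))
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	defer os.Remove(path)

	data, err := mapPopulated(path)
	if err != nil {
		fmt.Println("mmap failed:", err)
		return
	}
	defer syscall.Munmap(data)
	fmt.Printf("mapped and pre-faulted %d bytes\n", len(data))
}
```

Pre-faulting makes the mmap call itself take longer, so for a ~10 GByte file the cost moves to open time rather than disappearing. It also only helps if enough RAM is available to keep the populated pages resident.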

@ptabor Does disabling readahead make sense?

unix.Fadvise(int(f.Fd()), 0, 0, unix.FADV_RANDOM)

cenkalti • May 16 '23 16:05