bbolt
Improve startup time by reading DB file in larger chunks
Related to #86 which has been closed (by accident?) and https://github.com/lightningnetwork/lnd/issues/6059.
When using bbolt as part of https://github.com/lightningnetwork/lnd/ to open a ~10 GByte DB file, the file is read at around 15 MByte/sec. If I first read the file using cat (so that bbolt can then re-read the same data from RAM), the startup time is reduced by a factor of around 10 (i.e., bbolt is able to process the file at > 100 MByte/sec instead of just ~15 MByte/sec).
My assumption is that bbolt issues many smaller requests ("read 1 byte at position Y"), possibly at random locations (which doesn't matter too much with SSDs). If this assumption is true, it might help to read larger chunks (buffering, read-ahead): "read 10 MByte starting from position Y" followed by the existing code accessing smaller chunks.
Bbolt does a lot of random reads, but it fetches whole pages of 4 KB to 64 KB each (depending on OS config/arch).
As most cloud infrastructures and even local drives work with bigger read-ahead buffers, the cost of jumping between pages might be significant.
I would explore the code around 'mmap': on some architectures there might be flags that let mmap proactively load the file. On the other hand, vectorized IO might help (similarly to: https://github.com/etcd-io/bbolt/pull/339).
@C-Otto If you're on Linux, you can pass this flag to have the mmapped region populated by the OS beforehand:
bolt.Options{
    MmapFlags: syscall.MAP_POPULATE,
}
@ptabor Does disabling readahead make sense?
unix.Fadvise(int(f.Fd()), 0, 0, unix.FADV_RANDOM)