# Speed up memory db by using fixed-size keys in skiplist search
This PR is a follow-up to https://github.com/syndtr/goleveldb/pull/385, and only the last two commits are "this PR".

The first "this PR" commit contains a small command-line utility that runs a set of `Put` operations and exports the timings in a JSON format. The JSON can be turned into graphs via this little tool: https://github.com/holiman/mkchart. Note: the `stats` package can be dropped; I only included it in this PR so other people can reproduce my results and try out other combinations.

The last commit contains the actual change.
## Problem
Let's take a look at the graphs for storing 4M key/value pairs. In every example, the key is 32 bytes; the graphs show value sizes of 32, 96 and 256 bytes, respectively.

As can be seen, when the values become larger, the `Put` operation becomes slower. This is due to the skiplist search: for every `Put`, we need to find the correct skiplist position. This entails iterating the `nodeData` structure, but also accessing the `kvData` structure to do the key comparisons. With larger `kvData`, we get fewer benefits from the various caches, and the memory accesses slow the whole thing down.
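The per-probe cost can be sketched as follows. The names (`kvData`, `kvOffset`, `keySize`) follow the memdb layout described above, but the code is an illustrative sketch, not goleveldb's actual implementation:

```go
package main

import (
	"bytes"
	"fmt"
)

// kvData is one tightly packed byte slice holding every key and value
// in the memdb. Each skiplist probe must slice the candidate's key out
// of it and byte-compare -- with large values interleaved between the
// keys, those key bytes are spread far apart, which hurts cache locality.
func compareAt(kvData []byte, kvOffset, keySize int, key []byte) int {
	return bytes.Compare(kvData[kvOffset:kvOffset+keySize], key)
}

func main() {
	// Two 5-byte keys packed back to back (values omitted for brevity).
	kvData := []byte("applemango")
	fmt.Println(compareAt(kvData, 0, 5, []byte("apple"))) // 0: equal
	fmt.Println(compareAt(kvData, 5, 5, []byte("apple"))) // 1: "mango" > "apple"
}
```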
## Solution
This PR frees up one field in the skiplist `node` by packing the `height` element into the `keySize`. The `uint64` which previously held `height` now instead holds 8 bytes of the key. During the search, instead of loading the full key, we can do a `quickCmp` using the data we have already loaded.

(Note: this trick can only be performed if the memdb is configured to use the `DefaultComparer`; otherwise it falls back to using the configured comparer.)
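A minimal sketch of the trick (the helper names `packPrefix` and `quickCmp` here are illustrative, not the PR's exact code): storing the first 8 key bytes big-endian in a `uint64` makes a single integer comparison agree with the `DefaultComparer`'s bytewise ordering on those bytes, so most probes never touch `kvData` at all.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// packPrefix stores up to 8 leading key bytes in a uint64, big-endian,
// so that integer comparison matches bytewise (DefaultComparer) order.
// Shorter keys are zero-padded.
func packPrefix(key []byte) uint64 {
	var buf [8]byte
	copy(buf[:], key)
	return binary.BigEndian.Uint64(buf[:])
}

// quickCmp compares two packed prefixes without touching kvData.
// A zero result is inconclusive (identical or padded prefixes), and
// the caller must fall back to a full key comparison.
func quickCmp(a, b uint64) int {
	switch {
	case a < b:
		return -1
	case a > b:
		return 1
	}
	return 0
}

func main() {
	a := packPrefix([]byte("key-0042-and-more-bytes"))
	b := packPrefix([]byte("key-0043-and-more-bytes"))
	fmt.Println(quickCmp(a, b)) // -1: decided without loading the full keys
}
```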
With this feature enabled, here are the new charts:
## Impact
The total runtimes for the three examples are as follows:

| Key/value combo | Master | PR |
|---|---|---|
| 32:32 | 17.93s | 10.85s |
| 32:96 | 18.17s | 11.83s |
| 32:256 | 24.58s | 13.30s |
## Note for 32-bit platforms

The PR as written assumes a 64-bit platform, but can easily be extended to support 32-bit. In that case, we could use 4 bytes instead of 8 for the `quickCmp`. If we assume that 1 byte of a key is an application-specific prefix, that leaves 3 bytes of entropy, so even that should be 'ok' for up to 16M items -- which is quite a lot for a memory db.
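As a back-of-the-envelope check of that claim (a hypothetical 32-bit variant; nothing here is in the PR):

```go
package main

import "fmt"

// With a 4-byte packed prefix of which 1 byte is an application-specific
// prefix, 3 bytes of distinguishing entropy remain for the quickCmp.
func distinctPrefixes(packedBytes, prefixBytes int) int {
	return 1 << (8 * (packedBytes - prefixBytes))
}

func main() {
	fmt.Println(distinctPrefixes(4, 1)) // 16777216, i.e. ~16M items
}
```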
I have experimented with a more niche variant, where the `node` is based on `uint32` and 6 bytes are used for the key. That can be achieved if the key length is limited to 4096, the `height` uses only one nibble, and, of course, the `kvOffset` is limited to 32 bits (which is currently already the case for 32-bit platforms). However, this PR does not make any niche-specific changes to go-leveldb which would not fit the general case.
Todos not fixed yet:

- [x] Fix this for 32-bit platforms
- [ ] Use the same `quickCmp` for `findLT`, and possibly `Seek` and `fill`.
## Note about benchmarks

A final note: the existing benchmarks do not quite demonstrate this issue, nor do they show any significant speed improvement from this PR (at least not in the `Put` tests). The reason for that is twofold:

- The existing benchmarks use a 4-byte key, meaning that each lookup requires an allocation to pad the key.
- The existing benchmarks use a `nil` value, meaning that they do not suffer (much) from reading data from the `kvData` section -- which in that case is essentially just a tightly packed space of keys.
For posterity though:

```
name           old time/op  new time/op  delta
Put-6          1.05µs ± 8%  1.03µs ±31%     ~     (p=0.548 n=5+5)
PutRandom-6    2.08µs ± 8%  1.77µs ±27%     ~     (p=0.095 n=5+5)
Get-6          1.08µs ±10%  0.93µs ± 6%  -14.35%  (p=0.008 n=5+5)
GetRandom-6    2.63µs ± 5%  2.03µs ± 7%  -22.71%  (p=0.008 n=5+5)
```
After reading the commit https://github.com/syndtr/goleveldb/pull/387/commits/dae8fc2f18ab43d22e26e60b12d292a9743a57df, I guess the main improvement is gained from skipping most of the key loading and byte comparison (by using integer comparison)?

> skipping most of key loading and byte comparison (by using integer comparison)

Primarily avoiding the loading -- by minimizing the memory surface area that we have to jump around on during the search.