Use serde instead of hand-rolled serialization, take 2
I figured out how to do cool git commands and set commit's author and co-author, so here we go. This basically supersedes #18 (I mean, it's literally a port of that PR).
So here's a comparison between e256bc60e99115fb2bbefb7a097ff527120404ee (baseline) and 8d42961f1e2d3473e4a3655d8a65848f670a4caa (serde):
```
$ hyperfine -N -w10 -m50 -- "./baseline -C llvm-project/build1 clang-format" "./serde -C llvm-project/build2 clang-format"
Benchmark 1: ./baseline -C llvm-project/build1 clang-format
  Time (mean ± σ):     222.6 ms ±  10.6 ms    [User: 191.7 ms, System: 30.5 ms]
  Range (min … max):   208.6 ms … 241.4 ms    50 runs

Benchmark 2: ./serde -C llvm-project/build2 clang-format
  Time (mean ± σ):     215.3 ms ±   6.7 ms    [User: 187.8 ms, System: 27.1 ms]
  Range (min … max):   208.8 ms … 236.2 ms    50 runs

Summary
  ./serde -C llvm-project/build2 clang-format ran
    1.03 ± 0.06 times faster than ./baseline -C llvm-project/build1 clang-format
```
Byte counts (stripped binaries):

```
1252376 baseline
1395800 serde
```

Yes, the size still goes up, sadly.
But as I've already mentioned in https://github.com/evmar/n2/pull/18#issuecomment-1589893441, this allows getting rid of some probably-UB unsafe code like `unsafe { MaybeUninit::uninit().assume_init() }` from here.
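For context, a minimal sketch (not the actual n2 code) of what the safe replacement looks like: instead of conjuring an uninitialized buffer, which is undefined behavior even for a plain byte array, you zero-initialize it and let `read_exact` overwrite it:

```rust
use std::io::Read;

// Hypothetical helper illustrating the pattern. The old version did
//   let mut buf: [u8; 8] = unsafe { MaybeUninit::uninit().assume_init() };
// which is UB; a zeroed buffer is safe and essentially free at this size.
fn read_hash(r: &mut impl Read) -> std::io::Result<u64> {
    let mut buf = [0u8; 8];
    r.read_exact(&mut buf)?;
    Ok(u64::from_le_bytes(buf))
}

fn main() -> std::io::Result<()> {
    let bytes = 0xdead_beef_u64.to_le_bytes();
    // Reading back from an in-memory slice round-trips the value.
    let h = read_hash(&mut &bytes[..])?;
    assert_eq!(h, 0xdead_beef);
    println!("{h:#x}");
    Ok(())
}
```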
I guess the package version should be bumped too. Totally forgot about that.
Huh, it's surprising we no longer see the performance difference from that previous change. Two guesses:

- maybe LTO helped?
- what size is your `build1/.n2_db` file? Maybe it's too small for it to matter here?
> maybe LTO helped?

Almost certainly yes.

> what size is your `build1/.n2_db` file? Maybe it's too small for it to matter here?

```
$ wc --bytes build{1,2}/.n2_db
430194 build1/.n2_db
442192 build2/.n2_db
```
Looking through the bincode docs, it's a little disappointing that even bincode uses big types like u64 for lengths, but that probably doesn't matter too much; maybe using its varint encoding would end up a net win?
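To make that trade-off concrete, here's a sketch of how a LEB128-style varint sizes values (bincode's own varint mode differs in its exact scheme, so treat this as illustrative only):

```rust
// Number of bytes a LEB128-style varint needs: 7 payload bits per byte,
// with the high bit used as a continuation flag.
fn varint_len(mut v: u64) -> usize {
    let mut n = 1;
    while v >= 0x80 {
        v >>= 7;
        n += 1;
    }
    n
}

fn main() {
    // Small ids (the common case for references) compress well...
    assert_eq!(varint_len(5), 1);
    assert_eq!(varint_len(300), 2);
    // ...but a full 64-bit hash can inflate from 8 bytes to 10.
    assert_eq!(varint_len(u64::MAX), 10);
    println!("ok");
}
```

So whether varints win overall depends on the mix of small ids versus full-width hashes in the file.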
If you build the check-llvm target in LLVM, it builds a lot more files than clang-format does. I think you can interrupt the build at any point (via Ctrl-C) and the n2 database should be OK (maybe?). That would let you generate a larger n2 database for comparison purposes, though building LLVM takes quite a while.
At the point where my build is at as I type this:

```
[=========================--------- ] 2772/4459 done, 8/930 running
```

my `.n2_db` is 3.9M, which is large enough that we care more about performance.
A comparison between "baseline" e256bc60e99115fb2bbefb7a097ff527120404ee (manual parsing), "serde" 8d42961f1e2d3473e4a3655d8a65848f670a4caa (serde + cbor), and "bincode" 77e2d4a6c0f9c6106f55f8ce9bad6fb9550ceaa4 (serde + bincode):
```
$ hyperfine -iN -w10 -m25 -- "./baseline -C llvm-project/build1 xxx" "./serde -C llvm-project/build2 xxx" "./bincode -C llvm-project/build3 xxx"
Benchmark 1: ./baseline -C llvm-project/build1 xxx
  Time (mean ± σ):     162.2 ms ±   9.7 ms    [User: 140.0 ms, System: 21.9 ms]
  Range (min … max):   148.0 ms … 176.5 ms    25 runs
  Warning: Ignoring non-zero exit code.

Benchmark 2: ./serde -C llvm-project/build2 xxx
  Time (mean ± σ):     179.4 ms ±  13.9 ms    [User: 158.8 ms, System: 20.3 ms]
  Range (min … max):   163.6 ms … 211.5 ms    25 runs
  Warning: Ignoring non-zero exit code.

Benchmark 3: ./bincode -C llvm-project/build3 xxx
  Time (mean ± σ):     152.9 ms ±  13.0 ms    [User: 131.7 ms, System: 20.8 ms]
  Range (min … max):   137.6 ms … 170.8 ms    25 runs
  Warning: Ignoring non-zero exit code.

Summary
  ./bincode -C llvm-project/build3 xxx ran
    1.06 ± 0.11 times faster than ./baseline -C llvm-project/build1 xxx
    1.17 ± 0.14 times faster than ./serde -C llvm-project/build2 xxx
```
Database sizes (more or less the same content, so the drastic size differences are explained solely by the format):

```
$ wc --bytes llvm-project/build{1,2,3}/.n2_db
4082125 llvm-project/build1/.n2_db
3677224 llvm-project/build2/.n2_db
5362198 llvm-project/build3/.n2_db
```
Binary sizes (stripped, obviously):

```
$ wc --bytes baseline serde bincode
1252376 baseline
1395800 serde
1289528 bincode
```
serde + cbor seems to be faster than the manual implementation on small databases, but significantly slower on bigger ones. It is also the largest binary (though it produces the most compact databases).

serde + bincode beats the manual implementation on big DBs; I haven't tested it on small ones. Its binary is also only slightly larger than the baseline (unlike the cbor one), although its databases are significantly bigger.
I've rebased on top of main (e73378d693716715be1a420aa74a2836b49b85c8).

```
$ hyperfine -iN -w10 -m25 -- "./e73378d693716715be1a420aa74a2836b49b85c8 -C llvm-project/build1 xxx" "./63b1036df01098b1483797c3c9201fdb3cab6a6f -C llvm-project/build3 xxx"
Benchmark 1: ./e73378d693716715be1a420aa74a2836b49b85c8 -C llvm-project/build1 xxx
  Time (mean ± σ):     170.2 ms ±  11.7 ms    [User: 148.7 ms, System: 21.2 ms]
  Range (min … max):   150.6 ms … 187.7 ms    25 runs
  Warning: Ignoring non-zero exit code.

Benchmark 2: ./63b1036df01098b1483797c3c9201fdb3cab6a6f -C llvm-project/build3 xxx
  Time (mean ± σ):     161.5 ms ±  10.2 ms    [User: 138.3 ms, System: 22.7 ms]
  Range (min … max):   147.9 ms … 179.8 ms    25 runs
  Warning: Ignoring non-zero exit code.

Summary
  ./63b1036df01098b1483797c3c9201fdb3cab6a6f -C llvm-project/build3 xxx ran
    1.05 ± 0.10 times faster than ./e73378d693716715be1a420aa74a2836b49b85c8 -C llvm-project/build1 xxx
```
Based on this branch I created a PR that has just the serde parts without the refactoring: https://github.com/evmar/n2/pull/64/files
Some random thoughts:
- I don't love how this adds random copies in the write path due to `DbEntry` holding an owned `String`, but in general the write path isn't perf-critical.
- using a varint encoding would be more efficient for references to ids (which are generally small) but less efficient for hashes (which we know are always the full 64 bits), and I don't think you can control these separately in bincode. With the fixed-width encoding, a quarter of the `.n2_db` file I just looked at was zero bytes.
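To illustrate the zero-byte observation: a small id stored as a fixed-width little-endian u64 is mostly padding. A minimal sketch:

```rust
fn main() {
    // A typical small id, as bincode's fixed-width encoding would store it:
    let id: u64 = 42;
    let bytes = id.to_le_bytes();
    // Seven of the eight bytes are zero padding.
    assert_eq!(bytes, [42, 0, 0, 0, 0, 0, 0, 0]);
    assert_eq!(bytes.iter().filter(|&&b| b == 0).count(), 7);
    println!("ok");
}
```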
It's pretty interesting to me how the perf is a wash, given that the serde codepath appears to be doing a lot more (creating intermediate vecs and strings etc.).
> I don't love how this adds random copies in the write path due to DbEntry holding an owned String, but in general the write path isn't perf-critical

Maybe this can be combated with `Cow<str>`, but I'm not sure.
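A minimal sketch of what that could look like (a hypothetical `DbEntry`, not the actual n2 type): the write path borrows the string, so no copy is made, while the read path produces the owned variant. serde can work with `Cow<str>` too, which is what would make this viable here.

```rust
use std::borrow::Cow;

// Hypothetical entry type: borrows on write, owns after deserialization.
struct DbEntry<'a> {
    path: Cow<'a, str>,
}

fn main() {
    let name = String::from("src/main.rs");

    // Write path: borrow the existing string, no allocation or copy.
    let writing = DbEntry { path: Cow::Borrowed(name.as_str()) };
    assert!(matches!(writing.path, Cow::Borrowed(_)));

    // Read path: deserialization would hand us an owned string.
    let reading = DbEntry { path: Cow::Owned(name.clone()) };
    assert!(matches!(reading.path, Cow::Owned(_)));

    println!("ok");
}
```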
> It's pretty interesting to me how the perf is a wash, given that the serde codepath appears to be doing a lot more (creating intermediate vecs and strings etc.).

My completely arbitrary (and probably wrong) guess would be that storing the values uncompressed, excess zeroes and all, is actually faster: it avoids the bit shifts of a varint decode, and maybe the reads/writes are also better aligned? Maybe?