vault icon indicating copy to clipboard operation
vault copied to clipboard

Upgrade go-msgpack to v2 2.1.1

Open swenson opened this issue 2 years ago • 7 comments

And set the time.Time option to use the go-msgpack-1.1.5-compatible encoding in all the places, since that is the (now previous) version in go.mod.

v2 2.1.1 was specifically designed to honor backwards compatibility with 1.1.5 and 0.5.5, and to clean up the code base to be more maintainable. There may performance lost with the 1.1.5 to 2.1.1 migration since the fastpath code was removed, but the increased safety is probably worth it. See the release notes for go-msgkack 2.1.0 for more details.

I tested this by running this code, and booting up a cluster with a node also running Vault 1.15.0 (before the upgrade). Before I made the changes to set the right time.Time option, the previous-version node would throw a bunch of time-decoding errors. After fixing the option, the node came up smoothly, even after changing leadership between them.

This relies on

  • [x] https://github.com/hashicorp/raft-boltdb/pull/38
  • [x] https://github.com/hashicorp/raft/pull/577

and will need to be updated after they are merged to get the go.mod fixes removed.

swenson avatar Oct 19 '23 21:10 swenson

Build Results: All builds succeeded! :white_check_mark:

github-actions[bot] avatar Oct 19 '23 21:10 github-actions[bot]

Thanks for your diligent work on this painful bit of tech debt @swenson ❤️ .

There may performance lost with the 1.1.5 to 2.1.1 migration since the fastpath code was removed, but the increased safety is probably worth it.

I wonder if we should find a way to quantify this specifically for Vault's usage. I think we only use msgpack for the Raft wire transport (i.e. the thin bit of RPC framing around each Raft RPC with actual log entries already protobuf encoded in the byte slice) and similar log framing for the LogStore (i.e. bit of msgpack boilerplate around each log entries proto bytes to encode the other metadata like index/term/appendedAt time etc.

So it should be relatively easy to write a Benchmark test in our physical/raft package that exercises both network and disk storage and the compare results of several runs with main (plus the new benchmark) and this branch. My guess is that it will not be measurably different for each write because the time will be dominated by disk fsyncs and especially on Mac laptops that is super slow - way slower than anything we are likely to measure in encoding a simple object per log committed. I think the quickest way to get around that rather than hacking the code would be to run the benchmark with a directory where a RAMDisk is mounted so the "fsync" is actually just memory updates. This is not "realistic" in that noone should ever run like this in prod, but it acts as a "worst case" overhead - some customers use NVME drives that are much closer to RAM speed than Mac SSD speed for fsyncs so it would be more representative for them! FWIW I still expect the difference to be negligible, but I think this would be a way to confirm that hunch!

This is more me musing aloud that requesting you do all this work! If you'd like to I think it would be valuable but we can also probably have someone else try it!

banks avatar Oct 20 '23 11:10 banks

Thanks @banks. Time-permitting, I can try to write a benchmark this week.

swenson avatar Oct 23 '23 17:10 swenson

CI Results: All Go tests succeeded! :white_check_mark:

github-actions[bot] avatar Oct 25 '23 18:10 github-actions[bot]

@banks I added a benchmark in 02f613b2832ff73b8ba6f4c6fb4122fce3006e1e

tl;dr there was no performance difference in the benchmark

Though I am suspicious as to why Linux is 10x faster than macOS -- I thought the fsync issue was that macOS didn't do them properly and was supposed to be faster? 🤔

goos: darwin
goarch: arm64
pkg: github.com/hashicorp/vault/physical/raft
                   │    a.txt    │            b.txt             │
                   │   sec/op    │   sec/op     vs base         │
RaftWithNetwork-10   58.65m ± 2%   58.62m ± 2%  ~ (p=0.937 n=6)
goos: linux
goarch: amd64
pkg: github.com/hashicorp/vault/physical/raft
cpu: AMD Ryzen Threadripper 3970X 32-Core Processor
                   │    c.txt    │            d.txt             │
                   │   sec/op    │   sec/op     vs base         │
RaftWithNetwork-64   5.861m ± 1%   5.837m ± 0%  ~ (p=0.240 n=6)

swenson avatar Oct 26 '23 19:10 swenson

Amazing 🤩 Thanks Christopher. Great to confirm that.

Though I am suspicious as to why Linux is 10x faster than macOS -- I thought the fsync issue was that macOS didn't do them properly and was supposed to be faster?

It's a long answer but... since Go 1.12 File.Sync has actually forced MacOS to flush to disk using an fcntl. I've seen Darwin authors claim that it's now doing the exact same thing that linux (since whichever kernel version made fsync actually work again) does, but it's well known that Macs are now very much slower at fsyncs than linux machines. Some speculate that it's because Apple SSDs are specifically designed for non-sync throughput and so the hardware/firmware is actually slower than the hardware/firmware in otherwise comparable SSD devices that are in linux systems. Some speculate that the fcntl MacOS requires to force hardware flush is even more expensive than whatever it is that linux kernel sends to a device on fsync. I've not been able to find a reliable source that clears that up and I've never dual-booted my Mac to figure out if the mac hardware actually performs differently under linux. Folks sometimes think they see this when fsync is much faster in a VM or Docker container on their mac but that is probably totally different - in that case the virtualised filesystem is probably just not doing fsync at all again! [One source that is a place to get deeper in the rabbit hole]

But tl;dr, for reason I'm still fuzzy on despite researching a few times, Macs are typically much slower to Fsync if you actually force the flush properly (since the fsync system call alone doesn't) and Go since 1.12 always does.

banks avatar Oct 30 '23 14:10 banks

Thanks everyone. I updated this to remove the extra tests and related code.

swenson avatar Dec 22 '23 18:12 swenson

Thanks!

swenson avatar Jan 08 '24 18:01 swenson