CRC64 perf improvements from Redis patches
- 53-73% faster on Xeon 2670 v0 @ 2.6 GHz
- 2-2.5x faster on Core i3 8130U @ 2.2 GHz
- 1.6-2.46 bytes/cycle on i3 8130U
- likely >2x faster than crcspeed on newer CPUs with more resources than a 2012-era Xeon 2670
- crc64 combine function typically runs in <50 nanoseconds with vector + cache optimizations (~8 microseconds without vector optimizations, ~80 microseconds without the cache; the combination is extra effective); an illustrative combine sketch follows this list
- still single-threaded
- `valkey-server test crc64 --help` (requires `make distclean && make SERVER_TEST=yes`)
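For readers wondering what the combine function buys: a CRC is linear over GF(2), so the CRC of a concatenation A||B can be computed from crc(A), crc(B), and len(B) without touching the bytes again. Below is a minimal, unoptimized sketch of the textbook zlib-style matrix method, assuming a plain CRC-64 whose init and final-xor values are equal (e.g. both zero) and using the reflected Jones polynomial constant from src/crc64.c; it illustrates the math only and is not the vectorized, cached implementation in crccombine.c.

```c
/* Sketch of CRC combination via GF(2) matrices, in the style of zlib's
 * crc32_combine(). Illustration only: assumes a CRC-64 variant whose init
 * and final-xor values are equal (e.g. both zero); adapting it to a
 * specific codec may need an extra correction term. */
#include <stdint.h>

#define GF2_DIM 64
#define POLY UINT64_C(0xad93d23594c935a9) /* reflected CRC-64-Jones polynomial */

/* Multiply the GF(2) matrix `mat` (64 rows) by the bit-vector `vec`. */
static uint64_t gf2_matrix_times(const uint64_t *mat, uint64_t vec) {
    uint64_t sum = 0;
    while (vec) {
        if (vec & 1) sum ^= *mat;
        vec >>= 1;
        mat++;
    }
    return sum;
}

/* square = mat * mat */
static void gf2_matrix_square(uint64_t *square, const uint64_t *mat) {
    for (int n = 0; n < GF2_DIM; n++)
        square[n] = gf2_matrix_times(mat, mat[n]);
}

/* Return the CRC of the concatenation A||B, given crc1 = crc(A),
 * crc2 = crc(B), and len2 = length of B in bytes. */
uint64_t crc64_combine_sketch(uint64_t crc1, uint64_t crc2, uint64_t len2) {
    uint64_t even[GF2_DIM]; /* operator for 2^k zero bits, even k */
    uint64_t odd[GF2_DIM];  /* operator for 2^k zero bits, odd k */

    if (len2 == 0) return crc1;

    /* Operator for one zero bit (reflected CRC shift). */
    odd[0] = POLY;
    for (int n = 1; n < GF2_DIM; n++) odd[n] = UINT64_C(1) << (n - 1);

    gf2_matrix_square(even, odd); /* two zero bits */
    gf2_matrix_square(odd, even); /* four zero bits */

    /* Advance crc1 over len2 zero bytes by repeated squaring. */
    do {
        gf2_matrix_square(even, odd); /* first pass: eight zero bits */
        if (len2 & 1) crc1 = gf2_matrix_times(even, crc1);
        len2 >>= 1;
        if (len2 == 0) break;
        gf2_matrix_square(odd, even);
        if (len2 & 1) crc1 = gf2_matrix_times(odd, crc1);
        len2 >>= 1;
    } while (len2 != 0);

    return crc1 ^ crc2;
}
```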
What do we use crc64 for? RDB files?
I want to understand the practical benefit.
> What do we use crc64 for? RDB files?
RDB and slot migration (since the DUMP + RESTORE payload includes the CRC64 checksum). I mentioned before that improving crc16 would be a bigger win for cluster mode.
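To make the DUMP/RESTORE connection concrete: the dump payload ends with a 2-byte RDB version followed by an 8-byte little-endian CRC64 trailer covering everything before it, so every key moved during slot migration pays for CRC64 on both ends. A rough sketch of that trailer check is below; the `crc64()` prototype is assumed to match src/crc64.h, and the helper name is hypothetical rather than the actual verification code.

```c
#include <stdint.h>
#include <string.h>

/* Assumed to match the declaration in src/crc64.h. */
uint64_t crc64(uint64_t crc, const unsigned char *s, uint64_t l);

/* Check the CRC64 trailer of a DUMP payload:
 * <serialized value><2-byte RDB version><8-byte little-endian CRC64>,
 * where the checksum covers everything except the trailing 8 bytes. */
int dump_payload_crc_ok(const unsigned char *p, size_t len) {
    if (len < 10) return 0;                  /* need version + checksum */
    uint64_t crc = crc64(0, p, len - 8);     /* checksum covers the version too */
    unsigned char expect[8];
    for (int i = 0; i < 8; i++)              /* serialize little-endian */
        expect[i] = (unsigned char)(crc >> (8 * i));
    return memcmp(expect, p + len - 8, 8) == 0;
}
```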
OK. It would be impossible to change to SHA1 or something else for RDB, DUMP/RESTORE, right? Because of backwards compatibility.
I think we should merge this then. It's more complex code, but as long as it's well tested and isolated in its own files, that's no problem to me.
I found your blog post. :smiley: https://www.dr-josiah.com/2023/04/crc64-in-redis-is-getting-even-faster.html
When producing or loading a large RDB dump, how much of the CPU time is CRC64 computation and how much is other stuff?
Let me rephrase your question: why is doing CRC64 important for Redis?
Before loading a snapshot from disk, or one received over the wire from a master server, the loading server takes a pass over the whole RDB file, calling out roughly once every 2 megabytes by default (see `loading_process_events_interval_bytes`). These changes save ~0.5 seconds per gigabyte of snapshot loaded.
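A rough sketch of that loading pattern: stream the file in chunks, fold each chunk into a running CRC64, and yield roughly every 2 MB. The `crc64()` prototype is assumed from src/crc64.h; the other names are illustrative, not the actual rdb.c code.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed to match the declaration in src/crc64.h. */
uint64_t crc64(uint64_t crc, const unsigned char *s, uint64_t l);

/* Default value of loading_process_events_interval_bytes (2 MB). */
#define LOADING_INTERVAL_BYTES (2 * 1024 * 1024)

/* Stream a file in chunks, folding each chunk into a running CRC64 and
 * invoking `on_interval` (e.g. "serve pending events") roughly every
 * LOADING_INTERVAL_BYTES. */
uint64_t checksum_rdb_stream(FILE *fp, void (*on_interval)(void)) {
    unsigned char buf[64 * 1024];
    uint64_t crc = 0, since_last = 0;
    size_t n;

    while ((n = fread(buf, 1, sizeof(buf), fp)) > 0) {
        crc = crc64(crc, buf, n);            /* incremental update */
        since_last += n;
        if (since_last >= LOADING_INTERVAL_BYTES) {
            if (on_interval) on_interval();  /* keep serving clients */
            since_last = 0;
        }
    }
    return crc;
}
```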
When creating RDB snapshots, any data segment written gets a call to CRC64 covering up to 2 megabytes by default (also governed by `loading_process_events_interval_bytes`), but typically closer to whatever the stored data size is, so maybe tens of bytes at present.
That said, with additional small changes you can update the CRC on write in full 2 MB segments every time (instead of only up to that maximum), and the effects of this (along with eliminating fork overhead) can be seen as 20 seconds of reduced startup / snapshot time for a 10 GB resident dataset in the last slide here: https://www.slideshare.net/secret/sLTnWqcPV7G0HK , of which ~8-10 seconds can be attributed to the CRC64 performance improvements.
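The "full 2 MB segments" idea could look roughly like the sketch below: buffer outgoing bytes and only run CRC64 over fixed-size blocks, so the bulk path is always exercised instead of many tiny updates. All names except the assumed `crc64()` prototype are hypothetical; this is not the PR's code.

```c
#include <stdint.h>
#include <string.h>

/* Assumed to match the declaration in src/crc64.h. */
uint64_t crc64(uint64_t crc, const unsigned char *s, uint64_t l);

#define SEG_SIZE (2 * 1024 * 1024)

/* Hypothetical batcher; allocate on the heap, it is too large for the stack. */
typedef struct {
    unsigned char seg[SEG_SIZE];
    size_t used;
    uint64_t crc;
} crc_batcher;

/* Accumulate outgoing bytes; checksum only full SEG_SIZE segments so the
 * bulk CRC64 path is used instead of many tiny updates. */
void crc_batcher_write(crc_batcher *b, const unsigned char *p, size_t len) {
    while (len) {
        size_t n = SEG_SIZE - b->used;
        if (n > len) n = len;
        memcpy(b->seg + b->used, p, n);
        b->used += n;
        p += n;
        len -= n;
        if (b->used == SEG_SIZE) {
            b->crc = crc64(b->crc, b->seg, SEG_SIZE);
            b->used = 0;
        }
    }
}

/* Fold any partial tail segment and return the final checksum. */
uint64_t crc_batcher_finish(crc_batcher *b) {
    if (b->used) b->crc = crc64(b->crc, b->seg, b->used);
    b->used = 0;
    return b->crc;
}
```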
Sounds good. I'm convinced. :)
FYI: You need Signed-off-by also for commits made using suggestions in the web GUI. You can add that using `git rebase --signoff HEAD~1` and then force-push. (Replace 1 with N to add signoff to the last N commits.)
Codecov Report
Attention: Patch coverage is 96.96970% with 4 lines in your changes missing coverage. Please review.
Project coverage is 68.48%. Comparing base (a989ee5) to head (f5c0509). Report is 26 commits behind head on unstable.
Additional details and impacted files
@@             Coverage Diff              @@
##           unstable     #350      +/-   ##
============================================
+ Coverage     68.37%   68.48%    +0.10%
============================================
  Files           108      109        +1
  Lines         61555    61669      +114
============================================
+ Hits          42091    42235      +144
+ Misses        19464    19434       -30
Files | Coverage Δ |
---|---|
src/crc64.c | 100.00% <100.00%> (ø) |
src/crccombine.c | 100.00% <100.00%> (ø) |
src/crcspeed.c | 37.97% <90.47%> (+9.18%) :arrow_up: |
Assuming tests pass, I'm happy with it.
Btw, I just want to test it on another ARM machine. I'll do that tomorrow and merge it if the performance makes sense.