CRC64 perf improvements from Redis patches
- 53-73% faster on Xeon 2670 v0 @ 2.6 GHz
- 2-2.5x faster on Core i3 8130U @ 2.2 GHz
- 1.6-2.46 bytes/cycle on i3 8130U
- likely >2x faster than crcspeed on newer CPUs with more resources than a 2012-era Xeon 2670
- crc64 combine function typically runs in <50 nanoseconds with vector + cache optimizations (~8 microseconds without vector optimizations, ~80 microseconds without the cache; the combination is extra effective); an illustrative combine sketch follows this list
- still single-threaded
- `valkey-server test crc64 --help` (requires `make distclean && make SERVER_TEST=yes`)
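For readers wondering what the combine function buys: a CRC is linear over GF(2), so the CRC of a concatenation A||B can be computed from crc(A), crc(B), and len(B) without touching the bytes again. Below is a minimal, unoptimized sketch of the textbook zlib-style matrix method, assuming a plain CRC-64 whose init and final-xor values are equal (e.g. both zero) and using the reflected Jones polynomial constant from src/crc64.c; it illustrates the math only and is not the vectorized, cached implementation in crccombine.c.

```c
/* Sketch of CRC combination via GF(2) matrices, in the style of zlib's
 * crc32_combine(). Illustration only: assumes a CRC-64 variant whose init
 * and final-xor values are equal (e.g. both zero); adapting it to a
 * specific codec may need an extra correction term. */
#include <stdint.h>

#define GF2_DIM 64
#define POLY UINT64_C(0xad93d23594c935a9) /* reflected CRC-64-Jones polynomial */

/* Multiply the GF(2) matrix `mat` (64 rows) by the bit-vector `vec`. */
static uint64_t gf2_matrix_times(const uint64_t *mat, uint64_t vec) {
    uint64_t sum = 0;
    while (vec) {
        if (vec & 1) sum ^= *mat;
        vec >>= 1;
        mat++;
    }
    return sum;
}

/* square = mat * mat */
static void gf2_matrix_square(uint64_t *square, const uint64_t *mat) {
    for (int n = 0; n < GF2_DIM; n++)
        square[n] = gf2_matrix_times(mat, mat[n]);
}

/* Return the CRC of the concatenation A||B, given crc1 = crc(A),
 * crc2 = crc(B), and len2 = length of B in bytes. */
uint64_t crc64_combine_sketch(uint64_t crc1, uint64_t crc2, uint64_t len2) {
    uint64_t even[GF2_DIM]; /* operator for 2^k zero bits, even k */
    uint64_t odd[GF2_DIM];  /* operator for 2^k zero bits, odd k */

    if (len2 == 0) return crc1;

    /* Operator for one zero bit (reflected CRC shift). */
    odd[0] = POLY;
    for (int n = 1; n < GF2_DIM; n++) odd[n] = UINT64_C(1) << (n - 1);

    gf2_matrix_square(even, odd); /* two zero bits */
    gf2_matrix_square(odd, even); /* four zero bits */

    /* Advance crc1 over len2 zero bytes by repeated squaring. */
    do {
        gf2_matrix_square(even, odd); /* first pass: eight zero bits */
        if (len2 & 1) crc1 = gf2_matrix_times(even, crc1);
        len2 >>= 1;
        if (len2 == 0) break;
        gf2_matrix_square(odd, even);
        if (len2 & 1) crc1 = gf2_matrix_times(odd, crc1);
        len2 >>= 1;
    } while (len2 != 0);

    return crc1 ^ crc2;
}
```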
What do we use crc64 for? RDB files?
I want to understand the practical benefit.
> What do we use crc64 for? RDB files?
RDB and slot migration (since the DUMP + RESTORE payload includes the CRC64 checksum). I mentioned before that improving crc16 would be a bigger win for cluster mode.
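To make the DUMP/RESTORE connection concrete: the dump payload ends with a 2-byte RDB version followed by an 8-byte little-endian CRC64 trailer covering everything before it, so every key moved during slot migration pays for CRC64 on both ends. A rough sketch of that trailer check is below; the `crc64()` prototype is assumed to match src/crc64.h, and the helper name is hypothetical rather than the actual verification code.

```c
#include <stdint.h>
#include <string.h>

/* Assumed to match the declaration in src/crc64.h. */
uint64_t crc64(uint64_t crc, const unsigned char *s, uint64_t l);

/* Check the CRC64 trailer of a DUMP payload:
 * <serialized value><2-byte RDB version><8-byte little-endian CRC64>,
 * where the checksum covers everything except the trailing 8 bytes. */
int dump_payload_crc_ok(const unsigned char *p, size_t len) {
    if (len < 10) return 0;                  /* need version + checksum */
    uint64_t crc = crc64(0, p, len - 8);     /* checksum covers the version too */
    unsigned char expect[8];
    for (int i = 0; i < 8; i++)              /* serialize little-endian */
        expect[i] = (unsigned char)(crc >> (8 * i));
    return memcmp(expect, p + len - 8, 8) == 0;
}
```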
OK. It would be impossible to change to SHA1 or something else for RDB, DUMP/RESTORE, right? Because of backwards compatibility.
I think we should merge this then. It's more complex code, but as long as it's well tested and isolated in its own files, that's no problem to me.
I found your blog post. :smiley: https://www.dr-josiah.com/2023/04/crc64-in-redis-is-getting-even-faster.html
When producing or loading a large RDB dump, how much of the CPU time is CRC64 computation and how much is other stuff?
Let me rephrase your question: why is doing CRC64 important for Redis?
Before loading a snapshot from disk, or one received over the wire from a master server, the loading server takes a pass over the whole RDB file, calling out roughly once every 2 megabytes by default (see `loading_process_events_interval_bytes`). These changes save ~0.5 seconds per gigabyte of snapshot loaded.
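A rough sketch of that loading pattern: stream the file in chunks, fold each chunk into a running CRC64, and yield roughly every 2 MB. The `crc64()` prototype is assumed from src/crc64.h; the other names are illustrative, not the actual rdb.c code.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed to match the declaration in src/crc64.h. */
uint64_t crc64(uint64_t crc, const unsigned char *s, uint64_t l);

/* Default value of loading_process_events_interval_bytes (2 MB). */
#define LOADING_INTERVAL_BYTES (2 * 1024 * 1024)

/* Stream a file in chunks, folding each chunk into a running CRC64 and
 * invoking `on_interval` (e.g. "serve pending events") roughly every
 * LOADING_INTERVAL_BYTES. */
uint64_t checksum_rdb_stream(FILE *fp, void (*on_interval)(void)) {
    unsigned char buf[64 * 1024];
    uint64_t crc = 0, since_last = 0;
    size_t n;

    while ((n = fread(buf, 1, sizeof(buf), fp)) > 0) {
        crc = crc64(crc, buf, n);            /* incremental update */
        since_last += n;
        if (since_last >= LOADING_INTERVAL_BYTES) {
            if (on_interval) on_interval();  /* keep serving clients */
            since_last = 0;
        }
    }
    return crc;
}
```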
When creating RDB snapshots, any data segment written gets a call to CRC64 covering up to 2 megabytes by default (also governed by `loading_process_events_interval_bytes`), but typically closer to whatever the stored data size is, so maybe tens of bytes at present.
That said, with additional small changes you can update the CRC on write in full 2 MB segments every time (instead of only up to that maximum), and the effects of this (along with eliminating fork overhead) can be seen as 20 seconds of reduced startup / snapshot time for a 10 GB resident dataset in the last slide here: https://www.slideshare.net/secret/sLTnWqcPV7G0HK , of which ~8-10 seconds can be attributed to the CRC64 performance improvements.
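The "full 2 MB segments" idea could look roughly like the sketch below: buffer outgoing bytes and only run CRC64 over fixed-size blocks, so the bulk path is always exercised instead of many tiny updates. All names except the assumed `crc64()` prototype are hypothetical; this is not the PR's code.

```c
#include <stdint.h>
#include <string.h>

/* Assumed to match the declaration in src/crc64.h. */
uint64_t crc64(uint64_t crc, const unsigned char *s, uint64_t l);

#define SEG_SIZE (2 * 1024 * 1024)

/* Hypothetical batcher; allocate on the heap, it is too large for the stack. */
typedef struct {
    unsigned char seg[SEG_SIZE];
    size_t used;
    uint64_t crc;
} crc_batcher;

/* Accumulate outgoing bytes; checksum only full SEG_SIZE segments so the
 * bulk CRC64 path is used instead of many tiny updates. */
void crc_batcher_write(crc_batcher *b, const unsigned char *p, size_t len) {
    while (len) {
        size_t n = SEG_SIZE - b->used;
        if (n > len) n = len;
        memcpy(b->seg + b->used, p, n);
        b->used += n;
        p += n;
        len -= n;
        if (b->used == SEG_SIZE) {
            b->crc = crc64(b->crc, b->seg, SEG_SIZE);
            b->used = 0;
        }
    }
}

/* Fold any partial tail segment and return the final checksum. */
uint64_t crc_batcher_finish(crc_batcher *b) {
    if (b->used) b->crc = crc64(b->crc, b->seg, b->used);
    b->used = 0;
    return b->crc;
}
```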
Sounds good. I'm convinced. :)
FYI: You need Signed-off-by also for commits made using suggestions in the web GUI. You can add that using `git rebase --signoff HEAD~1` and then force-push. (Replace 1 with N to add signoff to the last N commits.)
Codecov Report
Attention: Patch coverage is 96.96970% with 4 lines in your changes missing coverage. Please review.
Project coverage is 68.48%. Comparing base (a989ee5) to head (f5c0509). Report is 26 commits behind head on unstable.
Additional details and impacted files
@@             Coverage Diff              @@
##           unstable     #350      +/-   ##
============================================
+ Coverage     68.37%   68.48%    +0.10%
============================================
  Files           108      109        +1
  Lines         61555    61669      +114
============================================
+ Hits          42091    42235      +144
+ Misses        19464    19434       -30
Files | Coverage Δ |
---|---|
src/crc64.c | 100.00% <100.00%> (ø) |
src/crccombine.c | 100.00% <100.00%> (ø) |
src/crcspeed.c | 37.97% <90.47%> (+9.18%) :arrow_up: |
Assuming tests pass, I'm happy with it.
Btw, I just want to test it on another ARM machine. I'll do that tomorrow and merge it if the performance makes sense.