Embed key into dict entry
This PR incorporates the key-embedding changes described in https://github.com/redis/redis/issues/12216
With this change there is no longer a separate key pointer; the key is embedded within the dictEntry. On 64-bit builds this removes an 8-byte pointer, of which 1 byte is reused for additional bookkeeping, so the overall saving is 7 bytes per entry.
Key changes:
A new dict entry type is introduced, which is now used as the entry type for the main dictionary:
```c
typedef struct {
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
        double d;
    } v;
    struct dictEntry *next; /* Next entry in the same hash bucket. */
    unsigned char data[];   /* Embedded key bytes; no separate key pointer. */
} embeddedDictEntry;
```
Two new functions have been added to dictType:

```c
size_t dictSdsKeyToBytes(unsigned char *buf, const void *key, unsigned char *header_size);
size_t dictSdsKeyLen(const void *key);
```
The change is opt-in per dict type, so sets, hashes, and other types that use a dictionary are not impacted. With this change the main dictionary now owns the key data, so the copy on insert in dbAdd is no longer needed.
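To make the callback contract concrete, here is a minimal sketch (not the PR's actual code) of how these two hooks could be implemented for sds keys. It assumes the embedded layout is the sds header followed by the string, with the single bookkeeping byte recording the header size; `sdsAllocPtr` and `sdslen` are the existing sds helpers.

```c
#include <string.h>
#include "sds.h"

/* Bytes needed to embed the key: sds header + string + null terminator.
 * (Illustrative; the real code may build a minimal header rather than
 * copying the existing one verbatim.) */
size_t dictSdsKeyLen(const void *key) {
    sds s = (sds)key;
    size_t header = (size_t)((char *)s - (char *)sdsAllocPtr(s));
    return header + sdslen(s) + 1;
}

/* Copy the key into buf and report the header size through the single
 * bookkeeping byte. */
size_t dictSdsKeyToBytes(unsigned char *buf, const void *key, unsigned char *header_size) {
    sds s = (sds)key;
    size_t header = (size_t)((char *)s - (char *)sdsAllocPtr(s));
    size_t total = header + sdslen(s) + 1;
    memcpy(buf, sdsAllocPtr(s), total);
    *header_size = (unsigned char)header;
    return total;
}
```

Under this assumed layout, the embedded key can later be recovered as `(sds)(entry->data + header_size)`, which is presumably why the one bookkeeping byte is needed.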
Benchmarking needs to be performed.
Codecov Report
Attention: Patch coverage is 96.87500% with 3 lines in your changes missing coverage. Please review.
Project coverage is 70.32%. Comparing base (7719dbb) to head (825c82f). Report is 10 commits behind head on unstable.
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##           unstable     #541     +/-  ##
============================================
+ Coverage     70.26%   70.32%   +0.06%
============================================
  Files           110      111       +1
  Lines         60108    60286     +178
============================================
+ Hits          42234    42396     +162
- Misses        17874    17890      +16
============================================
```
| Files | Coverage Δ | |
|---|---|---|
| src/db.c | 88.39% <100.00%> (+0.06%) | :arrow_up: |
| src/defrag.c | 88.24% <100.00%> (+0.68%) | :arrow_up: |
| src/dict.c | 97.53% <100.00%> (+0.09%) | :arrow_up: |
| src/kvstore.c | 96.17% <100.00%> (-0.59%) | :arrow_down: |
| src/rdb.c | 75.80% <100.00%> (-0.09%) | :arrow_down: |
| src/sds.c | 86.08% <100.00%> (+0.21%) | :arrow_up: |
| src/sds.h | 78.68% <ø> (ø) | |
| src/server.c | 88.57% <100.00%> (+0.01%) | :arrow_up: |
| src/debug.c | 53.40% <0.00%> (ø) | |
| src/object.c | 78.55% <50.00%> (-0.03%) | :arrow_down: |
@valkey-io/core-team Could you provide some clarity on this?
My thoughts are captured under https://github.com/valkey-io/valkey/issues/394#issuecomment-2128139503
We've made changes in the past that benefit users in the short term and then redone the implementation in a following major/minor version. Key embedding in the dictEntry is a pretty small change, and we can easily get rid of it when the dictEntry itself is removed. I feel the small gain is worth this minimal change (7 bytes) for Valkey 8.0, while we invest in the kvpair object structure with open addressing in 9.0.
Yeah, I've thought about this many times. The complexity is not too bad. The technique of embedding an sds string can later be reused in other places.
Although the embedding is abstracted, it only ever makes sense for sds strings. That's fine though; dict doesn't include "sds.h", so it's decoupled.
I'm in favor. (I'll add some review comments later, just minor things.)
The reason I've been skeptical before is that I'd rather we invest in embedding the key in robj, since that'd be beneficial in the future redesign (#169), but since this PR is already ready and the robj work has not started, I think we can merge this for Valkey 8.
@hpatro @madolson @zuiderkwast Is this change targeted for Valkey 8? If so, can we add it to the Valkey 8 project board?
I was waiting on performance numbers from Hari before officially adding it.
> I was waiting on performance numbers from Hari before officially adding it.
@hpatro Do you have performance numbers? Can you please add them to this issue so we can move forward with the next steps?
Thanks for the review @PingXie and @zuiderkwast. I will address them shortly.
@madolson I've posted the benchmarking results in the top comment.
@zuiderkwast @PingXie @madolson @valkey-io/core-team If we are all aligned on accepting this change for 8.0, I will look into addressing the comments and polishing the PR further. Let me know.
It looks like used memory decreases by 10% without any change in QPS.
@hpatro AFAIK most of the folks are inclined to accept it for 8, so I would ask you to follow up if you can. I'll throw it on our Monday agenda as well to make sure we close on it quickly if we can.
> @zuiderkwast @PingXie @madolson @valkey-io/core-team If we are all aligned on accepting this change for 8.0, I will look into addressing the comments and polishing the PR further. Let me know.

@hpatro, I like the idea! :-) The only reason I marked this PR as "changes required" is the amount of type casting in dict.c (and I understand that it didn't start with this PR). This is a great improvement to have for Valkey 8.0, but I do think we need to make some potentially painful changes to get dict.c back in shape. If you don't mind, I would love to get my hands dirty and help out with the refactoring too.
Thanks for the help @PingXie. Let me publish the changes I have done tomorrow and we can go from there. Do we want to do further refactoring as a follow-up PR or along with this?
Thanks @hpatro.

> Do we want to do further refactoring as a follow-up PR or along with this?

I am inclined to reduce as much type casting as possible in this PR, the reason being that I don't trust myself to do the compiler's job (type-checking the data access), and I am concerned about potential memory corruption issues. I will dedicate some time later this week for a deep dive.
A couple of things remain:
- https://github.com/valkey-io/valkey/pull/541#discussion_r1649820995 - Abstract out the metadata field into an independent variable and make the struct packed; this causes a build issue - https://github.com/hpatro/valkey/actions/runs/9701100786/job/26773976709 (I'm trying out a few things; let me know if anyone is aware of a solution to avoid this - one common workaround is sketched after this list.)

```
dict.c: In function ‘dictGetNextRef’:
dict.c:930:37: error: taking address of packed member of ‘struct <anonymous>’ may result in an unaligned pointer value [-Werror=address-of-packed-member]
  930 |     if (entryIsEmbedded(de)) return &decodeEmbeddedEntry(de)->next;
      |                                     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
```

- https://github.com/valkey-io/valkey/pull/541#discussion_r1649829240 - @PingXie is looking to submit a proposal.
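For context, the warning fires because `next` may sit at an unaligned offset inside a packed struct, so returning its address (as `dictGetNextRef` does) hands callers a potentially unaligned pointer. One common workaround, shown below as a sketch rather than what the PR necessarily settled on, is to replace the get-a-reference API with get/set accessors that copy the value via `offsetof` + `memcpy`, which never takes the address of the packed member:

```c
#include <stddef.h>
#include <string.h>

/* Alignment-safe read of the possibly unaligned `next` field. Since the
 * packed member's address is never taken, -Waddress-of-packed-member
 * stays quiet. */
static dictEntry *embeddedEntryGetNext(const embeddedDictEntry *e) {
    dictEntry *next;
    memcpy(&next, (const unsigned char *)e + offsetof(embeddedDictEntry, next),
           sizeof(next));
    return next;
}

/* Matching alignment-safe write. */
static void embeddedEntrySetNext(embeddedDictEntry *e, dictEntry *next) {
    memcpy((unsigned char *)e + offsetof(embeddedDictEntry, next),
           &next, sizeof(next));
}
```

The tradeoff is that callers can no longer hold a reference into the entry; they have to go through the accessors for every read and write.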
@madolson I can't see any other builds apart from DCO check. Are we throttled/out of credits?
> @madolson I can't see any other builds apart from DCO check. Are we throttled/out of credits?
You have a merge conflict.
I found some differences in how dictAddRaw copies the key vs. taking ownership of it. I guess the failed address sanitizer runs are related to this. I added some review comments with details.
The failed address sanitizer runs were because I was calling zmalloc_size on an encoded dictEntry pointer directly. Tried addressing it with https://github.com/valkey-io/valkey/pull/541/commits/4d30109c3f8de79869fb80b0f50983f39f801ca4
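To illustrate the bug class being described (a sketch, not the commit's actual diff): `zmalloc_size` inspects the allocation's header, so it must be given the real allocation pointer, not the tagged entry pointer. The `entryIsEmbedded`/`decodeEmbeddedEntry` helpers here are the ones visible in the compiler error earlier in this thread.

```c
#include "zmalloc.h"

/* Wrong: `de` may carry type-tag bits in its low bits, so it is not the
 * address the allocator knows about, and zmalloc_size reads garbage. */
size_t entrySizeBuggy(dictEntry *de) {
    return zmalloc_size(de);
}

/* Right: strip the encoding first, then ask the allocator. */
size_t entrySizeFixed(dictEntry *de) {
    if (entryIsEmbedded(de)) return zmalloc_size(decodeEmbeddedEntry(de));
    return zmalloc_size(de);
}
```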
> A couple of things remain:
> - Embed key into dict entry #541 (comment) - Abstract out the metadata field into an independent variable and make the struct packed; this causes a build issue (compiler error quoted above).
> - Embed key into dict entry #541 (comment) - @PingXie is looking to submit a proposal.
Yeah, I did some experiments. It is actually not that big a change (and this is based on an older commit of @hpatro's PR). This really boils down to treating dictEntry as an opaque structure as well and forcing access through common macros (SET/GET/GET_PTR) that perform the common type translations. I think the changes are not that significant, and if we could just do the first part (using dictEntry as an opaque structure) my concern would be addressed. Other refactoring can be addressed in future PRs.
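A minimal sketch of the direction described here, with illustrative names and tag values (the real dict.c encodings may differ); it reuses the embeddedDictEntry definition from the PR description. The struct definitions live only in dict.c, the header exposes an opaque type, and every access funnels through helpers that decode the pointer tag in exactly one place:

```c
#include <stdint.h>

/* dict.h side: callers see only an opaque type, so they cannot poke at
 * fields or dereference a still-tagged pointer by accident. */
typedef struct dictEntry dictEntry;
void *dictEntryGetVal(const dictEntry *de);

/* dict.c side: the one place that knows about the encodings. */
struct dictEntry { /* plain (non-embedded) entry; tag bits are zero */
    void *key;
    union { void *val; uint64_t u64; int64_t s64; double d; } v;
    struct dictEntry *next;
};

#define ENTRY_PTR_MASK     7 /* low bits of the pointer carry a type tag */
#define ENTRY_PTR_EMBEDDED 3 /* made up for illustration */

static int entryIsEmbedded(const dictEntry *de) {
    return ((uintptr_t)de & ENTRY_PTR_MASK) == ENTRY_PTR_EMBEDDED;
}
static embeddedDictEntry *decodeEmbeddedEntry(const dictEntry *de) {
    return (embeddedDictEntry *)((uintptr_t)de & ~(uintptr_t)ENTRY_PTR_MASK);
}

/* All value access funnels through one accessor, so the decode logic is
 * centralized instead of hand-rolled casts at every call site. */
void *dictEntryGetVal(const dictEntry *de) {
    if (entryIsEmbedded(de)) return decodeEmbeddedEntry(de)->v.val;
    return de->v.val;
}
```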
The builds are passing for both 32-bit and 64-bit now.
I see 4/6 people directionally inclined, so I'm going to add the directionally-approved tag.
@zuiderkwast / @madolson Shall we ship it? 🚀
Yes, LGTM.
Awesome! Thanks all.
Kicked off a full-run to stress this code a bit more before merging: https://github.com/valkey-io/valkey/actions/workflows/daily.yml
This change makes sense to me overall and the tradeoffs are good.
My concern is that, conceptually, it's not a good idea to have two conflicting patterns:
- using the internal encoding information of sds [pointer is odd]
- embedding sds into some other non-native allocation buffer, potentially at an arbitrary offset

I feel this has big potential for very hard-to-track bugs.
Trying to arrange my thoughts on this. The above is just an example of a wider issue in the code design. IMO there are two types of 'sds' usage: 1) a way to carry metadata about a string buffer through the IO flow; 2) a compact way to encode string buffer metadata in memory with good locality and low memory overhead.
It is clear to me why we use 'sds' for #2. For #1 I feel it could be the wrong choice, and that choice also has performance costs, as it forces memory accesses to unpack the info many times in the IO flow. For example, sdsfree accesses the memory on free even though that could have been avoided. This has high costs when memory access is a factor. I also think there is an alternative: take an approach slightly similar to Rust and have some struct 'runtimeSds' with the sds as a member plus the metadata; we'd need to check whether it can be passed to functions by value, to avoid allocating it on the heap. Thoughts?
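A minimal sketch of what that 'runtimeSds' idea could look like (all names here are hypothetical; `sdslen` and `sdsAllocSize` are the existing sds helpers). Because the struct is small, it can be passed and returned by value, so the unpacked metadata travels through the IO flow without heap allocation or repeated reads of the sds header:

```c
#include "sds.h"

/* Hypothetical by-value wrapper: the sds pointer plus its metadata
 * unpacked once, so hot paths don't re-read the sds header. */
typedef struct {
    sds ptr;      /* the underlying string */
    size_t len;   /* cached sdslen(ptr) */
    size_t alloc; /* cached allocation size, e.g. for memory accounting */
} runtimeSds;

static inline runtimeSds runtimeSdsFrom(sds s) {
    runtimeSds r = {s, sdslen(s), sdsAllocSize(s)};
    return r;
}

/* Example of the claimed benefit: accounting can use the cached size
 * without touching the string's memory again. */
static inline size_t runtimeSdsMemoryUsage(runtimeSds r) {
    return r.alloc;
}
```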
@zvi-code Thanks for sharing your thoughts. Would you mind filing a separate issue about this? Given we are planning to overhaul the hashtable implementation (#169) in 9.0, it will be good to capture some of these points.
Actual test run I kicked off: https://github.com/valkey-io/valkey/actions/runs/9764931105
This PR will bring us considerable cost benefits, thank you @hpatro. I have a doubt about the benchmark results: this PR should improve the CPU cache hit rate, so why hasn't performance improved?

> This PR will bring us considerable cost benefits, thank you @hpatro. I have a doubt about the benchmark results: this PR should improve the CPU cache hit rate, so why hasn't performance improved?
Yeah, I was also expecting more gain in throughput but saw only a tiny one. Anyway, we are happy with the memory savings (which come without paying any cost). @judeng Do you have any recommendation on how to benchmark so that the performance improvement from CPU cache locality becomes observable?