
Embed key into dict entry

hpatro opened this issue 1 year ago

This PR incorporates the key embedding changes described in https://github.com/redis/redis/issues/12216. With this change there is no separate key pointer; the key is embedded within the dictEntry, and 1 byte is used for additional bookkeeping. Overall the saving is 7 bytes per entry.

Key changes:

A new dict entry type is introduced, which is now used as the entry for the main dictionary:

typedef struct {
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
        double d;
    } v;
    struct dictEntry *next;     /* Next entry in the same hash bucket. */
    unsigned char data[];       /* Embedded key bytes (flexible array member). */
} embeddedDictEntry;

Two new functions have been added to the dictType:

size_t dictSdsKeyToBytes(unsigned char *buf, const void *key, unsigned char *header_size);
size_t dictSdsKeyLen(const void *key);

The change is opt-in per dict type, so sets, hashes, and other types that use a dictionary are not impacted. With this change the main dictionary now owns the key data, so the copy on insert in dbAdd is no longer needed.
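As a rough illustration of how these pieces fit together, here is a hypothetical sketch (not the PR's actual code) of building an embedded entry via the two new callbacks, serializing the key straight into the entry's flexible `data[]` array so entry and key share one allocation. The toy serialization format (a 1-byte length header) and the helper names are assumptions for illustration; the real code serializes the key as an embedded sds string.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct dictEntry dictEntry;

typedef struct {
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
        double d;
    } v;
    struct dictEntry *next;     /* Next entry in the same hash bucket. */
    unsigned char data[];       /* Embedded key bytes (flexible array member). */
} embeddedDictEntry;

/* Callback shapes mirroring the dictType additions in the PR. */
typedef size_t (*keyLenFn)(const void *key);
typedef size_t (*keyToBytesFn)(unsigned char *buf, const void *key,
                               unsigned char *header_size);

/* Toy serialization: a 1-byte length header followed by the raw bytes. */
static size_t toyKeyLen(const void *key) {
    return 1 + strlen((const char *)key);
}

static size_t toyKeyToBytes(unsigned char *buf, const void *key,
                            unsigned char *header_size) {
    size_t len = strlen((const char *)key);
    buf[0] = (unsigned char)len;
    memcpy(buf + 1, key, len);
    *header_size = 1;
    return 1 + len;
}

/* Allocate an entry with the key serialized directly into data[],
 * so no separate key pointer (or key allocation) is needed. */
static embeddedDictEntry *createEmbeddedEntry(const void *key,
                                              keyLenFn key_len,
                                              keyToBytesFn key_to_bytes) {
    embeddedDictEntry *de = malloc(sizeof(*de) + key_len(key));
    unsigned char header_size;
    key_to_bytes(de->data, key, &header_size);
    de->next = NULL;
    de->v.val = NULL;
    return de;
}
```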

Benchmarking needs to be performed.

hpatro avatar May 23 '24 22:05 hpatro

Codecov Report

Attention: Patch coverage is 96.87500% with 3 lines in your changes missing coverage. Please review.

Project coverage is 70.32%. Comparing base (7719dbb) to head (825c82f). Report is 10 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable     #541      +/-   ##
============================================
+ Coverage     70.26%   70.32%   +0.06%     
============================================
  Files           110      111       +1     
  Lines         60108    60286     +178     
============================================
+ Hits          42234    42396     +162     
- Misses        17874    17890      +16     
Files Coverage Δ
src/db.c 88.39% <100.00%> (+0.06%) ↑
src/defrag.c 88.24% <100.00%> (+0.68%) ↑
src/dict.c 97.53% <100.00%> (+0.09%) ↑
src/kvstore.c 96.17% <100.00%> (-0.59%) ↓
src/rdb.c 75.80% <100.00%> (-0.09%) ↓
src/sds.c 86.08% <100.00%> (+0.21%) ↑
src/sds.h 78.68% <ø> (ø)
src/server.c 88.57% <100.00%> (+0.01%) ↑
src/debug.c 53.40% <0.00%> (ø)
src/object.c 78.55% <50.00%> (-0.03%) ↓

... and 21 files with indirect coverage changes

codecov[bot] avatar May 23 '24 22:05 codecov[bot]

@valkey-io/core-team Could you provide some clarity on this?

hpatro avatar Jun 11 '24 21:06 hpatro

My thoughts are captured under https://github.com/valkey-io/valkey/issues/394#issuecomment-2128139503

We've made changes in the past which benefit users in the short term and then redone the implementation in a following major/minor version. Key embedding in dictEntry is a pretty small change, and we can easily get rid of it along with dictEntry removal. I feel the small gain (7 bytes) is worth this minimal change for Valkey 8.0, and we can invest in the kvpair object structure with open addressing in 9.0.

hpatro avatar Jun 11 '24 21:06 hpatro

Yeah, I've thought about this many times. The complexity is not too bad. The technique of embedding an sds string can later be reused in other places.

Although the embedding is abstracted, it only ever makes sense for sds strings. That's fine though; dict doesn't include "sds.h", so it's decoupled.

I'm in favor. (I'll add some review comments later, just minor things.)

The reason I've been skeptical before is that I'd rather like that we invest in embedding key in robj, since that'd be beneficial in the future redesign (#169), but since this PR is already ready and the robj work is not started, I think we can merge this for Valkey 8.

zuiderkwast avatar Jun 12 '24 01:06 zuiderkwast

@hpatro @madolson @zuiderkwast Is this change targeted for Valkey 8? If so, can we add it to Valkey 8 project board?

bbarani avatar Jun 21 '24 17:06 bbarani

I was waiting on performance numbers from Hari before officially adding it.

madolson avatar Jun 21 '24 22:06 madolson

I was waiting on performance numbers from Hari before officially adding it.

@hpatro Do you have performance numbers? Can you please add it to this issue to move forward with next steps?

bbarani avatar Jun 21 '24 22:06 bbarani

Thanks for the reviews, @PingXie and @zuiderkwast. I will address them shortly.

@madolson I've posted the benchmarking results on the top comment.

hpatro avatar Jun 24 '24 20:06 hpatro

@zuiderkwast @PingXie @madolson @valkey-io/core-team If we are all aligned on accepting this change for 8.0, I will look into addressing the comments and polishing the PR further. Let me know.

hpatro avatar Jun 25 '24 18:06 hpatro

It looks like used memory decreases by 10% with no change in QPS.

hwware avatar Jun 25 '24 18:06 hwware

@hpatro AFAIK most of the folks are inclined to accept it for 8, so I would ask you to follow up if you can. I'll throw it on our Monday agenda as well to make sure we close on it quickly if we can.

madolson avatar Jun 25 '24 19:06 madolson

@zuiderkwast @PingXie @madolson @valkey-io/core-team If we are all aligned on accepting this change for 8.0, I will look into addressing the comments and polishing the PR further. Let me know.

@hpatro, I like the idea! :-) The only reason I marked this PR as "changes required" is because of the amount of type casting in dict.c (and I understand that it didn't start with this PR). This is a great improvement to be had for Valkey 8.0 but I do think we need to make some potentially painful changes to get dict.c back in shape. If you don't mind, I would love to get my hands dirty and help out with the refactoring too.

PingXie avatar Jun 26 '24 05:06 PingXie

@zuiderkwast @PingXie @madolson @valkey-io/core-team If we are all aligned on accepting this change for 8.0, I will look into addressing the comments and polishing the PR further. Let me know.

@hpatro, I like the idea! :-) The only reason I marked this PR as "changes required" is because of the amount of type casting in dict.c (and I understand that it didn't start with this PR). This is a great improvement to be had for Valkey 8.0 but I do think we need to make some potentially painful changes to get dict.c back in shape. If you don't mind, I would love to get my hands dirty and help out with the refactoring too.

Thanks for the help @PingXie. Let me publish the changes I have made tomorrow and we can go from there. Do we want to do further refactoring as a follow-up PR or along with this one?

hpatro avatar Jun 26 '24 05:06 hpatro

Thanks @hpatro.

Do we want to do further refactoring as a follow-up PR or along with this one?

I am inclined to reduce as much type casting as possible in this PR, the reason being that I don't trust myself to do the compiler's job (type checking the data access), and I am concerned about potential memory corruption issues. I will dedicate some time later this week for a deep dive.

PingXie avatar Jun 27 '24 07:06 PingXie

A couple of things remain:

  • https://github.com/valkey-io/valkey/pull/541#discussion_r1649820995 - Abstract out the metadata field into an independent variable and make the struct packed; it causes a build issue - https://github.com/hpatro/valkey/actions/runs/9701100786/job/26773976709 (I'm trying out a few things; let me know if anyone is aware of a solution to avoid this.)
dict.c: In function ‘dictGetNextRef’:
dict.c:930:37: error: taking address of packed member of ‘struct <anonymous>’ may result in an unaligned pointer value [-Werror=address-of-packed-member]
  930 |     if (entryIsEmbedded(de)) return &decodeEmbeddedEntry(de)->next;
      |                                     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
  • https://github.com/valkey-io/valkey/pull/541#discussion_r1649829240 - @PingXie is looking to submit a proposal.
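For reference, the warning fires because taking the address of a member of a packed struct can yield a misaligned pointer. One common workaround, sketched below with an illustrative struct (not the PR's actual layout), is to access the field by value via memcpy at the computed byte offset, since byte-wise copies are alignment-safe. The trade-off is that accessors return copies rather than references, so an API like dictGetNextRef would have to change shape.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative packed layout: the 1-byte field before 'next' knocks the
 * pointer off its natural alignment, so &e->next may be unaligned. */
struct __attribute__((packed)) packedEntry {
    unsigned char meta;
    void *next;
};

/* Instead of returning &e->next (which trips -Waddress-of-packed-member),
 * copy the field by value from the computed offset. */
static void *loadNext(const struct packedEntry *e) {
    void *p;
    memcpy(&p, (const unsigned char *)e + offsetof(struct packedEntry, next),
           sizeof(p));
    return p;
}

static void storeNext(struct packedEntry *e, void *p) {
    memcpy((unsigned char *)e + offsetof(struct packedEntry, next), &p,
           sizeof(p));
}
```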

hpatro avatar Jun 27 '24 19:06 hpatro

@madolson I can't see any other builds apart from the DCO check. Are we throttled/out of credits?

hpatro avatar Jun 27 '24 21:06 hpatro

@madolson I can't see any other builds apart from DCO check. Are we throttled/out of credits?

You have a merge conflict.

madolson avatar Jun 27 '24 21:06 madolson

I found some differences in how dictAddRaw copies the key vs. taking ownership of it. I guess the failed address sanitizer runs are related to this. I added some comments with details.

The failed address sanitizer runs were because I was calling zmalloc_size on an encoded dictEntry pointer directly. Tried addressing it with https://github.com/valkey-io/valkey/pull/541/commits/4d30109c3f8de79869fb80b0f50983f39f801ca4
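To illustrate the bug class (with a hypothetical tagging scheme and names, not the PR's exact encoding): when type bits are stored in the low bits of an entry pointer, any call that hands the pointer back to the allocator, such as zmalloc_size or free, must receive the decoded pointer, never the encoded one.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* malloc'd pointers are at least 8-byte aligned, so the low 3 bits are
 * free to carry the entry type. */
#define ENTRY_PTR_MASK ((uintptr_t)0x7)

static void *encodeEntryPtr(void *ptr, unsigned type_bits) {
    return (void *)((uintptr_t)ptr | (type_bits & ENTRY_PTR_MASK));
}

/* Allocator queries (zmalloc_size, free, ...) must go through this
 * before touching the pointer. */
static void *decodeEntryPtr(const void *encoded) {
    return (void *)((uintptr_t)encoded & ~ENTRY_PTR_MASK);
}
```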

hpatro avatar Jun 27 '24 23:06 hpatro

A couple of things remain:

  • Embed key into dict entry #541 (comment) - Abstract out the metadata field into an independent variable and make the struct packed; it causes a build issue - https://github.com/hpatro/valkey/actions/runs/9701100786/job/26773976709 (I'm trying out a few things; let me know if anyone is aware of a solution to avoid this.)
dict.c: In function ‘dictGetNextRef’:
dict.c:930:37: error: taking address of packed member of ‘struct <anonymous>’ may result in an unaligned pointer value [-Werror=address-of-packed-member]
  930 |     if (entryIsEmbedded(de)) return &decodeEmbeddedEntry(de)->next;
      |                                     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors

Yeah, I did some experiments. It is actually not that big a change (this is based on an older commit of @hpatro's PR). It really boils down to treating dictEntry as an opaque structure as well and forcing access through common macros (SET/GET/GET_PTR) that perform the common type translations. I think the changes are not that significant, and if we could just do the first part (using dictEntry as an opaque structure), my concern would be addressed. Other refactoring can be addressed in future PRs.
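A minimal sketch of that direction; the macro definitions below are illustrative assumptions, not the actual proposal:

```c
#include <assert.h>
#include <stddef.h>

/* In headers, dictEntry stays an incomplete type, so callers outside
 * dict.c cannot cast it or poke at its fields. */
typedef struct dictEntry dictEntry;

/* The definition is visible only inside dict.c. */
struct dictEntry {
    void *key;
    void *val;
    struct dictEntry *next;
};

/* All field access funnels through a few helpers, so the type
 * translations live in exactly one place. */
#define DICT_ENTRY_SET(de, field, value) ((de)->field = (value))
#define DICT_ENTRY_GET(de, field) ((de)->field)
#define DICT_ENTRY_GET_PTR(de, field) (&(de)->field)
```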

PingXie avatar Jun 28 '24 01:06 PingXie

The builds are passing for both 32-bit and 64-bit now.

hpatro avatar Jun 28 '24 05:06 hpatro

I see 4/6 people directionally inclined, so going to throw on the directionally approved tag.

madolson avatar Jun 30 '24 03:06 madolson

@zuiderkwast / @madolson Shall we ship it? 🚀

hpatro avatar Jul 01 '24 18:07 hpatro

Yes, LGTM.

Awesome! Thanks all.

bbarani avatar Jul 02 '24 15:07 bbarani

Kicked off a full-run to stress this code a bit more before merging: https://github.com/valkey-io/valkey/actions/workflows/daily.yml

madolson avatar Jul 02 '24 16:07 madolson

This change makes sense to me overall and the tradeoffs are good.

My concern is that, conceptually, it's not a good idea that we have two conflicting patterns:

  1. use the internal encoding information of sds [pointer is odd]
  2. embed sds into some other non-native allocation buffer, potentially at arbitrary offset

I feel this has big potential for very hard-to-track bugs.

zvi-code avatar Jul 02 '24 19:07 zvi-code

This change makes sense to me overall and tradeoffs are good.

My concern is that, conceptually, it's not a good idea that we have two conflicting patterns:

  1. use the internal encoding information of sds [pointer is odd]

  2. embed sds into some other non-native allocation buffer, potentially at arbitrary offset

I feel this has big potential for very hard-to-track bugs.

Trying to arrange my thoughts on this. The above is just an example of a wider issue in the code design. IMO there are two types of sds usage: 1) a way to carry metadata about a string buffer through the IO flow; 2) a compact way to encode string buffer metadata in memory with good locality and low memory overhead.

It is clear to me why sds is used for #2. For #1 I feel it could be the wrong choice, and that choice also has performance costs, as it forces memory accesses to unpack the info many times in the IO flow. For example, sdsfree accesses the memory on free even though that could have been avoided; this has a high cost when memory access is a factor. I also think there is an alternative: take an approach slightly similar to Rust and have some struct 'runtimeSds' with the sds as a member plus its metadata (need to check whether it can be passed to functions by value, to avoid allocating it on the heap). Thoughts?
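A minimal sketch of the runtimeSds idea; everything below is a hypothetical illustration, not an existing Valkey or sds API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Pair the string pointer with its unpacked metadata in a small struct
 * passed and returned by value, so hot paths don't have to re-read the
 * string header from memory. */
typedef struct {
    char *str;    /* the underlying buffer (would be an sds in Valkey) */
    size_t len;   /* cached length: usable without touching the header */
    size_t alloc; /* cached allocation: the free path can skip unpacking */
} runtimeSds;

/* Three words: small enough to pass by value in registers on most ABIs,
 * so the wrapper itself never needs a heap allocation. */
static runtimeSds makeRuntimeSds(const char *init, size_t len) {
    runtimeSds r;
    r.str = malloc(len + 1);
    memcpy(r.str, init, len);
    r.str[len] = '\0';
    r.len = len;
    r.alloc = len + 1;
    return r;
}
```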

zvi-code avatar Jul 02 '24 21:07 zvi-code

This change makes sense to me overall and the tradeoffs are good. My concern is that, conceptually, it's not a good idea that we have two conflicting patterns:

  1. use the internal encoding information of sds [pointer is odd]
  2. embed sds into some other non-native allocation buffer, potentially at arbitrary offset

I feel this has big potential for very hard-to-track bugs.

Trying to arrange my thoughts on this. The above is just an example of a wider issue in the code design. IMO there are two types of sds usage: 1) a way to carry metadata about a string buffer through the IO flow; 2) a compact way to encode string buffer metadata in memory with good locality and low memory overhead.

It is clear to me why sds is used for #2. For #1 I feel it could be the wrong choice, and that choice also has performance costs, as it forces memory accesses to unpack the info many times in the IO flow. For example, sdsfree accesses the memory on free even though that could have been avoided; this has a high cost when memory access is a factor. I also think there is an alternative: take an approach slightly similar to Rust and have some struct 'runtimeSds' with the sds as a member plus its metadata (need to check whether it can be passed to functions by value, to avoid allocating it on the heap). Thoughts?

@zvi-code Thanks for sharing your thoughts. Would you mind filing a separate issue about this? Given we are planning to overhaul the hashtable implementation (#169) in 9.0, it would be good to capture some of these points.

hpatro avatar Jul 02 '24 21:07 hpatro

Actual test run I kicked off: https://github.com/valkey-io/valkey/actions/runs/9764931105

madolson avatar Jul 02 '24 22:07 madolson

This PR will bring us considerable cost benefits, thank you @hpatro. I have a doubt about the benchmark results: this PR should improve the CPU cache hit rate, so why hasn't performance improved?

judeng avatar Nov 02 '24 02:11 judeng

This PR will bring us considerable cost benefits, thank you @hpatro. I have a doubt about the benchmark results: this PR should improve the CPU cache hit rate, so why hasn't performance improved?

Yeah, I was also expecting more throughput gain but only saw a tiny one. Anyway, we were happy with the memory savings (which came without paying any cost). @judeng Do you have any recommendations for benchmarking that would let us observe the performance improvement from CPU cache locality?

hpatro avatar Nov 06 '24 23:11 hpatro