valkey [NEW] INFO keysizes section

The problem/use-case that the feature addresses

Valkey has two different ways to describe how large a key is: bigkey and memkey (also known as large key). Both are exposed as valkey-cli options; see https://valkey.io/topics/cli/.

bigkey: the number of elements a key holds for the List, Set, Hash, and Sorted Set types, or the length of the value for the String type.

memkey (large key): the number of bytes that a key and its value require to be stored in RAM.

For example:

127.0.0.1:7000> lpush mylist "Hello" "World" "and" "Hello" "Valkey"
(integer) 5

bigkey for mylist (returns 5):

127.0.0.1:7000> llen mylist
(integer) 5

memkey for mylist (returns 57 in Valkey 8.1):

127.0.0.1:7000> MEMORY USAGE mylist
(integer) 57

Today Valkey does not compute the total memory consumed by all keys in real time, because calculating the memory usage of every key directly is too expensive. Instead it derives the figure in reverse: used_memory - overhead_memory. For the same reason, tracking memkeys (large keys) in real time is challenging, so some users fall back on bigkeys as an approximation for finding large keys.
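A minimal sketch of the reverse calculation, assuming that the overhead_memory mentioned above corresponds to the used_memory_overhead field of INFO memory. The helper names (parse_info, dataset_bytes) are hypothetical and the sample values are made up for illustration only, not taken from a real server.

```python
def parse_info(text):
    """Parse INFO-style 'key:value' lines into a dict, skipping '# Section' headers."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def dataset_bytes(info_text):
    """Approximate the memory held by keys and values as used_memory minus overhead."""
    fields = parse_info(info_text)
    return int(fields["used_memory"]) - int(fields["used_memory_overhead"])


# Illustrative input only: these numbers are invented for the example.
sample_info = """# Memory
used_memory:1000000
used_memory_overhead:800000
"""

print(dataset_bytes(sample_info))
```

This gives one aggregate number for the whole dataset; it says nothing about which individual keys are large, which is the gap this feature request is about.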

Description of the feature

For reference, Redis added the following feature in version 8:

127.0.0.1:6379> INFO keysizes

# Keysizes
db0_distrib_strings_sizes:1=19,2=655,4=3918,8=19,16=23,32=326,64=5,128=1,256=47,512=100899,1K=31,2K=29,4K=23,8K=16,16K=3,32K=2
db0_distrib_lists_items:1=5784492,2=151535,4=46670,8=20453,16=8637,32=3558,64=1047,128=676,256=533,512=218,4K=1,8K=42
db0_distrib_sets_items:1=7355615,2=207181,4=50612,8=21462,16=8864,32=2905,64=1365,128=974,256=773,512=675,1K=395,2K=292,4K=154,8K=89,16K=48,32K=21,64K=27,128K=16,256K=5,512K=4,1M=2
db0_distrib_hashes_items:2=1,4=544,8=746485,16=324174,32=141169,64=207329,128=4349,256=136226,1K=1
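To make the output format concrete, here is a rough sketch of how one of these lines could be read by a client. It assumes that each label is the lower bound of a power-of-two bucket and that the K and M suffixes mean 1024 and 1024*1024; both are assumptions drawn from the output above, not a documented contract. The helper names (parse_size_label, parse_keysizes_line, keys_at_or_above) are hypothetical.

```python
def parse_size_label(label):
    """Convert a bucket label such as '512', '4K' or '1M' to an integer."""
    suffixes = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}  # assumed 1024-based
    if label[-1] in suffixes:
        return int(label[:-1]) * suffixes[label[-1]]
    return int(label)


def parse_keysizes_line(line):
    """Split 'name:label=count,...' into (name, {bucket_lower_bound: count})."""
    name, _, body = line.partition(":")
    histogram = {}
    for pair in body.split(","):
        label, _, count = pair.partition("=")
        histogram[parse_size_label(label)] = int(count)
    return name, histogram


def keys_at_or_above(histogram, threshold):
    """Count keys that fall in buckets whose lower bound is >= threshold."""
    return sum(count for bound, count in histogram.items() if bound >= threshold)


# The lists line from the output above.
line = ("db0_distrib_lists_items:1=5784492,2=151535,4=46670,8=20453,16=8637,"
        "32=3558,64=1047,128=676,256=533,512=218,4K=1,8K=42")

name, lists_histogram = parse_keysizes_line(line)
print(name, keys_at_or_above(lists_histogram, 4096))
```

With the sample above, a client could tell at a glance that a small number of lists hold thousands of items, without scanning the keyspace.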

The POC PR is here: https://github.com/valkey-io/valkey/pull/1967. I have already implemented part of this feature for several commands, such as LPUSH and HSET.

Ref: We have another feature for BigKeyLog https://github.com/valkey-io/valkey/issues/1827

Apr 17 '25 18:04 hwware

@hwware, with the upcoming atomic slot migration and slot-level metrics (CPU/memory/network), the operator can do their work without key-level metrics anymore, IMO. This seems to be geared more towards developers as opposed to operators. Can you talk about the developer use cases a bit?

Apr 28 '25 03:04 PingXie

INFO keysizes shows the distribution of key sizes. I think it can be used to confirm whether the current Valkey instance has large keys.

Apr 29 '25 03:04 wuranxx

> @hwware, with the upcoming atomic slot migration and slot-level metrics (CPU/memory/network), the operator can do their work without key-level metrics anymore, IMO. This seems to be geared more towards developers as opposed to operators. Can you talk about the developer use cases a bit?

Do you mean issue https://github.com/valkey-io/valkey/issues/852 here? I think slot-level metrics cannot replace key-level metrics, for the following reasons:

In standalone mode there is no slot concept, so slot-level metrics won't work.
Key-level metrics reflect the key distribution for every data type. I do not recall slot-level metrics covering this kind of requirement. (If I am wrong, please correct me.)

Apr 30 '25 18:04 hwware

> @hwware, with the upcoming atomic slot migration and slot-level metrics (CPU/memory/network), the operator can do their work without key-level metrics anymore, IMO. This seems to be geared more towards developers as opposed to operators. Can you talk about the developer use cases a bit?

Issue https://github.com/valkey-io/valkey/issues/1803 is much closer to the INFO keysizes feature, but it gives admins a different perspective.

May 01 '25 14:05 hwware

Discussed during the core team meeting:

Consensus was that an INFO field is the better approach compared to a dedicated command such as https://github.com/valkey-io/valkey/pull/1151.
The implementation needs to be cleaned up a little: we can use the position of the first set bit instead of looping over the exponential buckets.
We want to evaluate if the implementation can be naturally extended to include memory in the future. If it can, then we can support both with a single configuration option without impacting performance.
We need to evaluate the performance and decide if this needs to be behind a configuration flag.
@PingXie will talk to Wen offline about use cases.
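The bit-position idea from the cleanup item above can be sketched as follows. This is an illustration in Python, not the code from the PR: it assumes the "first bit" means the highest set bit of the size and that sizes are at least 1. Both function names are hypothetical. In C, the same result would typically come from a count-leading-zeros intrinsic rather than a loop.

```python
def bucket_by_loop(size):
    """Find the power-of-two bucket by walking the buckets one at a time."""
    if size < 1:
        raise ValueError("size must be >= 1")
    bucket = 1
    while bucket * 2 <= size:
        bucket *= 2
    return bucket


def bucket_by_highest_bit(size):
    """Find the same bucket directly from the position of the highest set bit."""
    if size < 1:
        raise ValueError("size must be >= 1")
    return 1 << (size.bit_length() - 1)


for size in (1, 5, 1023, 1024):
    print(size, bucket_by_loop(size), bucket_by_highest_bit(size))
```

Both functions return the largest power of two that does not exceed the size, so a 5-element list lands in the bucket labelled 4. The second form does constant work per update, which matters if the histogram is maintained on every write command.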

May 05 '25 15:05 madolson