
Add support of the Hyperloglog data structure

Open tutububug opened this issue 1 year ago • 33 comments

storage format description: https://github.com/apache/kvrocks-website/pull/207/files.

tutububug avatar Mar 07 '24 09:03 tutububug

@tutububug Thank you for your contribution. Running ./x.py format should eliminate the linting issues.

torwig avatar Mar 07 '24 11:03 torwig

Thank you for your contribution! Maybe you need to add a command parser so the data structure can be used with actual commands, like Redis does.

Yangsx-1 avatar Mar 07 '24 12:03 Yangsx-1

Thank you for your contribution!

Could you include your design in the PR description? For example, explain how to encode the metadata and HLL data (subkeys), similar to what is shown on https://kvrocks.apache.org/community/data-structure-on-rocksdb.

PragmaTwice avatar Mar 07 '24 13:03 PragmaTwice

Thank you for your contribution! Maybe you need to add a command parser so the data structure can be used with actual commands, like Redis does.

Yes, I will push the commit later.

Thank you for your contribution!

Could you include your design in the PR description? For example, explain how to encode the metadata and HLL data (subkeys), similar to what is shown on https://kvrocks.apache.org/community/data-structure-on-rocksdb.

OK.

tutububug avatar Mar 07 '24 15:03 tutububug

@PragmaTwice I created a PR (https://github.com/apache/kvrocks-website/pull/207) describing the hyperloglog storage format.

tutububug avatar Mar 08 '24 10:03 tutububug

@PragmaTwice I created a PR (apache/kvrocks-website#207) describing the hyperloglog storage format.

Thank you! Currently, it is only necessary to include your design in the PR description, not on the website as the design is not finalized yet.

Regarding your design, I have some questions:

  1. Based on the current design, typically one Redis key will introduce a maximum of 16384 RocksDB keys (registers). Each value corresponding to a RocksDB key contains only one integer. This may be inefficient; merging multiple registers onto one key could reduce the number of keys introduced. WDYT?
  2. I noticed that integers are stored as string representations (std::to_string) rather than their binary form (e.g., subkey and register value). What is the reason for this approach?
  3. Having a constant size seems illogical since the number of subkeys linked to this Redis key varies. (However, if every write operation modifies the metadata, it may lead to a decrease in performance. I don't have a clear idea about this aspect.)

Concerning the code, although I haven't reviewed it thoroughly yet, there are some points worth mentioning:

  1. The complete source code of MurmurHash can be placed in the vendor directory.
  2. It appears that using PFADD in the code leads to an increasing number of RocksDB keys (registers). There seems to be no operation that reduces these keys until deleting this Redis key. How can we prevent an increase in RocksDB keys without a mechanism to decrease them?

cc @git-hulk @mapleFU

PragmaTwice avatar Mar 09 '24 06:03 PragmaTwice

@tutububug As @PragmaTwice mentioned in https://github.com/apache/kvrocks/pull/2142#issuecomment-1986756740, it's unnecessary to use a static number of 16384, since it may heavily affect read performance when using PFMERGE. I guess a smaller number like 16 is enough, with every subkey holding around 1000 integers.

git-hulk avatar Mar 09 '24 07:03 git-hulk

I suggest that we can store the number of registers in one rocksdb key in the metadata (for example, referred to as register_number_in_one_key), so that even if adjustments are made later for performance reasons, the compatibility of kvrocks data can be maintained.

Currently, a fixed value for register_number_in_one_key can be hardcoded in the code.
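A minimal sketch of storing this field, assuming a hypothetical layout (the function names, the 2-byte little-endian width, and the default of 1024 are illustrative, not the actual kvrocks metadata encoding):

```cpp
#include <cstdint>
#include <string>

// Hypothetical sketch: persist the registers-per-subkey granularity in the
// HLL metadata so it travels with the data instead of being hardcoded.
constexpr uint16_t kDefaultRegistersPerSubkey = 1024;  // illustrative constant

std::string EncodeHllMetadata(uint16_t registers_per_subkey) {
  std::string out;
  // 2-byte little-endian field, standing in for the real metadata layout
  out.push_back(static_cast<char>(registers_per_subkey & 0xFF));
  out.push_back(static_cast<char>((registers_per_subkey >> 8) & 0xFF));
  return out;
}

uint16_t DecodeHllMetadata(const std::string &in) {
  return static_cast<uint16_t>(static_cast<uint8_t>(in[0])) |
         (static_cast<uint16_t>(static_cast<uint8_t>(in[1])) << 8);
}
```

With this in place, changing the constant later only affects newly written keys; old keys keep decoding with the value recorded in their own metadata.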

cc @tutububug

PragmaTwice avatar Mar 09 '24 07:03 PragmaTwice

Based on the current design, typically one Redis key will introduce a maximum of 16384 RocksDB keys (registers). Each value corresponding to a RocksDB key contains only one integer. This may be inefficient; merging multiple registers onto one key could reduce the number of keys introduced. WDYT?

There are two parts to this.

  1. The Bitmap type can be used for the dense logic, which would reduce the key count.
  2. A sparse HLL type can be introduced to save some overhead here.

mapleFU avatar Mar 09 '24 08:03 mapleFU

Based on the current design, typically one Redis key will introduce a maximum of 16384 RocksDB keys (registers). Each value corresponding to a RocksDB key contains only one integer. This may be inefficient; merging multiple registers onto one key could reduce the number of keys introduced. WDYT?

There are two parts to this.

  1. The Bitmap type can be used for the dense logic, which would reduce the key count.
  2. A sparse HLL type can be introduced to save some overhead here.

So I think the question is: should we also introduce two modes of HLL encoding (sparse and dense layouts) and an automatic switching policy between them?

PragmaTwice avatar Mar 09 '24 08:03 PragmaTwice

So I think the question is: should we also introduce two modes of HLL encoding (sparse and dense layouts) and an automatic switching policy between them?

I prefer to do this :-) But we can regard it as a further optimization

mapleFU avatar Mar 09 '24 08:03 mapleFU

Everything else looks OK to me.

mapleFU avatar Mar 09 '24 12:03 mapleFU

Regarding your design, I have some questions:

1. Based on the current design, typically one Redis key will introduce a maximum of 16384 RocksDB keys (registers). Each value corresponding to a RocksDB key contains only one integer. This may be inefficient; merging multiple registers onto one key could reduce the number of keys introduced. WDYT?

Not really. A register (subkey) is only stored when its count is non-zero. This differs from the in-memory implementation, which uses a static array as a dense encoding. On disk, I think it is naturally a sparse encoding.

2. I noticed that integers are stored as string representations (`std::to_string`) rather than their binary form (e.g., subkey and register value). What is the reason for this approach?

The run of consecutive 0s is calculated from the last 50 bits of the hash value, so the maximum count is about 50, which takes at most 2 bytes as a string. It should not waste much space, and it also saves the overhead of integer encoding and decoding. For keys, binary might be more efficient, but the largest index ("16383") is only 5 bytes as a string.
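The derivation described above can be sketched as follows, mirroring the logic of Redis's `hllPatLen`: the low 14 bits of the 64-bit hash select one of 16384 registers, and the run of trailing zeros in the remaining 50 bits (plus one) is the register value. The helper name is illustrative:

```cpp
#include <cstdint>

constexpr int kHllBits = 14;
constexpr uint64_t kRegisterMask = (1ULL << kHllBits) - 1;  // 16383

// Split a 64-bit hash into (register index, count). Because only 50 bits
// remain after removing the index, the count never exceeds 51, so it always
// fits in a single byte.
void HashToRegister(uint64_t hash, uint32_t *index, uint8_t *count) {
  *index = static_cast<uint32_t>(hash & kRegisterMask);
  // Sentinel bit at position 50 guarantees the loop terminates even when the
  // remaining 50 bits are all zero (yielding the maximum count of 51).
  uint64_t rest = (hash >> kHllBits) | (1ULL << (64 - kHllBits));
  uint8_t c = 1;
  while ((rest & 1) == 0) {
    ++c;
    rest >>= 1;
  }
  *count = c;
}
```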

3. Having a constant `size` seems illogical since the number of subkeys linked to this Redis key varies. (However, if every write operation modifies the metadata, it may lead to a decrease in performance. I don't have a clear idea about this aspect.)

Currently, the `size` field has no practical purpose; the only requirement is that it be non-zero. In the kvrocks GetMetadata implementation, non-string data structures with a size of 0 are treated as expired, while Redis's PFADD allows a call with no elements that still creates the key. For compatibility, `size` is set to a constant to prevent expiration.

Concerning the code, although I haven't reviewed it thoroughly yet, there are some points worth mentioning:

1. The complete source code of MurmurHash can be placed in the `vendor` directory.

OK.

2. It appears that using `PFADD` in the code leads to an increasing number of RocksDB keys (registers). There seems to be no operation that reduces these keys until deleting this Redis key. How can we prevent an increase in RocksDB keys without a mechanism to decrease them?

For an HLL user key, the number of registers is capped at 16384 and cannot grow beyond that. In fact, I think this should be considered controllable compared to data structures such as hash, set, and list, whose size is determined by user input.

@PragmaTwice cc @git-hulk @mapleFU

tutububug avatar Mar 09 '24 17:03 tutububug

The run of consecutive 0s is calculated from the last 50 bits of the hash value, so the maximum count is about 50, which takes at most 2 bytes as a string. It should not waste much space, and it also saves the overhead of integer encoding and decoding. For keys, binary might be more efficient, but the largest index ("16383") is only 5 bytes as a string.

For a rocksdb value, the payload would be 1-2 bytes, but the per-key overhead also counts, so it introduces an extremely large overhead. That's why I prefer a bitmap/bitfield implementation.

Issuing 2^14 Gets in rocksdb would also be heavy, and might skew some rocksdb statistics, making it tend to compact more or cache unnecessary blocks.

mapleFU avatar Mar 09 '24 17:03 mapleFU

The run of consecutive 0s is calculated from the last 50 bits of the hash value, so the maximum count is about 50, which takes at most 2 bytes as a string. It should not waste much space, and it also saves the overhead of integer encoding and decoding. For keys, binary might be more efficient, but the largest index ("16383") is only 5 bytes as a string.

Sorry but I cannot get your point. In every sense, using a string representation cannot be more efficient. In binary representation: if the maximum value is less than 2^8, you should use one byte; and if the maximum value is less than 2^16, you should use two bytes.
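To illustrate the size difference, a raw one-byte encoding could look like the sketch below (the function names are hypothetical, not kvrocks APIs). Since the register value never exceeds 51, a single byte always suffices, whereas `std::to_string` needs up to two characters and requires parsing on every read:

```cpp
#include <cstdint>
#include <string>

// One raw byte holds any register value up to 255; no parsing is needed on
// the read path, unlike a decimal-string representation.
std::string EncodeRegisterBinary(uint8_t value) {
  return std::string(1, static_cast<char>(value));
}

uint8_t DecodeRegisterBinary(const std::string &v) {
  return static_cast<uint8_t>(v[0]);
}
```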

Not really. A register (subkey) is only stored when its count is non-zero. This differs from the in-memory implementation, which uses a static array as a dense encoding. On disk, I think it is naturally a sparse encoding.

I don't think you answered my question. The maximum number of subkeys is still 2^14 (16384). I think multiple registers should be put into one subkey to reduce the number of keys and improve the efficiency of counting.
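A minimal sketch of this grouping, assuming a hypothetical registers-per-subkey constant of 1024: each register index maps to a (subkey id, offset within subkey) pair, so a full HLL needs at most 16 subkeys instead of 16384.

```cpp
#include <cstdint>

// Illustrative constant; the actual granularity was still under discussion.
constexpr uint32_t kRegistersPerSubkey = 1024;

// Map a register index in [0, 16384) to the subkey that holds it and the
// register's position inside that subkey's value.
void RegisterToSubkey(uint32_t register_index, uint32_t *subkey_id,
                      uint32_t *offset) {
  *subkey_id = register_index / kRegistersPerSubkey;
  *offset = register_index % kRegistersPerSubkey;
}
```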

PragmaTwice avatar Mar 10 '24 01:03 PragmaTwice

@mapleFU @PragmaTwice Thank you for the careful code review. Since the algorithm code comes from Redis, some function arguments and comments need to be adjusted. Regarding optimizing the key encoding and the number of subkeys, I will follow your suggestions. cc @git-hulk

tutububug avatar Mar 11 '24 03:03 tutububug

@tutububug Thanks for your great efforts.

git-hulk avatar Mar 11 '24 04:03 git-hulk

@tutububug FYI:

  1. bitfield: https://github.com/apache/kvrocks/blob/unstable/src/common/bitfield_util.h#L33 . You can refer to this logic or rewrite the HLL logic using it yourself. A dense HLL value in Redis' impl would be about 13KB, so we can store it in one value. You can also refer to my bit utils: https://github.com/apache/kvrocks/blob/unstable/src/common/bit_util.h . It's not hard, but it may take some time to get familiar with. Redis uses similar logic, but doesn't encapsulate it as utilities.
  2. Coding style: Redis uses macros and some raw pointers; for maintainability, we'd better use C++-style code.
  3. Protocol and metadata: currently, this patch only implements "dense" HLL. I'm ok with this. But maybe we can leave a field for "format" in metadata. For example
| ... | format-version (1Byte)|

And set "dense == 0" here.
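As a rough sketch of the dense layout: 16384 six-bit registers pack into 16384 * 6 / 8 = 12288 bytes, small enough to live in a single RocksDB value. This is a simplified illustration, not the `bitfield_util.h` API; one spare byte is appended so every access can read a two-byte window without bounds checks at the last register.

```cpp
#include <cstdint>
#include <vector>

constexpr int kRegisters = 16384;
constexpr int kRegisterBits = 6;

std::vector<uint8_t> MakeDenseHll() {
  // 12288 payload bytes plus one spare byte for the two-byte access window.
  return std::vector<uint8_t>(kRegisters * kRegisterBits / 8 + 1, 0);
}

uint8_t GetRegister(const std::vector<uint8_t> &buf, int index) {
  int bit = index * kRegisterBits;
  int byte = bit / 8, shift = bit % 8;
  // A 6-bit field spans at most two adjacent bytes; load both and shift.
  uint16_t two = static_cast<uint16_t>(buf[byte] | (buf[byte + 1] << 8));
  return static_cast<uint8_t>((two >> shift) & 0x3F);
}

void SetRegister(std::vector<uint8_t> &buf, int index, uint8_t value) {
  int bit = index * kRegisterBits;
  int byte = bit / 8, shift = bit % 8;
  uint16_t two = static_cast<uint16_t>(buf[byte] | (buf[byte + 1] << 8));
  // Clear the old 6-bit field, then OR in the new value.
  two = static_cast<uint16_t>((two & ~(0x3F << shift)) |
                              ((value & 0x3F) << shift));
  buf[byte] = static_cast<uint8_t>(two & 0xFF);
  buf[byte + 1] = static_cast<uint8_t>(two >> 8);
}
```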

Thank you for your patience and your code.

mapleFU avatar Mar 11 '24 06:03 mapleFU

A dense HLL value in Redis' impl would be about 13KB, so we can store it in one value.

Do you mean that we can store the whole HLL data in one rocksdb key?

IMHO it is worth exploring how many rocksdb key-value pairs are suitable here.

Kvrocks does not have MVCC. Considering that write ops are very frequent for HLL, I think putting them into multiple keys may improve efficiency.

cc @mapleFU

PragmaTwice avatar Mar 11 '24 08:03 PragmaTwice

@PragmaTwice We can implement that first and optimize HLL writes step by step, since the 2.9 release is still a long way off.

mapleFU avatar Mar 11 '24 14:03 mapleFU

@mapleFU @PragmaTwice The code refactoring is ready for review, and the storage format description is written up at https://github.com/apache/kvrocks-website/pull/207/files. Please take a look if you have time. cc @git-hulk @torwig

tutububug avatar Apr 10 '24 11:04 tutububug

Sure. @tutububug, can you help fix the lint error when you're free: https://github.com/apache/kvrocks/actions/runs/8538011240/job/23657605010?pr=2142#step:7:1368

git-hulk avatar Apr 10 '24 12:04 git-hulk

Sure. @tutububug, can you help fix the lint error when you're free: https://github.com/apache/kvrocks/actions/runs/8538011240/job/23657605010?pr=2142#step:7:1368

Strange, there is no such warning in my branch (https://github.com/tutububug/incubator-kvrocks/pull/6). I triggered the action again to check.

tutububug avatar Apr 11 '24 11:04 tutububug

@tutububug you can pull the unstable branch into this branch and have a try.

git-hulk avatar Apr 11 '24 14:04 git-hulk

@tutububug you can pull the unstable branch into this branch and have a try.

Fixed.

tutububug avatar Apr 12 '24 06:04 tutububug

LGTM, with one comment: check whether we need to add a notice for the source code.

git-hulk avatar Apr 12 '24 13:04 git-hulk

@mapleFU Could you review it again?

PragmaTwice avatar Apr 12 '24 14:04 PragmaTwice

@mapleFU Can you help take a look when you get time? We can merge this PR if it looks good, since it's been pending for too long.

git-hulk avatar Apr 15 '24 02:04 git-hulk

Quality Gate failed

Failed conditions
1 Security Hotspot

See analysis details on SonarCloud

sonarqubecloud[bot] avatar Apr 20 '24 08:04 sonarqubecloud[bot]

This looks much better than the previous version. The remaining issues are mostly code-style problems.

Fixed.

tutububug avatar Apr 22 '24 14:04 tutububug