cuckoo-filter

Enhance Support for Larger Datasets and Buckets in Encoding

Open EladGabay opened this issue 2 years ago • 5 comments

This commit improves encoding by supporting item and bucket counts that exceed max(uint32). Previously, the encoding stored counts as uint32, even though the filter structure already held larger values as uint. Until now, the filter only partially supported larger datasets: not all of the buckets were utilized. Note the changes in generateIndexTagHash, altIndex and indexHash.
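To illustrate how a narrow index type can leave buckets unused, here is a minimal sketch. It assumes the problem came from truncating the hash to 32 bits before taking the modulo; the functions `indexHash32` and `indexHash64` are hypothetical and are not the library's actual `indexHash`.

```go
package main

import "fmt"

// indexHash32 is a hypothetical narrow index computation: the hash is cut to
// 32 bits before the modulo, so the result can never reach 1<<32 or above.
func indexHash32(hash uint64, numBuckets uint64) uint64 {
	return uint64(uint32(hash)) % numBuckets
}

// indexHash64 is the hypothetical wide variant: it keeps the full 64-bit
// hash, so every bucket index below numBuckets is reachable.
func indexHash64(hash uint64, numBuckets uint64) uint64 {
	return hash % numBuckets
}

func main() {
	numBuckets := uint64(1) << 33 // more buckets than max(uint32)
	hash := uint64(0x123456789)   // illustrative hash value wider than 32 bits

	// The narrow version always lands in the lower part of the table.
	fmt.Println(indexHash32(hash, numBuckets) < 1<<32)
	// The wide version can reach buckets at or above 1<<32.
	fmt.Println(indexHash64(hash, numBuckets) >= 1<<32)
}
```

With `numBuckets` above max(uint32), the narrow variant can only ever address the first 2^32 buckets, which matches the "not all the buckets were utilized" symptom described above.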

Now, all references to bucket indices and item counts explicitly use uint64. A new encoding format accommodates larger filters. To distinguish between the legacy format (up to max(uint32) items) and the new format, a prefix marker is introduced.

Decoding seamlessly supports both formats. The encode method takes a legacy boolean parameter for gradual adoption.
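The dual-format scheme can be sketched roughly as follows. This is not the PR's actual byte layout: the marker value, field order, little-endian byte order, and the names `formatMarker`, `encodeHeader`, and `decodeHeader` are all assumptions made for illustration. In particular, the sketch assumes a legacy header never starts with a zero bucket count, so a zero first field can safely flag the new format.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"math"
)

// formatMarker is a hypothetical prefix. The sketch assumes a valid legacy
// header never begins with a zero bucket count, so zero marks the new format.
const formatMarker uint32 = 0

// encodeHeader writes the bucket and item counts. With legacy set it emits
// two uint32 fields; otherwise the marker followed by two uint64 fields.
func encodeHeader(numBuckets, count uint64, legacy bool) ([]byte, error) {
	if numBuckets == 0 {
		return nil, errors.New("numBuckets must be non-zero")
	}
	if legacy {
		if numBuckets > math.MaxUint32 || count > math.MaxUint32 {
			return nil, errors.New("counts do not fit the legacy format")
		}
		buf := make([]byte, 8)
		binary.LittleEndian.PutUint32(buf[0:], uint32(numBuckets))
		binary.LittleEndian.PutUint32(buf[4:], uint32(count))
		return buf, nil
	}
	buf := make([]byte, 20)
	binary.LittleEndian.PutUint32(buf[0:], formatMarker)
	binary.LittleEndian.PutUint64(buf[4:], numBuckets)
	binary.LittleEndian.PutUint64(buf[12:], count)
	return buf, nil
}

// decodeHeader accepts either layout by inspecting the first four bytes.
func decodeHeader(buf []byte) (numBuckets, count uint64, err error) {
	if len(buf) < 8 {
		return 0, 0, errors.New("header too short")
	}
	first := binary.LittleEndian.Uint32(buf[0:])
	if first != formatMarker {
		// Legacy layout: the first field is already the bucket count.
		return uint64(first), uint64(binary.LittleEndian.Uint32(buf[4:])), nil
	}
	if len(buf) < 20 {
		return 0, 0, errors.New("header too short for the new format")
	}
	return binary.LittleEndian.Uint64(buf[4:]), binary.LittleEndian.Uint64(buf[12:]), nil
}

func main() {
	big, _ := encodeHeader(1<<33, 1<<34, false)
	nb, c, _ := decodeHeader(big)
	fmt.Println(nb, c)

	old, _ := encodeHeader(1024, 10, true)
	nb, c, _ = decodeHeader(old)
	fmt.Println(nb, c)
}
```

Keeping the `legacy` flag on the encoder lets existing readers continue to consume the old layout while newer readers accept both, which is the gradual adoption path described above.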

EladGabay avatar Aug 15 '23 11:08 EladGabay

@linvon would you like to take a look? 😊

EladGabay avatar Aug 18 '23 07:08 EladGabay

@linvon would you like to take a look? 😊

Sorry, busy with work, but I will find some time to handle this

linvon avatar Aug 21 '23 04:08 linvon

Hi @linvon, let me know if you need any help :)

EladGabay avatar Sep 06 '23 10:09 EladGabay

@linvon gentle ping

EladGabay avatar Sep 28 '23 08:09 EladGabay

Hi @linvon do you think it's going to be merged soon? 🙏

EladGabay avatar Jan 05 '24 13:01 EladGabay