icsa
@conker84 Can you point me to the source code and git logs for the ETL tool?
Given that the hash values can be reproduced deterministically, they don't need to be stored. The hash values can be recomputed lazily, trading compute for space.
The "Circulant" part means that you can use a random permutation then rotate the permutation by k (of K) items to create "new" permutations. One permutation gets used K times...
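To make the rotation idea concrete, here is a minimal Python sketch. It only illustrates reusing one random permutation K times by rotating it; it is not the full C-MinHash algorithm. The names `base_permutation`, `rotated`, and `circulant_signature` are hypothetical, and elements are assumed to be integers in `range(n)`.

```python
import random


def base_permutation(n, seed=0):
    """Return one random permutation of range(n) (seeded, so reproducible)."""
    rng = random.Random(seed)
    perm = list(range(n))
    rng.shuffle(perm)
    return perm


def rotated(perm, k):
    """Return perm rotated left by k positions: a 'new' permutation for free."""
    k %= len(perm)
    return perm[k:] + perm[:k]


def circulant_signature(items, perm, num_hashes):
    """Compute num_hashes min-hash values from a single base permutation.

    The k-th value is the minimum of the k-th rotation over the items.
    Index arithmetic replaces materialising each rotated list:
    rotated(perm, k)[i] == perm[(i + k) % n].
    `items` must be a non-empty collection of ints in range(len(perm)).
    """
    n = len(perm)
    return [min(perm[(i + k) % n] for i in items) for k in range(num_hashes)]
```

Usage would look like `circulant_signature({2, 7, 11}, base_permutation(16), 4)`, which yields four values while storing only one permutation.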
> But how do you compute the minimum value for each permutation function without having access to the input hash values?

Lazily, perhaps using a Python generator. The hash values...
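A rough sketch of what "lazily" could look like, assuming the inputs are string tokens and using BLAKE2b from `hashlib` as a stand-in deterministic hash (both are my assumptions for illustration, not something the ETL tool is known to use). The helper names `hash_values` and `lazy_min_hash` are hypothetical.

```python
import hashlib


def hash_values(tokens, seed=0):
    """Yield a 64-bit hash per token, recomputed on demand and never stored.

    The seed is prefixed to the token bytes so that different seeds give
    different (but still deterministic) hash functions.
    """
    prefix = seed.to_bytes(8, "little")
    for tok in tokens:
        digest = hashlib.blake2b(prefix + tok.encode("utf-8"), digest_size=8).digest()
        yield int.from_bytes(digest, "little")


def lazy_min_hash(tokens, seed=0):
    """Return the minimum hash over the tokens, holding one value at a time."""
    return min(hash_values(tokens, seed))
```

Because the hash is deterministic, calling `lazy_min_hash` again on the same tokens reproduces the same value, which is the compute-for-space trade mentioned earlier.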
Let me work on a Python implementation of the C-MinHash algorithm to demonstrate the concept in context. I'll get you the outline of the code in the next few days....
Consider Roaring Bitmaps to represent each [compressed] set of integers. https://github.com/RoaringBitmap