gaoya

Locality Sensitive Hashing

Results 8 gaoya issues

Noticed your comment on [Hacker News](https://news.ycombinator.com/item?id=33123972) about this repo. I worked on something similar at a previous job (so I don't have the code to share), but looked pretty deeply...

Hi Team, I just read about Super-Bit LSH: https://proceedings.neurips.cc/paper/2012/hash/072b030ba126b2f4b2374f342be9ed44-Abstract.html It was reported to be more accurate than SRP-LSH for angular distances in (0, pi/2). A Java implementation is here: https://github.com/tdebatty/java-LSH/blob/master/src/main/java/info/debatty/java/lsh/SuperBit.java Let me know...

- pip install gaoya
- PyPI has only the 0.2.0 release
- the GitHub code has __version__ = "0.1.3"
- https://pypi.org/project/gaoya/

reference:
- https://github.com/xorbitsai/xorbits/blob/main/python/xorbits/experimental/dedup.py
- https://github.com/ChenghaoMou/text-dedup/blob/main/text_dedup/minhash_spark.py
- https://github.com/FlagOpen/FlagData/blob/main/flagdata/deduplication/minhash.py

Hi! 👋 I'm trying to use the clustering functionality of gaoya, and I'm having trouble getting it to work. I've successfully used the basic query example from the documentation, but...

Hi @serega, this just adds a faster hash function, xxhash3, for short strings of less than 16 bytes. It is not attack-resistant but is much faster. Let me know if you have some...

Hi @serega, for SimHash, the hasher can be any 64- or 128-bit hash function, right? Just wondering why siphash was used. I understand it is attack-resistant, but for speed...
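As a rough illustration of why the hasher is pluggable in SimHash, the sketch below takes the hash function as a parameter and only assumes it returns an unsigned integer of the requested width. This is a generic textbook SimHash, not gaoya's implementation. `hash64` (built on the standard library's BLAKE2b truncated to 8 bytes) is a stand-in chosen so the example runs without extra packages; any other 64-bit hash with the same contract could be swapped in.

```python
import hashlib


def hash64(token: str) -> int:
    """Stand-in 64-bit hasher (BLAKE2b truncated to 8 bytes)."""
    digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def simhash(tokens, hasher=hash64, bits=64):
    """Compute a SimHash fingerprint from an iterable of tokens.

    For every bit position, tokens whose hash has that bit set vote +1
    and the rest vote -1; the fingerprint keeps the bits with a
    positive total.
    """
    counts = [0] * bits
    for token in tokens:
        h = hasher(token)
        for i in range(bits):
            counts[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, count in enumerate(counts):
        if count > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")
```

Near-duplicate documents share most tokens, so their fingerprints tend to differ in only a few bits. The choice of hasher affects speed and resistance to crafted inputs, but not the voting logic.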