algebird icon indicating copy to clipboard operation
algebird copied to clipboard

Implement randomized LSH according to Chazelle's paper

Open perryzheng opened this issue 9 years ago • 5 comments

http://courses.cs.washington.edu/courses/cse521/15sp/slides/LSH-CACM.pdf

See bottom of page 119 for LSH pseudocode and see section 4 for a new implementation.

My implementation might not be the same as section 4 but a variation of it.

Let me know if that's a good idea

perryzheng avatar Apr 22 '15 23:04 perryzheng

How does it compare to MinHash as we already have minhash impl? Other things to consider is whether its associative, commutative etc

MansurAshraf avatar Apr 22 '15 23:04 MansurAshraf

MinHash is for jaccard distance, this is for euclidean distance. However, it's not clear to me whether this belongs in algebird or scalding...

avibryant avatar Apr 24 '15 04:04 avibryant

Hey @avibryant , based on this paper, http://www.cs.princeton.edu/courses/archive/spr05/cos598E/bib/p253-datar.pdf section 3, it uses say gaussian distribution as a sketch for the high dimensional vector. 1 important question.. This could be expressed in terms of monoids and thus could be part of algebird, right? I'm honestly not very clear

perryzheng avatar Apr 28 '15 07:04 perryzheng

Gaussian distributions form a monoid, yes. (And actually I would love to see Gaussian added to Algebird, just in itself).

avibryant avatar Apr 28 '15 16:04 avibryant

Submitted a pull request at https://github.com/twitter/algebird/pull/449

perryzheng avatar Jun 09 '15 01:06 perryzheng