
A doubt

Open · ogreyesp opened this issue 8 years ago · 1 comment

Hello,

I'm using the java-LSH code, and I think it is a great project.

LSH is a technique for handling high-dimensional datasets, for instance datasets with 100,000 features or even more.

When I run the examples SuperBitExample, SuperBitSparseExample or LSHSuperBitExample, they run fine. However, if I increase the number of dimensions, for instance to 1000, the program becomes very, very slow.

Can I use this project with high-dimensional datasets?

Best regards,

Oscar

ogreyesp · Mar 29 '16 07:03

Hi,

Sorry for the late answer...

LSH can work with high-dimensional datasets, but (for a signature size S and D dimensions):

  • computing a single signature costs O(S·D) (one D-dimensional dot product per signature bit), so it gets slow as D grows
  • SuperBit also has to make the random reference vectors orthogonal, which is slow too
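To make the two costs above concrete, here is a minimal Java sketch of SuperBit-style hashing. It is not java-LSH's implementation: the class name `SuperBitSketch`, the parameters `d` (dimensions), `n` (vectors per orthogonalized batch, n ≤ d) and `l` (number of batches, so S = n·l), and the use of modified Gram-Schmidt are assumptions made for illustration.

```java
import java.util.Random;

/**
 * Hypothetical sketch of SuperBit-style hashing; not java-LSH's actual code.
 * d = dimensions, n = vectors per orthogonalized batch (n <= d),
 * l = number of batches, so the signature size is S = n * l.
 */
public class SuperBitSketch {
    private final double[][] hyperplanes; // S random vectors of dimension d

    public SuperBitSketch(int d, int n, int l, long seed) {
        if (n > d) {
            throw new IllegalArgumentException("batch size n must be <= d");
        }
        Random rnd = new Random(seed);
        int s = n * l;
        hyperplanes = new double[s][d];
        for (int i = 0; i < s; i++) {
            for (int j = 0; j < d; j++) {
                hyperplanes[i][j] = rnd.nextGaussian();
            }
        }
        // Orthogonalize each batch with modified Gram-Schmidt:
        // O(n^2 * d) per batch, O(S * n * d) in total, paid once at setup.
        for (int b = 0; b < l; b++) {
            for (int i = b * n; i < (b + 1) * n; i++) {
                for (int k = b * n; k < i; k++) {
                    // hyperplanes[k] is already unit length, so this is the projection coefficient
                    double proj = dot(hyperplanes[i], hyperplanes[k]);
                    for (int j = 0; j < d; j++) {
                        hyperplanes[i][j] -= proj * hyperplanes[k][j];
                    }
                }
                double norm = Math.sqrt(dot(hyperplanes[i], hyperplanes[i]));
                for (int j = 0; j < d; j++) {
                    hyperplanes[i][j] /= norm;
                }
            }
        }
    }

    private static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int j = 0; j < a.length; j++) {
            sum += a[j] * b[j];
        }
        return sum;
    }

    /** One bit per hyperplane: S dot products of length d, i.e. O(S * d) per vector. */
    public boolean[] signature(double[] x) {
        boolean[] sig = new boolean[hyperplanes.length];
        for (int i = 0; i < hyperplanes.length; i++) {
            sig[i] = dot(hyperplanes[i], x) >= 0;
        }
        return sig;
    }
}
```

In this sketch, building the hasher pays the orthogonalization once, O(S·n·D), and every `signature` call then pays O(S·D), which is why both steps grow linearly with the number of dimensions.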

On the other hand, computing the similarity between two signatures is very fast, since it depends only on the signature size S and not on D. This makes LSH suitable for large datasets, even high-dimensional ones.
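Comparing two signatures never touches the original D-dimensional vectors. A minimal sketch, assuming sign-bit signatures from random-hyperplane hashing, where the fraction of differing bits estimates angle/π; `SignatureSimilarity`, `estimateCosine` and `pack` are hypothetical names, not java-LSH's API:

```java
/**
 * Hypothetical sketch: estimating cosine similarity from two sign-bit
 * signatures of equal size S. Cost is O(S), independent of D.
 */
public class SignatureSimilarity {

    /** Fraction of differing bits estimates angle / pi; returns cos(angle). */
    public static double estimateCosine(boolean[] a, boolean[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("signature sizes differ");
        }
        int differing = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] != b[i]) {
                differing++;
            }
        }
        return Math.cos(Math.PI * differing / a.length);
    }

    /** Same estimate on bit-packed signatures: one XOR + popcount per 64 bits. */
    public static double estimateCosine(long[] a, long[] b, int s) {
        int differing = 0;
        for (int i = 0; i < a.length; i++) {
            differing += Long.bitCount(a[i] ^ b[i]);
        }
        return Math.cos(Math.PI * differing / s);
    }

    /** Packs a boolean signature into 64-bit words (bit i goes to word i / 64). */
    public static long[] pack(boolean[] sig) {
        long[] words = new long[(sig.length + 63) / 64];
        for (int i = 0; i < sig.length; i++) {
            if (sig[i]) {
                words[i >>> 6] |= 1L << (i & 63);
            }
        }
        return words;
    }
}
```

Packing into `long` words means a 1,024-bit signature is compared with 16 XOR/popcount steps, however many dimensions the original vectors had.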

I might add a computation-time analysis one of these days to make this clearer.

Best regards,

tdebatty · Apr 06 '16 19:04