java-LSH

A Java implementation of Locality Sensitive Hashing (LSH)

11 java-LSH issues

Hi, I was wondering how to use this library to compare two different Strings, each tokenized into a vector of strings. The examples only show boolean vectors which are...
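One common way to get from token vectors to the boolean vectors used in the examples is to build a shared dictionary that assigns each distinct token an integer ID. Each string then becomes a set of IDs, or equivalently a `boolean[]` of length equal to the dictionary size. The minimal sketch below assumes that approach. The class `TokenSets` and its methods are hypothetical and are not part of java-LSH. It also computes the exact Jaccard similarity, which is the quantity MinHash estimates.

```java
import java.util.*;

// Hypothetical helper, not part of java-LSH: maps tokens to integer IDs.
public class TokenSets {
    // Shared dictionary so the same token gets the same ID in every string.
    private final Map<String, Integer> dictionary = new HashMap<>();

    // Converts a tokenized string into a set of token IDs.
    public Set<Integer> toIdSet(List<String> tokens) {
        Set<Integer> ids = new HashSet<>();
        for (String token : tokens) {
            Integer id = dictionary.get(token);
            if (id == null) {
                id = dictionary.size();      // next unused ID
                dictionary.put(token, id);
            }
            ids.add(id);
        }
        return ids;
    }

    public int dictionarySize() {
        return dictionary.size();
    }

    // Builds the boolean-vector form (true at each token ID present).
    public boolean[] toBooleanVector(Set<Integer> ids, int size) {
        boolean[] vector = new boolean[size];
        for (int id : ids) {
            vector[id] = true;
        }
        return vector;
    }

    // Exact Jaccard similarity |A ∩ B| / |A ∪ B|.
    public static double jaccard(Set<Integer> a, Set<Integer> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0;
        }
        Set<Integer> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<Integer> union = new HashSet<>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    }
}
```

Both strings must be tokenized before the vectors are built, so that every vector has the same length (the final dictionary size).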

Bumps [junit](https://github.com/junit-team/junit4) from 4.10 to 4.13.1. Release notes, sourced from junit's releases: JUnit 4.13.1: Please refer to the release notes for details. JUnit 4.13: Please refer to the release notes...

dependencies

The current implementation uses a `boolean[]` as input. Using a BitSet (https://docs.oracle.com/javase/7/docs/api/java/util/BitSet.html) would be a lot more efficient. For example, if the dictionary size is `Integer.MAX_VALUE`, as it would...
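As a rough sketch of what a BitSet-based input could look like, the following converts a `boolean[]` to a `java.util.BitSet` and visits only the set bits with `nextSetBit`, so hashing work scales with the number of present elements rather than the dictionary size. The class name `BitSetInput` is hypothetical. The memory figures are approximate: a BitSet stores one bit per index in a `long[]`, while a `boolean[]` typically uses one byte per element on HotSpot (JVM-dependent). A BitSet still allocates up to its highest set bit, so an index near `Integer.MAX_VALUE` costs about 256 MiB. For very sparse data, a sorted `int[]` of indices may be smaller.

```java
import java.util.BitSet;

// Hypothetical helper, not part of java-LSH.
public class BitSetInput {
    // Converts the current boolean[] input format into a BitSet.
    public static BitSet fromBooleans(boolean[] vector) {
        BitSet bits = new BitSet(vector.length);
        for (int i = 0; i < vector.length; i++) {
            if (vector[i]) {
                bits.set(i);
            }
        }
        return bits;
    }

    // Returns the indices of set bits, skipping runs of zeros word by word.
    public static int[] setIndices(BitSet bits) {
        int[] indices = new int[bits.cardinality()];
        int k = 0;
        for (int i = bits.nextSetBit(0); i >= 0; i = bits.nextSetBit(i + 1)) {
            indices[k++] = i;
            if (i == Integer.MAX_VALUE) {
                break; // avoid overflow in i + 1
            }
        }
        return indices;
    }
}
```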

I see that the algorithm is based on the MMDS book by Ullman et al. However, your implementation seems to use a fixed THRESHOLD value of 0.5, whereas in the...
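For reference, in the banding technique from Mining of Massive Datasets, with b bands of r rows each, two items with Jaccard similarity s become a candidate pair with probability 1 - (1 - s^r)^b. The threshold of this S-curve is approximately (1/b)^(1/r). A threshold derived from the chosen bands and rows, rather than a fixed 0.5, could be computed as in this sketch. The class and method names are hypothetical.

```java
// Hypothetical helper illustrating the MMDS banding formulas.
public class BandingThreshold {
    // Probability that items with Jaccard similarity s share at least one band.
    public static double candidateProbability(double s, int bands, int rows) {
        return 1.0 - Math.pow(1.0 - Math.pow(s, rows), bands);
    }

    // Approximate similarity at which the S-curve rises steeply: (1/b)^(1/r).
    public static double approxThreshold(int bands, int rows) {
        return Math.pow(1.0 / bands, 1.0 / rows);
    }
}
```

With b = 20 and r = 5 (the example used in MMDS), `approxThreshold` returns about 0.55. A pair with s = 0.8 becomes a candidate with probability above 0.999, while a pair with s = 0.2 does so with probability below 0.01.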

According to the description in this file, the signature size equals R * b instead of R * s.

The hash signature method of the LSH class is order-independent. But according to Mining of Massive Datasets, the bands should be identical. In the current implementation, assume two bands of...
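An order-sensitive alternative, sketched here under the assumption that the signature is an `int[]` split into consecutive bands, hashes each band's rows in sequence with `java.util.Arrays.hashCode`. Because that hash depends on element order, the bands {1, 2} and {2, 1} get different hash values. In MMDS each band index has its own bucket array, so the result is one bucket per band index, and the signature length must equal bands × rows. The class `BandHashing` is hypothetical. Distinct bands can still collide in the same bucket, so a larger bucket count reduces false candidates.

```java
import java.util.Arrays;

// Hypothetical helper, not java-LSH's implementation.
public class BandHashing {
    // Order-sensitive hash of rows [band * rows, (band + 1) * rows).
    public static int bandHash(int[] signature, int band, int rows) {
        int[] slice = Arrays.copyOfRange(signature, band * rows, (band + 1) * rows);
        return Arrays.hashCode(slice);
    }

    // One bucket per band index; items are candidates if any index matches.
    public static int[] bucketsFor(int[] signature, int bands, int rows, int buckets) {
        if (signature.length != bands * rows) {
            throw new IllegalArgumentException("signature length must equal bands * rows");
        }
        int[] result = new int[bands];
        for (int b = 0; b < bands; b++) {
            result[b] = Math.floorMod(bandHash(signature, b, rows), buckets);
        }
        return result;
    }
}
```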

Allow the hash function in LSHMinHash to take a set as input.
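A standalone sketch of MinHash over a `Set<Integer>` might look like the following. It is not the library's `LSHMinHash` API. The class name `SetMinHash` and the hash family h(x) = (a·x + b) mod p, with p = 2^31 - 1, are assumptions chosen for illustration. The fraction of positions where two signatures agree estimates their Jaccard similarity.

```java
import java.util.*;

// Hypothetical set-based MinHash; not part of java-LSH.
public class SetMinHash {
    private static final long PRIME = 2147483647L; // 2^31 - 1
    private final long[] a;
    private final long[] b;

    public SetMinHash(int numHashes, long seed) {
        Random random = new Random(seed);
        a = new long[numHashes];
        b = new long[numHashes];
        for (int i = 0; i < numHashes; i++) {
            a[i] = 1 + random.nextInt(Integer.MAX_VALUE - 1); // in [1, PRIME - 1]
            b[i] = random.nextInt(Integer.MAX_VALUE);         // in [0, PRIME - 1]
        }
    }

    // Signature position i holds the minimum of h_i(x) over all x in the set.
    public int[] signature(Set<Integer> set) {
        int[] sig = new int[a.length];
        Arrays.fill(sig, Integer.MAX_VALUE);
        for (int x : set) {
            long v = Math.floorMod((long) x, PRIME);
            for (int i = 0; i < a.length; i++) {
                // a[i] * v + b[i] < 2^63, so this cannot overflow a long.
                int h = (int) ((a[i] * v + b[i]) % PRIME);
                if (h < sig[i]) {
                    sig[i] = h;
                }
            }
        }
        return sig;
    }

    // Fraction of agreeing positions: an estimate of Jaccard similarity.
    public static double similarity(int[] s1, int[] s2) {
        if (s1.length != s2.length) {
            throw new IllegalArgumentException("signatures differ in length");
        }
        int same = 0;
        for (int i = 0; i < s1.length; i++) {
            if (s1[i] == s2[i]) {
                same++;
            }
        }
        return (double) same / s1.length;
    }
}
```

Because each h_i is a bijection on [0, p - 1], two disjoint sets never agree at any position. Identical sets always agree at every position.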

The comments in the example LSHMinHash code say that 'to get relevant results, the number of elements per bucket should be at least 100'. Why? I tried to...

Hi, I don't know how to return the top-k similar vectors. Should I use the signatures to calculate similarity, or the hash values of the signatures? Thanks very much.
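One common pattern, offered here as an assumption rather than as the library's intended usage, uses bucket collisions only to shortlist candidates. It then ranks those candidates by the fraction of MinHash signature positions that agree with the query, which estimates Jaccard similarity. The bucket hash values only tell whether two items collided, so they are not useful for ranking. The class `TopK` is hypothetical. Candidates could also be re-ranked by exact similarity on the original vectors.

```java
import java.util.*;

// Hypothetical helper, not part of java-LSH.
public class TopK {
    // Fraction of equal positions in two MinHash signatures.
    public static double similarity(int[] s1, int[] s2) {
        if (s1.length != s2.length) {
            throw new IllegalArgumentException("signatures differ in length");
        }
        int same = 0;
        for (int i = 0; i < s1.length; i++) {
            if (s1[i] == s2[i]) {
                same++;
            }
        }
        return (double) same / s1.length;
    }

    // Indices of the k candidates most similar to the query, best first.
    public static List<Integer> topK(int[] query, List<int[]> candidates, int k) {
        double[] scores = new double[candidates.size()];
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < candidates.size(); i++) {
            scores[i] = similarity(query, candidates.get(i));
            order.add(i);
        }
        // Stable sort by descending score; ties keep their original order.
        order.sort((x, y) -> Double.compare(scores[y], scores[x]));
        return new ArrayList<>(order.subList(0, Math.min(k, order.size())));
    }
}
```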

I am contemplating using LSH in my application, but I am unsure how to deal with absent/missing data in a vector. The nearest neighbor imputation implies that this type of...