A tutorial on locality-sensitive hashing, using MinHashing for document similarity and cosine similarity for vectors in Euclidean space.

Locality Sensitive Hashing Tutorial

As the name suggests, this is a tutorial on locality sensitive hashing. All of the information is contained in the notebook.

The sampledocs folder contains some artificial data for the document similarity task. It consists of news articles pulled from CNN, with one document consisting of partial concatenations of the others. This creates artificially similar documents, which our algorithms are trying to find.
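
As a rough illustration of the document similarity task (not the notebook's own code), the sketch below builds MinHash signatures over word shingles and compares the estimated Jaccard similarity with the exact value. The synthetic texts, the helper names (`shingles`, `minhash_signature`, `estimate_jaccard`), the shingle size, and the number of hash functions are all assumptions chosen for the example; the files in sampledocs are not read here.

```python
import hashlib
import random

PRIME = (1 << 61) - 1  # large Mersenne prime used as the hash modulus


def shingles(text, k=3):
    """Return the set of k-word shingles in text."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


def stable_hash(s):
    """Deterministic 64-bit hash of a string (Python's hash() is salted per run)."""
    return int.from_bytes(hashlib.blake2b(s.encode("utf-8"), digest_size=8).digest(), "big")


def make_hash_params(num_hashes, seed=0):
    """Random (a, b) pairs defining hash functions h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(num_hashes)]


def minhash_signature(shingle_set, params):
    """For each hash function, keep the minimum hash value over all shingles."""
    base = [stable_hash(s) % PRIME for s in shingle_set]
    return [min((a * h + b) % PRIME for h in base) for a, b in params]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree; approximates Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def jaccard(a, b):
    """Exact Jaccard similarity of two sets."""
    return len(a & b) / len(a | b)


# Hypothetical stand-ins for the articles: random "word" sequences.
rng = random.Random(42)
vocab = ["w%d" % i for i in range(500)]
doc_a = " ".join(rng.choice(vocab) for _ in range(300))
doc_b = " ".join(rng.choice(vocab) for _ in range(300))
# One document built from partial concatenations of the others.
doc_mix = " ".join(doc_a.split()[:150] + doc_b.split()[:150])

params = make_hash_params(200)
sets = {name: shingles(text) for name, text in
        [("a", doc_a), ("b", doc_b), ("mix", doc_mix)]}
sigs = {name: minhash_signature(s, params) for name, s in sets.items()}

for x, y in [("a", "b"), ("a", "mix"), ("b", "mix")]:
    print(x, y, "exact:", round(jaccard(sets[x], sets[y]), 3),
          "estimated:", round(estimate_jaccard(sigs[x], sigs[y]), 3))
```

The concatenated document should score noticeably higher against each of its sources than the two sources score against each other, which is the signal the similarity search is looking for.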

For the vector similarity task, synthetic data is easy to generate by creating random matrices, so we do that in the notebook.
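
A minimal sketch of how this might look, assuming the common random-hyperplane scheme for cosine similarity (the notebook may use a different construction): each vector is reduced to a bit signature recording which side of each random hyperplane it falls on, and the fraction of disagreeing bits estimates the angle between two vectors. The dimensions, signature length, noise level, and function names below are illustrative assumptions, and plain Python lists stand in for the random matrices.

```python
import math
import random


def random_matrix(rows, cols, rng):
    """Matrix of independent standard normal entries, as a list of rows."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def cosine_similarity(u, v):
    """Exact cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def signature(vec, hyperplanes):
    """One bit per hyperplane: 1 if vec lies on its non-negative side, else 0."""
    return [1 if sum(h * x for h, x in zip(plane, vec)) >= 0 else 0
            for plane in hyperplanes]


def estimate_cosine(sig_a, sig_b):
    """Two vectors at angle theta disagree on a bit with probability theta / pi."""
    disagree = sum(a != b for a, b in zip(sig_a, sig_b))
    theta = math.pi * disagree / len(sig_a)
    return math.cos(theta)


rng = random.Random(0)
dim, num_planes = 20, 256

data = random_matrix(5, dim, rng)             # synthetic data: one vector per row
planes = random_matrix(num_planes, dim, rng)  # random hyperplane normals

# Add a slightly perturbed copy of the first row to act as a near-duplicate.
near_dup = [x + 0.1 * rng.gauss(0.0, 1.0) for x in data[0]]

sig_first = signature(data[0], planes)
sig_dup = signature(near_dup, planes)
sig_other = signature(data[1], planes)

print("near-duplicate exact:", round(cosine_similarity(data[0], near_dup), 3),
      "estimated:", round(estimate_cosine(sig_first, sig_dup), 3))
print("unrelated exact:", round(cosine_similarity(data[0], data[1]), 3),
      "estimated:", round(estimate_cosine(sig_first, sig_other), 3))
```

Longer signatures tighten the estimate at the cost of more dot products per vector; 256 bits is an arbitrary choice for this example.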