lsh-rs
Can't obtain results using Rust implementation
I'm roughly using the following code:
let query_emb: Vec<f32>;
let doc_emb: Vec<Vec<f32>>; // contains 3 document embeddings
...
// arguments: n_projections = 10, n_hash_tables = 30, dim = 512
let mut lsh = LshMem::new(10, 30, 512).srp().unwrap();
let _x = lsh.store_vecs(&doc_emb[..]);
let result = lsh.query_bucket(&query_emb).unwrap();
println!("lsh-rs: {:?}", result);
Unfortunately, the result is empty. I'm testing the same query and documents with ngt-rs and I do get results (I'm looking for an alternative to ngt-rs that runs on Windows). Is this a matter of choosing better parameters?
It seems so: changing n_projections and n_hash_tables sometimes makes it return results. Do you know of effective heuristics for choosing values for the two? I plan on working with 100-10000 candidate vectors of dimension 512, but was just testing with 3 of them.
Here is a presentation I have on the subject: LSH.pdf
And a notebook with some theory: notebook
The most important thing is to understand gap amplification; see the last plot in the notebook. By choosing K and L you can tune the collision probability for a given similarity value.
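To make the effect of K and L concrete, here is a minimal sketch of the textbook gap-amplification formula for sign random projections. It assumes K corresponds to n_projections and L to n_hash_tables, that each hash bit collides with probability 1 - θ/π (θ being the angle between the two vectors), and that a query returns anything sharing a bucket in at least one table. The function names are hypothetical and not part of lsh-rs; whether the crate's srp() matches this model exactly is an assumption.

```rust
use std::f64::consts::PI;

/// Probability that two vectors with the given cosine similarity agree on the
/// sign of one random projection (one bit of a sign-random-projection hash).
/// Hypothetical helper, not part of lsh-rs.
fn srp_bit_collision(cos_sim: f64) -> f64 {
    let theta = cos_sim.clamp(-1.0, 1.0).acos();
    1.0 - theta / PI
}

/// Probability that the two vectors share a bucket in at least one of `l`
/// tables, when each table concatenates `k` independent sign bits.
/// Hypothetical helper, not part of lsh-rs.
fn candidate_probability(cos_sim: f64, k: u32, l: u32) -> f64 {
    let p = srp_bit_collision(cos_sim);
    1.0 - (1.0 - p.powi(k as i32)).powi(l as i32)
}

fn main() {
    // A few (K, L) pairs to compare; (10, 30) is the setting from the question.
    for &(k, l) in &[(10, 30), (5, 30), (20, 200)] {
        println!("K = {k}, L = {l}");
        for &s in &[0.3, 0.5, 0.7, 0.9] {
            println!(
                "  cos = {s:.1} -> P(candidate) = {:.3}",
                candidate_probability(s, k, l)
            );
        }
    }
}
```

Under this model, raising K makes each table more selective (fewer collisions at every similarity), while raising L gives a pair more chances to collide. With only 3 stored vectors, an empty result is plausible whenever the query is not very close to any of them, so lowering K or raising L should make hits more likely.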
P.S. you can play around with the python version of this crate in the notebook:
https://pypi.org/project/floky/