qdrant-web-ui icon indicating copy to clipboard operation
qdrant-web-ui copied to clipboard

tSNE vector compression optimised using wasm-bhtsne

Open kartik-gupta-ij opened this issue 2 years ago • 4 comments

#108

kartik-gupta-ij avatar Sep 08 '23 17:09 kartik-gupta-ij

The computation with wasm-bhtsne seems to be very fast and tsne appears to be a solid choice! However, did you consider any other algorithms or resources why (bh)tSNE might be superior to e.g. UMAP or PCA?

Some resources:

I'm just curious which algorithm might perform best overall.

do-me avatar Jan 11 '24 10:01 do-me

Hi @do-me , it looks like the main bottleneck is not the algorithm, but the implementation and ability to use multiple threads for computation. If there is a solid implementation of any of those, we are happy to make it configurable

generall avatar Jan 11 '24 14:01 generall

@kartik-gupta-ij if I saw correctly, the implementation of wasm-bhtsne has all parameters hard coded apart from the iterations number. E.g. due to the perplexity default it crashes with less than 60 vectors (wrote a dirty fix here adding random vectors for the calculation to make it work in either case). Also, depending on the matrix size other defaults might be better suited. 

Is anyone interested in forking and adding the lacking params to the wasm version? Unfortunately I'm not experienced in Rust. 

Apart from the lack of parameteres it's fairly performant. I tested e.g. with 1566 vectors of size 384, 1000 iterations, takes 20s on my i7 laptop.

do-me avatar Jan 29 '24 17:01 do-me

@generall This demonstrates a method to do dimensionality reduction based on a HNSW index. It's very fast compared to tSNE. If Qdrant provided a way to produce such results to the frontend based on the collections HNSW index then (in my testing) visualizing even tens of thousands of nodes would take mere seconds.

https://github.com/jean-pierreBoth/annembed/blob/master/src/embedder.rs

functorism avatar Sep 04 '24 10:09 functorism