qdrant-web-ui
tSNE vector compression optimised using wasm-bhtsne
#108
The computation with wasm-bhtsne seems very fast, and tSNE appears to be a solid choice! However, did you consider any other algorithms, or do you have resources on why (bh)tSNE might be superior to e.g. UMAP or PCA?
Some resources:
- Nature Article 2022 comparing algorithms
- Scikit Learn Examples, mainly clustering 2D points but some algorithms work for high-dimensional data too
- Tensorflow Embedding Projector using tSNE, UMAP and PCA
- Reddit Discussion
I'm just curious which algorithm might perform best overall.
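For what it's worth, PCA is the one baseline from that list that is cheap enough to sketch inline. Below is a pure-Python illustration (my own sketch, not anything the web UI uses): mean-centre the data, find the top two principal components by power iteration with deflation, and project onto them.

```python
import math
import random

def pca_2d(vectors):
    """Project vectors to 2-D with PCA (pure-Python illustration).

    A minimal baseline sketch, not the wasm implementation: mean-centre
    the data, find the top two covariance eigenvectors by power
    iteration with deflation, and project onto them.
    """
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(d)]
    X = [[v[j] - mean[j] for j in range(d)] for v in vectors]

    def cov_times(w):
        # (X^T X) w without materialising the d x d covariance matrix
        s = [sum(X[i][j] * w[j] for j in range(d)) for i in range(n)]
        return [sum(s[i] * X[i][j] for i in range(n)) for j in range(d)]

    def top_component(deflate=None, iters=200):
        w = [rng.random() for _ in range(d)]
        for _ in range(iters):
            if deflate is not None:
                # Project out the first component so we converge to the second.
                dot = sum(a * b for a, b in zip(w, deflate))
                w = [a - dot * b for a, b in zip(w, deflate)]
            w = cov_times(w)
            norm = math.sqrt(sum(a * a for a in w)) or 1.0
            w = [a / norm for a in w]
        return w

    rng = random.Random(0)
    pc1 = top_component()
    pc2 = top_component(deflate=pc1)
    return [(sum(a * b for a, b in zip(row, pc1)),
             sum(a * b for a, b in zip(row, pc2))) for row in X]
```

Obviously PCA is linear and will not separate clusters the way tSNE/UMAP do, but it is deterministic and effectively free, which makes it a useful sanity-check view.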
Hi @do-me, it looks like the main bottleneck is not the algorithm but the implementation and its ability to use multiple threads for computation. If there is a solid implementation of any of those algorithms, we are happy to make it configurable.
@kartik-gupta-ij if I saw correctly, the wasm-bhtsne implementation has all parameters hard-coded apart from the iteration count. E.g. due to the perplexity default it crashes with fewer than 60 vectors (I wrote a dirty fix here that adds random vectors to the calculation so it works in either case). Also, depending on the matrix size, other defaults might be better suited.
Is anyone interested in forking and adding the missing params to the wasm version? Unfortunately I'm not experienced in Rust.
Apart from the missing parameters it's fairly performant. I tested it e.g. with 1566 vectors of size 384 at 1000 iterations; it takes 20s on my i7 laptop.
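The "dirty fix" mentioned above can be sketched as a small wrapper (pure-Python illustration; `run_tsne` and `min_points=60` are placeholders for whatever tSNE entry point and threshold you actually use, not wasm-bhtsne's real API): pad small inputs with random vectors before running tSNE, then discard the padded points' embeddings.

```python
import random

def tsne_with_padding(vectors, run_tsne, min_points=60, seed=42):
    """Work around tSNE implementations that crash on small inputs.

    Hypothetical sketch of the 'dirty fix' above: if fewer than
    `min_points` vectors are given, pad the input with random vectors
    drawn from the data's value range, run tSNE on the padded matrix,
    and return only the embeddings of the original points.
    `run_tsne` is passed in by the caller; nothing here assumes a
    particular tSNE binding.
    """
    n = len(vectors)
    if n >= min_points:
        return run_tsne(vectors)
    dim = len(vectors[0])
    rng = random.Random(seed)
    lo = min(min(v) for v in vectors)
    hi = max(max(v) for v in vectors)
    padding = [[rng.uniform(lo, hi) for _ in range(dim)]
               for _ in range(min_points - n)]
    embedded = run_tsne(vectors + padding)
    return embedded[:n]  # drop the padded points' embeddings
```

A proper fix would of course expose perplexity (and the other hard-coded parameters) instead, since Barnes-Hut tSNE implementations typically require the sample count to exceed a multiple of the perplexity.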
@generall This demonstrates a method for dimensionality reduction based on an HNSW index. It's very fast compared to tSNE. If Qdrant provided a way to expose such results to the frontend based on the collection's HNSW index, then (in my testing) visualizing even tens of thousands of nodes would take mere seconds.
https://github.com/jean-pierreBoth/annembed/blob/master/src/embedder.rs
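To illustrate the general idea (this is NOT annembed's actual algorithm, just a toy sketch of why graph-based layouts are fast): instead of working with all pairwise distances, you lay points out using only the neighbour graph, which an HNSW index already maintains, so the cost scales with the number of edges rather than N².

```python
import math
import random

def knn_graph_layout(vectors, k=3, iters=100, lr=0.05, seed=0):
    """Toy 2-D layout of a k-nearest-neighbour graph.

    Illustrative sketch only: attraction along kNN edges plus weak
    repulsion between random pairs. Real graph embedders (annembed,
    UMAP, ...) use principled objectives; this just shows the
    edge-driven structure of the computation.
    """
    n = len(vectors)

    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    # Brute-force kNN here; in the real setting the HNSW index supplies this.
    edges = []
    for i in range(n):
        nbrs = sorted((dist(vectors[i], vectors[j]), j)
                      for j in range(n) if j != i)
        edges.extend((i, j) for _, j in nbrs[:k])

    rng = random.Random(seed)
    pos = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]

    for _ in range(iters):
        # Attraction along graph edges pulls neighbours together.
        for i, j in edges:
            for c in range(2):
                delta = pos[j][c] - pos[i][c]
                pos[i][c] += lr * delta
                pos[j][c] -= lr * delta
        # Weak repulsion between random pairs keeps clusters apart.
        for _ in range(n):
            i, j = rng.randrange(n), rng.randrange(n)
            if i == j:
                continue
            d = max(dist(pos[i], pos[j]), 0.1)
            for c in range(2):
                push = (pos[i][c] - pos[j][c]) * lr * 0.1 / (d * d)
                pos[i][c] += push
                pos[j][c] -= push
    return pos
```

The per-iteration work is O(edges + n) rather than O(n²), which is the property that makes reusing the collection's HNSW graph attractive.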