Feature Request: Custom distance matrix input

Open RichieHakim opened this issue 3 years ago • 3 comments

FEATURE REQUEST:

In https://github.com/CannyLab/tsne-cuda/issues/8, the possibility of using a custom NN matrix is discussed and noted to be 'easy' to implement. DavidMChan: "It would be easy to add the ability to pass in a sparse nearest neighbors matrix, however it becomes more complicated if you want to extract the nearest neighbors from a pre-computed distance matrix."

Implementing this would be a significant improvement and would open up a lot of use cases. Specifically, allowing a user to input a custom distance matrix (i.e. a sparse knn_graph) would be amazing. Users who already rely on this feature in sklearn's TSNE could then port their workflow directly to tsne-cuda.

From https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html:

metric: str or callable, default='euclidean'. ... If metric is "precomputed", X is assumed to be a distance matrix. ...

Thanks!

RichieHakim avatar Jan 16 '22 19:01 RichieHakim

I'll look into adding this (though, TBH, I can't promise anything), but I'm also happy to accept a PR to address this.

For future reference (and for anyone who wants to give it a shot), the idea would be to shortcut the logic for nearest neighbors here: https://github.com/CannyLab/tsne-cuda/blob/b740a7d46a07ca9415f072001839fb66a582a3fa/src/fit_tsne.cu#L118

It's not that hard to do, since the rest of the TSNE algorithm only requires a float distance array of size (N x # neighbors) and a similarly shaped array of the nearest neighbor indices.
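As an illustration of those two arrays, here is a minimal NumPy sketch that derives them from a dense precomputed distance matrix. It is not code from tsne-cuda: the function name `knn_from_distance_matrix`, the `exclude_self` flag, and the float32/int64 dtypes are assumptions made for this example, and whether the library expects each point to appear in its own neighbor list would need to be checked against the FAISS path in fit_tsne.cu.

```python
import numpy as np


def knn_from_distance_matrix(dist, k, exclude_self=True):
    """Return (distances, indices), each of shape (N, k), from an (N, N) distance matrix.

    Hypothetical helper for illustration only; not part of the tsne-cuda API.
    """
    dist = np.asarray(dist, dtype=np.float32)
    if dist.ndim != 2 or dist.shape[0] != dist.shape[1]:
        raise ValueError("dist must be a square (N, N) matrix")
    n = dist.shape[0]
    max_k = n - 1 if exclude_self else n
    if not 1 <= k <= max_k:
        raise ValueError(f"k must be between 1 and {max_k}")

    if exclude_self:
        # Work on a copy so the caller's matrix is not modified.
        dist = dist.copy()
        np.fill_diagonal(dist, np.inf)

    # argpartition moves the k smallest entries of each row into the
    # first k columns (in arbitrary order) without a full sort.
    part = np.argpartition(dist, k - 1, axis=1)[:, :k]
    part_dist = np.take_along_axis(dist, part, axis=1)

    # Sort only those k candidates so neighbors are ordered nearest-first.
    order = np.argsort(part_dist, axis=1)
    indices = np.take_along_axis(part, order, axis=1).astype(np.int64)
    distances = np.take_along_axis(part_dist, order, axis=1)
    return distances, indices
```

For a sparse kNN graph, the same pair of arrays could be read off row by row instead, provided every row stores the same number of neighbors.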

The logic for passing arrays is already in place, since we handle pre-initialized T-SNE: see how preinit_data is handled in https://github.com/CannyLab/tsne-cuda/blob/b740a7d46a07ca9415f072001839fb66a582a3fa/src/python/tsnecuda/TSNE.py, and how it's parsed into the actual function call in https://github.com/CannyLab/tsne-cuda/blob/b740a7d46a07ca9415f072001839fb66a582a3fa/src/ext/pymodule_ext.cu

All that would have to be done is to create a new option in the options file (just like the pre-init data), https://github.com/CannyLab/tsne-cuda/blob/b740a7d46a07ca9415f072001839fb66a582a3fa/src/include/options.h, and reference it during the main tsne call.

DavidMChan avatar Jan 18 '22 20:01 DavidMChan

This is still dearly hoped for.

RichieHakim avatar Jul 04 '22 20:07 RichieHakim

I have the same request.

loganylchen avatar Dec 19 '23 04:12 loganylchen