tapkee

bh-SNE with custom distance callback

Open ypnos opened this issue 6 years ago • 3 comments

Using method=tDistributedStochasticNeighborEmbedding in combination with withDistance() is not supported.

Laurens van der Maaten says that to use a custom metric, the Vantage-Point Tree needs to be changed (see here). Note that this only refers to the Barnes-Hut algorithm; the exact algorithm uses no VPTree and has its own custom distance computation in tsne.hpp.

Interestingly, tapkee already comes with an alternative VPTree implementation that supports the use of a distance callback. It also looks quite compatible.

Could the method be altered to use the functionality of neighbors/vptree.hpp and enable withDistance()?

ypnos avatar Jun 12 '18 14:06 ypnos

We would need a search method in VantagePointTree that also returns the distances, e.g.:

std::vector<std::pair<IndexType, double>> search(const RandomAccessIterator& target, int k)

And then in the method basically only replace one line:

results.push_back({items[heap.top().index]-begin, heap.top().distance});

ypnos avatar Jun 12 '18 15:06 ypnos
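
To make the proposal above concrete, here is a minimal, self-contained sketch of a vantage-point tree whose search returns (index, distance) pairs and which takes the distance as a callback. It is not the code from neighbors/vptree.hpp or tsne.hpp: the names SketchVPTree, Point, Distance and manhattan are hypothetical and exist only for illustration, and points are plain std::vector<double> rather than tapkee's iterator-based interface. The pruning step relies on the triangle inequality, so a callback that is a true metric (L1 is one) keeps the search exact, while a non-metric callback may miss neighbours.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <functional>
#include <limits>
#include <memory>
#include <queue>
#include <utility>
#include <vector>

using Point = std::vector<double>;
using Distance = std::function<double(const Point&, const Point&)>;

// Example callback: L1 (Manhattan) distance, a true metric.
double manhattan(const Point& a, const Point& b) {
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) sum += std::abs(a[i] - b[i]);
    return sum;
}

// Hypothetical stand-in for a VP tree with a distance callback (not tapkee's class).
class SketchVPTree {
public:
    SketchVPTree(const std::vector<Point>& points, Distance distance)
        : points_(points), distance_(std::move(distance)), order_(points.size()) {
        for (std::size_t i = 0; i < order_.size(); ++i) order_[i] = i;
        root_ = build(0, order_.size());
    }

    // Returns the k nearest neighbours as (index, distance), nearest first.
    std::vector<std::pair<std::size_t, double>> search(const Point& target, std::size_t k) const {
        std::vector<std::pair<std::size_t, double>> results;
        if (k == 0) return results;
        Heap heap;  // max-heap keyed on distance; top() is the worst kept candidate
        double tau = std::numeric_limits<double>::infinity();
        searchNode(root_.get(), target, k, heap, tau);
        while (!heap.empty()) {
            // Same idea as the one-line change above: keep the distance with the index.
            results.push_back({heap.top().second, heap.top().first});
            heap.pop();
        }
        std::reverse(results.begin(), results.end());
        return results;
    }

private:
    using Heap = std::priority_queue<std::pair<double, std::size_t>>;
    struct Node {
        std::size_t index = 0;   // index of the vantage point
        double threshold = 0.0;  // median distance from the vantage point
        std::unique_ptr<Node> inside, outside;
    };

    std::unique_ptr<Node> build(std::size_t lower, std::size_t upper) {
        if (lower >= upper) return nullptr;
        auto node = std::make_unique<Node>();
        node->index = order_[lower];  // first item of the range is the vantage point
        if (upper - lower > 1) {
            const Point& vp = points_[node->index];
            std::size_t median = lower + 1 + (upper - lower - 1) / 2;
            // Partition the remaining items around the median distance to vp.
            std::nth_element(order_.begin() + lower + 1, order_.begin() + median,
                             order_.begin() + upper, [&](std::size_t a, std::size_t b) {
                                 return distance_(vp, points_[a]) < distance_(vp, points_[b]);
                             });
            node->threshold = distance_(vp, points_[order_[median]]);
            node->inside = build(lower + 1, median);
            node->outside = build(median, upper);
        }
        return node;
    }

    void searchNode(const Node* node, const Point& target, std::size_t k,
                    Heap& heap, double& tau) const {
        if (!node) return;
        const double d = distance_(points_[node->index], target);
        if (heap.size() < k || d < tau) {
            if (heap.size() == k) heap.pop();
            heap.push({d, node->index});
            if (heap.size() == k) tau = heap.top().first;  // current k-th best distance
        }
        // Triangle-inequality pruning: only valid when distance_ is a metric.
        if (d - tau <= node->threshold) searchNode(node->inside.get(), target, k, heap, tau);
        if (d + tau >= node->threshold) searchNode(node->outside.get(), target, k, heap, tau);
    }

    std::vector<Point> points_;
    Distance distance_;
    std::vector<std::size_t> order_;
    std::unique_ptr<Node> root_;
};
```

A caller would construct the tree once, e.g. SketchVPTree tree(points, manhattan), and then call tree.search(query, k) per point to get neighbour indices together with their distances, which is the information the Barnes-Hut code needs to compute its input similarities.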

That looks promising, thanks for your suggestions!

I do not have a good understanding of what would happen if we use a non-Euclidean distance. I have to check.

lisitsyn avatar Jun 13 '18 12:06 lisitsyn

I think for some data with special characteristics it could be beneficial to try L1 or EMD. But I need to see for myself.

ypnos avatar Jun 13 '18 13:06 ypnos