Adding precomputed distances for Parametric UMAP
I'm trying to figure out how we could support precomputed distances in Parametric UMAP. `fit_transform` only takes a single `X`, which is either the distance matrix or the data, but Parametric UMAP needs both: the distance matrix to build the graph and the data to train the network.
One option that wouldn't require modifying anything but parametric_umap.py would be to add a `fit_transform` method that takes the precomputed distances as a separate argument and stores the data as `self._X`:
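```python
from umap import UMAP


class ParametricUMAP(UMAP):
    ...

    def fit_transform(self, X, y=None, precomputed_distances=None):
        if self.metric == "precomputed":
            if precomputed_distances is None:
                # implicit string concatenation avoids the stray whitespace
                # that a backslash continuation inside the literal would add
                raise ValueError(
                    "Precomputed distances must be supplied if metric "
                    "is precomputed."
                )
            # keep the raw data: the network is trained on X, not on distances
            self._X = X
            # generate the graph from the precomputed distances
            return super().fit_transform(precomputed_distances, y)
        else:
            return super().fit_transform(X, y)
```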
Then, in `_fit_embed_data`, swap the stored data back in for `X`:
```python
def _fit_embed_data(self, X, n_epochs, init, random_state):
    if self.metric == "precomputed":
        # X is the distance matrix here; train the network on the raw data
        X = self._X
    ...
```
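To check the control flow in isolation, here is a minimal, self-contained sketch of the same stash-and-swap pattern. `BaseEmbedder` and `ParametricEmbedder` are hypothetical stand-ins for `UMAP` and `ParametricUMAP` so it runs without umap-learn or TensorFlow; the "embedding" is a placeholder, the `graph_input_` and `network_input_` attributes are hypothetical and exist only for inspection, and the real call path into `_fit_embed_data` may differ between versions.

```python
import numpy as np


class BaseEmbedder:
    """Hypothetical stand-in for UMAP: builds its 'graph' from whatever X it
    receives, then passes that same X on to _fit_embed_data."""

    def __init__(self, metric="euclidean"):
        self.metric = metric

    def fit_transform(self, X, y=None):
        # with metric="precomputed", X here is the distance matrix
        self.graph_input_ = X
        return self._fit_embed_data(X, n_epochs=1, init="random", random_state=None)

    def _fit_embed_data(self, X, n_epochs, init, random_state):
        # placeholder "embedding": the first two columns of whatever it receives
        return np.asarray(X)[:, :2]


class ParametricEmbedder(BaseEmbedder):
    """Hypothetical stand-in for ParametricUMAP using the proposed workaround."""

    def fit_transform(self, X, y=None, precomputed_distances=None):
        if self.metric == "precomputed":
            if precomputed_distances is None:
                raise ValueError(
                    "Precomputed distances must be supplied if metric "
                    "is precomputed."
                )
            self._X = X  # stash raw features for training the network
            return super().fit_transform(precomputed_distances, y)
        return super().fit_transform(X, y)

    def _fit_embed_data(self, X, n_epochs, init, random_state):
        if self.metric == "precomputed":
            X = self._X  # swap the distance matrix back out for the raw data
        self.network_input_ = X
        return super()._fit_embed_data(X, n_epochs, init, random_state)


# usage: pairwise Euclidean distances as the precomputed matrix
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

model = ParametricEmbedder(metric="precomputed")
emb = model.fit_transform(X, precomputed_distances=D)
```

The base class's own call to `self._fit_embed_data` dispatches to the subclass override, which is what lets the swap happen without touching the base class and keeps the change confined to parametric_umap.py.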
Does that seem reasonable? I can make a PR if so.
Yes, this is definitely a quirk of the default scikit-learn API. I actually like your workaround here. It means that ParametricUMAP will diverge in behaviour from standard UMAP, but I think that is entirely unavoidable in this case. Please go ahead with the PR.