scanpy

UMAP transform

Open Hrovatin opened this issue 2 years ago • 9 comments

  • [x] Additional function parameters / changed functionality / changed defaults?
  • [ ] New analysis tool: A simple analysis tool you have been using and are missing in sc.tools?
  • [ ] New plotting function: A kind of plot you would like to see in sc.pl?
  • [ ] External tools: Do you know an existing package that should go into sc.external.*?
  • [ ] Other?

Is there an option to use umap transform (https://umap-learn.readthedocs.io/en/latest/transform.html) with scanpy? E.g. for embedding new data into the same UMAP space.

Hrovatin avatar May 22 '22 10:05 Hrovatin

IMHO you can only do it by fitting a UMAP object separately. The current implementation only returns the embedding for a given kNN graph (also built with UMAP).

dawe avatar May 22 '22 11:05 dawe

Yes, then this could be extended in scanpy. I imagine this would be very useful for reference mapping visualisations.

Hrovatin avatar May 22 '22 13:05 Hrovatin

Indeed, but then I believe UMAP should be derived from gene space and not from PCA. Even if the variance could be decomposed on the same components, the loadings could have opposite signs, and UMAP would then interpret the same cells as totally different samples.
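To illustrate the sign issue, here is a small NumPy sketch on toy random data (all variable names are made up for the example). A PCA solution is only defined up to the sign of each component, so two equally valid sets of loadings can place the same cells at mirrored coordinates.

```python
import numpy as np

rng = np.random.default_rng(0)
X_ref = rng.normal(size=(100, 5))  # toy reference data
X_new = rng.normal(size=(10, 5))   # toy new data, same features

mean = X_ref.mean(axis=0)
# PCA of the reference via SVD; rows of Vt are the component loadings.
_, _, Vt = np.linalg.svd(X_ref - mean, full_matrices=False)
loadings = Vt[:2].T  # shape (n_features, 2)

# An equally valid PCA solution: the first component with flipped sign.
flipped = loadings * np.array([-1.0, 1.0])

scores_same = (X_new - mean) @ loadings
scores_flipped = (X_new - mean) @ flipped  # mirrored along PC1
```

Projecting new data with the stored reference loadings (instead of recomputing PCA) avoids this ambiguity.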

dawe avatar May 22 '22 14:05 dawe

The UMAP (actually the neighbours, in the current implementation) can already be derived from any embedding the user wants, including an integration embedding, so this is not an issue in itself.

Hrovatin avatar May 22 '22 15:05 Hrovatin

Yep, but the data you pass must not only have the same dimensionality; the dimensions must also have the same meaning any time you want to project new data. If you have integrated embeddings (such as X_pca_harmony), those will change every time you add new data. Using genes to fit an initial UMAP will ensure that you can transform new data, provided you have the same genes.

dawe avatar May 22 '22 16:05 dawe

"If you have integrated embeddings (such as X_pca_harmony) those will change every time you add new data."

This isn't always true though, e.g., if you use scArches or Seurat (which also seems to use this umap transform).

On the other hand, the umap transform visualization can be quite deceiving. It can be the case that it qualitatively appears to have no batch effects even when there definitely has been no integration/correction.

adamgayoso avatar May 23 '22 06:05 adamgayoso

Right, but don't such methods require the same genes to be used so that a trained model (e.g. a VAE) can be applied? Once the model generates the new embedding, it can surely be transformed using the same UMAP. UMAP.transform() can only be used (as with all fitted models) if the feature set of the new data matches the one used for UMAP.fit().
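As a rough sketch of that requirement, the helper below reorders the columns of a new matrix to match the reference feature order and fails if a reference feature is missing. `align_features` and the gene names are hypothetical, written only for this example; they are not part of scanpy or umap-learn.

```python
import numpy as np

def align_features(X_new, new_genes, ref_genes):
    """Return X_new with columns reordered to match ref_genes.

    Raises ValueError if the new data lacks any reference feature.
    Extra features in the new data are dropped.
    """
    index = {gene: i for i, gene in enumerate(new_genes)}
    missing = [gene for gene in ref_genes if gene not in index]
    if missing:
        raise ValueError(f"new data is missing reference features: {missing}")
    return X_new[:, [index[gene] for gene in ref_genes]]

ref_genes = ["GeneA", "GeneB", "GeneC"]           # features used at fit time
new_genes = ["GeneC", "GeneA", "GeneD", "GeneB"]  # different order, one extra
X_new = np.arange(8, dtype=float).reshape(2, 4)

X_aligned = align_features(X_new, new_genes, ref_genes)
```

Only after this kind of alignment would the new matrix be safe to pass to a model or UMAP object fitted on the reference features.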

dawe avatar May 23 '22 08:05 dawe

Yes, this would be the case of scArches where query embedding dimensions match reference and reference is not changed upon query mapping.

Hrovatin avatar May 26 '22 07:05 Hrovatin

I think this may already be implemented in https://scanpy.readthedocs.io/en/stable/generated/scanpy.tl.ingest.html. However, this function contains extra integration and label transfer steps that are not needed for all applications. It would be great if these could be disentangled so that the umap transform is available as a separate function on scanpy UMAPs. Also, it seems that this function does not use the scanpy umap function to calculate the UMAP, so changes may be needed in how the scanpy UMAP is currently calculated.

Hrovatin avatar Sep 05 '22 17:09 Hrovatin