The UMAP generated by `reduceDimension` is rather "weak"

Open chansigit opened this issue 3 years ago • 4 comments

Hi my dear friends,

I noticed that the UMAP generated by `dyn.tl.reduceDimension` is not as good as those from other tools (Seurat, scanpy, etc.).

Here is a UMAP generated by `dyn.tl.reduceDimension` (preprocessed with recipe_monocle). The clusters form a condensed round mass. These cells lie on a developmental trajectory, which in our previous experience should form a line-like structure.

image

However, the UMAP generated by scanpy is better structured: the clusters are correctly arranged along some axes. (It uses the same loom file as above and the same preprocessed adata.X, fed to scanpy for the downstream UMAP analysis.) In this case, I used the size-factor-normalized and log1p-transformed adata.X generated by recipe_monocle (I didn't use scanpy's preprocessing steps; I know there are normalization differences), and used the HVGs to compute the PCA, neighbor graph, and UMAP in scanpy. Since the input is the same (recipe_monocle-normalized data), why do we get such different UMAPs? (dynamo's UMAPs seem less informative.)

image

Moreover, even scanpy's PCA plot seems to preserve the structure better than dynamo's UMAP, even though UMAP is a non-linear method.

image

This is not limited to this single dataset. Every dataset I have tried so far suggests that the UMAP produced by dynamo is rather "weak" (not as good as one would expect from a non-linear method).

Looking forward to your suggestions and guidance.

P.S. I have raised a lot of questions/problems recently, and I hope I haven't disrupted your new manuscript preparation too much.

chansigit avatar Sep 05 '22 15:09 chansigit

Hi @elfofmaxwell, can you please help me improve the reduceDimension function while you are adding typing and documentation and improving the functions in the tools module? I am trying to wrap up a few papers these days. My feeling is that scanpy uses a smaller min_dist. See https://umap-learn.readthedocs.io/en/latest/parameters.html.

We can discuss this later.
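For context on what min_dist controls, here is a minimal sketch. To my understanding, umap-learn fits its low-dimensional similarity curve to a piecewise target like the one below: full similarity up to min_dist, then exponential decay scaled by spread. The helper name `target_similarity` is made up for illustration and is not part of umap-learn, scanpy, or dynamo.

```python
import math


def target_similarity(distance, min_dist, spread=1.0):
    """Illustrative helper (hypothetical name): piecewise target similarity
    for two points at a given distance in the embedding."""
    if distance < min_dist:
        # Points closer than min_dist already count as maximally similar,
        # so the layout gains nothing by packing them any tighter.
        return 1.0
    # Beyond min_dist the target decays exponentially with distance.
    return math.exp(-(distance - min_dist) / spread)


# Compare the two settings discussed in this thread at a few distances.
for min_dist in (0.5, 0.01):
    values = [round(target_similarity(d, min_dist), 3) for d in (0.1, 0.3, 1.0)]
    print(f"min_dist={min_dist}: {values}")
```

With a larger min_dist, every pair closer than that distance looks equally "close", so fine structure at short range is flattened. A smaller min_dist lets the embedding distinguish such pairs.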

Xiaojieqiu avatar Sep 05 '22 16:09 Xiaojieqiu

We can also try to implement new alternatives for performing PCA dimension reduction.

Xiaojieqiu avatar Sep 05 '22 16:09 Xiaojieqiu

For your information, I deleted the UMAP embedding and recomputed it with `dyn.tl.reduceDimension(adata2, kwargs={'min_dist': 0.01}, enforce=True)`, only to find that it yields exactly the same UMAP.

image

The default in dynamo is min_dist=0.5: https://github.com/aristoteleo/dynamo-release/blob/1d1f5c521d0b8763ab866dcc3c3d96be0ba3f8f3/dynamo/tools/utils_reduceDimension.py#L203

The default min_dist in scanpy is also 0.5; see https://scanpy.readthedocs.io/en/latest/generated/scanpy.tl.umap.html

  1. It seems that min_dist is not the cause.
  2. It seems that passing min_dist via kwargs has no effect at all.

chansigit avatar Sep 05 '22 18:09 chansigit

Oh, I think I may have found the reason: min_dist is the solution after all!

I should not have passed the min_dist parameter this way: `dyn.tl.reduceDimension(adata2, kwargs={'min_dist': 0.01}, enforce=True)`

It should be passed directly as a parameter: `dyn.tl.reduceDimension(adata2, min_dist=0.01, enforce=True)`
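A minimal sketch of one plausible reason the first form is silently ignored, assuming the function collects extra options through `**kwargs`. The function `reduce_dimension_sketch` is a hypothetical stand-in, not dynamo's actual implementation, and the 0.5 fallback only mirrors the default quoted earlier in this thread.

```python
def reduce_dimension_sketch(data, enforce=False, **kwargs):
    """Hypothetical stand-in for illustration only (NOT dynamo's reduceDimension)."""
    # Extra keyword arguments land in the dict `kwargs`; look up min_dist there.
    min_dist = kwargs.get("min_dist", 0.5)  # assumed default, taken from the thread
    return {"min_dist": min_dist, "received": dict(kwargs)}


# Passing a dict under the keyword name "kwargs" nests it one level down,
# so the lookup for "min_dist" misses and the default is used.
nested = reduce_dimension_sketch([], kwargs={"min_dist": 0.01}, enforce=True)

# Passing min_dist directly, or unpacking the dict with **, reaches the lookup.
direct = reduce_dimension_sketch([], min_dist=0.01, enforce=True)
unpacked = reduce_dimension_sketch([], **{"min_dist": 0.01}, enforce=True)

print(nested)
print(direct)
print(unpacked)
```

In this sketch, the nested call receives `{"kwargs": {"min_dist": 0.01}}` and no error is raised, which would match the "exactly the same UMAP" symptom reported above.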

After setting a smaller min_dist, the global structure appeared in the UMAP.

I suggest explicitly telling users in the docstring that they can adjust min_dist there.

@elfofmaxwell thank you for your attention.

I am still curious why the same default min_dist produces different UMAP compactness in dynamo and scanpy/scvelo.

chansigit avatar Sep 07 '22 02:09 chansigit

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 14 days

github-actions[bot] avatar Dec 07 '22 01:12 github-actions[bot]