dynamo-release
The UMAP generated by `reduceDimension` is rather "weak"
Hi my dear friends,
I noticed that the UMAP generated by dyn.tl.reduceDimension is not as good as that from other tools (Seurat, scanpy, etc.).
Here is a UMAP generated by dyn.tl.reduceDimension (preprocessed by recipe_monocle). The clusters form a condensed round mass. These cells lie on a developmental trajectory, and from our previous experience they should form a line-like structure.

However, the UMAP generated by scanpy is better structured: the clusters are correctly arranged along a few axes. This is the same loom file as above and the same preprocessed adata.X, but fed to scanpy for the downstream UMAP analysis. In this case I used the size-factor normalized and log1p-transformed adata.X generated by recipe_monocle (I did not use scanpy's preprocessing steps, since I know the normalizations differ), and used the highly variable genes to calculate the PCA, the neighbor graph, and the UMAP in scanpy. Since the input is the same (recipe_monocle normalized data), why do we get such different UMAPs? (dynamo's UMAPs seem less informative.)

Moreover, even scanpy's PCA plot seems to preserve the structure better than dynamo's UMAP, although UMAP is the non-linear method.

This is not limited to this single dataset. Every dataset I have tried so far suggests that the UMAP produced by dynamo is rather "weak" (not as good as one would expect from a non-linear method).
Looking forward to your suggestions and guidance.
P.S. I have raised a lot of questions/problems recently, and I hope I didn't disturb the preparation of your new manuscript too much.
Hi @elfofmaxwell, can you please help me improve the reduceDimension function while you are adding typing and documentation and improving the functions in the tools module? I am trying to wrap up a few papers these days. My feeling is that scanpy uses a smaller min_dist. See https://umap-learn.readthedocs.io/en/latest/parameters.html.
We can discuss this later.
We can also try to implement new alternatives for performing PCA dimension reduction.
For your information, I deleted the UMAP embedding and recomputed it with dyn.tl.reduceDimension(adata2,kwargs={'min_dist':0.01},enforce=True),
only to find that it yields exactly the same UMAP.

The default in dynamo is min_dist=0.5, see https://github.com/aristoteleo/dynamo-release/blob/1d1f5c521d0b8763ab866dcc3c3d96be0ba3f8f3/dynamo/tools/utils_reduceDimension.py#L203
The default min_dist in scanpy is also 0.5, see https://scanpy.readthedocs.io/en/latest/generated/scanpy.tl.umap.html
- It seems that min_dist is not the cause.
- It seems that passing a min_dist in kwargs has no impact at all.
Oh, I think I have found the reason: min_dist is the solution after all!
I was not supposed to pass the min_dist parameter this way:
dyn.tl.reduceDimension(adata2,kwargs={'min_dist':0.01},enforce=True)
It should be passed directly as a parameter: dyn.tl.reduceDimension(adata2, min_dist=0.01,enforce=True)
After setting a smaller min_dist, the global structure appeared in the UMAP.
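For anyone who hits the same pitfall, here is a minimal sketch of why the first call can be silently ignored. It assumes the function has an explicit min_dist parameter and collects extra keyword arguments with **kwargs; toy_reduce_dimension is a hypothetical stand-in written for illustration, not dynamo's actual implementation, and I have not checked it against the real signature.

```python
def toy_reduce_dimension(data, min_dist=0.5, **kwargs):
    """Hypothetical stand-in for illustration only (NOT dynamo's code).

    Returns the min_dist that would actually be used, plus whatever
    ended up in the catch-all keyword dictionary.
    """
    return {"min_dist": min_dist, "extra": kwargs}


# Writing kwargs={...} does not unpack the dict. It creates one extra
# keyword argument literally named "kwargs", so min_dist keeps its default.
wrong = toy_reduce_dimension([], kwargs={"min_dist": 0.01})
print(wrong["min_dist"])  # 0.5
print(wrong["extra"])     # {'kwargs': {'min_dist': 0.01}}

# Passing the parameter directly reaches the explicit argument.
right = toy_reduce_dimension([], min_dist=0.01)
print(right["min_dist"])  # 0.01

# Unpacking a dict with ** is equivalent to passing it directly.
unpacked = toy_reduce_dimension([], **{"min_dist": 0.01})
print(unpacked["min_dist"])  # 0.01
```

If the real function behaves like this toy version, the nested dictionary is only used if the function explicitly looks for a key named "kwargs", which would explain why the embedding did not change.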
I suggest explicitly telling users in the docstring that they can adjust min_dist there.
@elfofmaxwell thank you for your attention.
I am still curious why the same default min_dist produces different UMAP compactness in dynamo and in scanpy/scvelo.