Comparing classic UMAP and TopOMetry

Open Marwansha opened this issue 6 months ago • 2 comments

Sorry if my questions are naive, but they are for better understanding.

1- In a classical workflow, if I use TopOMetry, are the eigencomponents considered the dimensionality reduction method?

2- For my adata.X object I observed an eigengap around 120, so does the projection used in the model use this number of eigenvectors?

3- Also, for comparison of results, shouldn't I use 120 PCs (equivalent to 120 EVs), compute neighbors from those and plot a UMAP, to compare the results with tg.ProjectionDict['MAP of bw_adaptive from msDM with bw_adaptive']?

4- A cell type has higher i.d. estimates than the others. How should I interpret this, and could it just be an effect of the cell proportion being small (a low number of cells of this cell type should lead to high i.d., right? And should that mean they don't cluster very well together)?

Finally, I did this comparison using the same number of PCs to generate the kNN graph and then the UMAP with Scanpy, and another time with topoMAP.

Here are both projections for my PBMC dataset, which contains different individuals under different conditions (non-stimulated and stimulated with COVID). How would you interpret the different visualizations, specifically in the B cell cluster in pink?

import topo as tp

tg = tp.TopOGraph(n_eigs=119, n_jobs=-1, verbosity=0)

tg.run_models(adata.X, kernels=['bw_adaptive'],
              eigenmap_methods=['msDM'],
              projections=['MAP','UMAP'])
[Images: the two projections]

Marwansha, Dec 07 '23 10:12

Hi @Marwansha. Not naive at all; this is indeed quite complex. Let's go over your questions one by one. I took the liberty of rewriting them a bit (you seem to be in a hurry) so that this is also useful to others with similar questions.

1-In a classical workflow, if I use Topometry, are the eigencomponents here considered the dimensionality reduction method?

Yes, the eigencomponents are the 'latent space' (a.k.a. the dimensionality-reduced space), similar to the latent space learned by autoencoders like scVI or the principal components learned by PCA.

TopOMetry comes with three main flavours: multiscale Diffusion Maps (msDM, favours continuous structure, useful for developmental systems), regular Diffusion Maps (DM, favours discrete structure), and Laplacian eigenmaps (LE, similar to Diffusion Maps but with the uniform sampling assumption).

I would recommend going with DM if you are studying immune cells (which appears to be the case).

Keep in mind that TopOMetry assumes the data has been scaled (i.e. sc.pp.scale(adata, max_value=10) as per Scanpy's tutorial) - otherwise, the genes with the highest expression might dominate the embedding. By looking at it, I would say your data hasn't been scaled beforehand.
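To make the scaling point concrete, here is a rough NumPy sketch of per-gene standardization with an upper clip. It only illustrates the idea: `scale_genes` is a hypothetical helper, not Scanpy's implementation, which may differ in details (sparse input, variance estimator, clipping of negative values). The toy matrix is made up.

```python
import numpy as np

def scale_genes(X, max_value=10.0):
    """Standardize each column (gene) to zero mean and unit variance,
    then clip large values. Illustrative only, not Scanpy's code."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # avoid division by zero for constant genes
    Z = (X - mean) / std
    return np.minimum(Z, max_value)

# Toy counts: three "genes" with very different expression levels.
rng = np.random.default_rng(0)
counts = rng.poisson(lam=[1.0, 50.0, 500.0], size=(200, 3)).astype(float)

print("variance per gene before scaling:", counts.var(axis=0, ddof=1))
scaled = scale_genes(counts)
print("variance per gene after scaling: ", scaled.var(axis=0, ddof=1))
```

Before scaling, the highly expressed gene carries most of the total variance; after scaling, all genes contribute on the same footing.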

2 - For my adata.X object I observed an eigengap around 120, so does the projection used in the model use this number of eigenvectors to do the projection?

Yes, the TopOGraph object will automatically use only the number of components detected with the eigengap. I think your code might be somewhat different, as 150 were plotted (so 150 were computed). It is recommended to check the histogram of intrinsic dimensionalities before setting a number of components to be computed (also using the scaled adata.X).
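For readers unfamiliar with the eigengap idea, a minimal sketch is below. It assumes the simplest heuristic (keep the components that come before the largest drop between consecutive eigenvalues); TopOMetry's own detection may use a different criterion. `eigengap_index` and the toy eigenvalues are hypothetical.

```python
def eigengap_index(eigenvalues):
    """Return how many leading components sit before the largest gap
    between consecutive eigenvalues (sorted in decreasing order)."""
    vals = sorted(eigenvalues, reverse=True)
    gaps = [vals[i] - vals[i + 1] for i in range(len(vals) - 1)]
    largest = max(range(len(gaps)), key=gaps.__getitem__)
    return largest + 1

# Toy spectrum: four large eigenvalues, then a clear drop.
toy_eigenvalues = [0.99, 0.97, 0.95, 0.93, 0.40, 0.38, 0.37]
n_components = eigengap_index(toy_eigenvalues)
print("components before the largest gap:", n_components)
```

Note that a gap can only be seen if more components were computed than the position of the gap, which is why it helps to compute somewhat more than you expect to need.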

3- Also, for comparison of results, shouldn't I use 120 PCs (equivalent to 120 EVs) and use those to compute neighbours and plot a UMAP to compare these results with tg.ProjectionDict['MAP of bw_adaptive from msDM with bw_adaptive']?

Yes, that would make sense. Remember that PCA and spectral methods are not equivalent - how much variance is explained by 120 PCs in your data?
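A quick way to answer that question is to look at the cumulative explained variance. The sketch below uses plain NumPy on made-up data with five strong latent directions; `cumulative_explained_variance` is a hypothetical helper, and on real data you would pass your scaled matrix instead of the toy one.

```python
import numpy as np

def cumulative_explained_variance(X):
    """Fraction of total variance captured by the first k principal
    components, for every k, computed from the singular values."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # center each feature
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values, descending
    var = s ** 2                             # proportional to PC variances
    return np.cumsum(var) / var.sum()

# Toy data: 300 "cells", 50 "genes", 5 latent directions plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 5))
loadings = rng.normal(size=(5, 50))
X = latent @ loadings + 0.1 * rng.normal(size=(300, 50))

cumvar = cumulative_explained_variance(X)
k = 5
print(f"{k} PCs explain {cumvar[k - 1]:.1%} of the variance")
```

If the first 120 PCs of your data explain only a modest share of the variance, the PCA-based UMAP and the eigenbasis-based projection are being built from quite different amounts of information.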

4- A cell type has higher i.d. estimates than the others; how should this be interpreted? Could it be just an effect of the cell proportion being small (a low number of cells of that cell type should lead to high i.d., right? And should that mean they don't cluster very well together)?

Now, this is a tough question. As of now, no one knows for sure.

In our paper, using images of handwritten digits, we saw that digits with high variability had the highest i.d. estimates, and digits with low variability had the lowest i.d. estimates. Two other works observed a similar phenomenon: Bastian Rieck's work on TARDIS, and Uzu Lim's work on HADES, both focusing on singularity detection and i.d. estimation.

Overall, this phenomenon appears to be related to higher intra-cluster variability. Whether that's because of noise, low-quality cells, or actual biological signals is still unclear. I'm not aware of much work done on this.
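One simple check on the cell-proportion hypothesis is to tabulate, per cell type, the number of cells next to the median per-cell i.d. estimate. The sketch below uses only the standard library; `summarize_id_by_group` is a hypothetical helper and the labels and values are toy numbers, not results. In practice the per-cell estimates would come from whichever i.d. estimator you ran.

```python
from collections import defaultdict
from statistics import median

def summarize_id_by_group(labels, id_estimates):
    """Map each label to (number of cells, median i.d. estimate)."""
    groups = defaultdict(list)
    for label, value in zip(labels, id_estimates):
        groups[label].append(value)
    return {label: (len(vals), median(vals)) for label, vals in groups.items()}

# Toy per-cell labels and i.d. estimates (made up for illustration).
toy_labels = ["B", "B", "B", "T", "T", "T", "T", "Mono", "Mono"]
toy_id = [12.0, 14.0, 13.0, 9.0, 10.0, 11.0, 10.0, 5.0, 6.0]

summary = summarize_id_by_group(toy_labels, toy_id)
for label, (n_cells, med) in sorted(summary.items()):
    print(f"{label}: n={n_cells}, median i.d.={med}")
```

If small clusters do not systematically show higher medians than large ones, cluster size alone is unlikely to explain the differences.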

I hope this was informative - let me know if you have more questions :)

davisidarta, Dec 07 '23 12:12

Thanks a lot that is very helpful and inspiring.

For 1- you are right, I haven't scaled the genes.

For 2- I was trying to understand: if I compute 150 components (tg = tp.TopOGraph(n_eigs=150, n_jobs=-1, verbosity=0)) and I see a gap at, let's say, 120, should I go back and compute only 120 (tg = tp.TopOGraph(n_eigs=120, n_jobs=-1, verbosity=0))?

For 4- it was more of an observation in my dataset: all cell types had similar i.d., and the myeloid lineage had very low i.d. (check the picture below), so I was wondering what that would mean and whether it is safe to say that monocytes and DCs are less heterogeneous?

Thanks again this is very inspiring work

Here is the classic UMAP vs. topoMAP from the same data again after scaling (UMAP from "umap_Y = UMAP(n_components=2, metric='cosine').fit_transform(adata.X)"). The model used:

adata = tp.sc.topological_workflow(
    adata,                  # the anndata object
    tg,                # the TopOGraph object
    kernels=['bw_adaptive'],# the kernel(s) to use
    eigenmap_methods=['DM'],# the eigenmap method(s) to use
    projections=['MAP'],    # the projection(s) to use
    resolution=0.8          # the Leiden clustering resolution
)

[Image: topo_MAP]

[Image: UMAP]

[Image: i.d. per cell type (myeloid low in this dataset)]

Marwansha, Dec 07 '23 13:12