Problem interpreting the SAMap integration results
Hello,
Thank you for developing such a useful tool! I'm integrating scRNA-seq data across species, and SAMap gave me an integration result that looks quite good. However, I'm confused about how to interpret the SAMap results and hope you can help.
My stitched SAMap UMAP is shown below:
My questions are:
- I passed the known cell annotations to both `keys` and `neigh_from_keys` in the SAMap run. Is it necessary to pass both parameters at the same time? Previously I passed the cell annotations only to `neigh_from_keys`. In addition, do you think using Leiden clustering would improve the integration result?
- For some cell types, the correspondence is not one-to-one (depending on the resolution of the cell annotations). I want to identify the specific cell barcodes that map, or fail to map, to a given cell type of the other species, similar to cell label transfer. How can I achieve this?
Thank you in advance.
Best regards
`neigh_from_keys` actually expects a dictionary of booleans keyed by species ID; sorry the documentation isn't clear. For species where `neigh_from_keys` is `True`, the values defined in `keys` are used to determine neighborhoods. By default, `keys` uses Leiden clustering. So if you'd like to use custom annotations, the right way is to set `neigh_from_keys` to `True` and set `keys` to the annotation column name for each species. (Incidentally, setting `neigh_from_keys` to a dictionary of strings ends up being truthy anyway, so you probably don't need to rerun SAMap.)
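The argument shapes described above can be sketched as plain dictionaries. The helper `build_samap_args` is hypothetical (not part of SAMap), and the species IDs and annotation column names are the ones used later in this thread; the resulting dictionaries would be passed as `SAMAP(..., keys=keys)` and `sm.run(neigh_from_keys=neigh_from_keys)`. The last lines show why a dictionary of column names behaves like one of `True` values in a truth test.

```python
def build_samap_args(annotation_columns):
    """Hypothetical helper: turn {species ID: annotation column} into the
    `keys` and `neigh_from_keys` dictionaries described above."""
    keys = dict(annotation_columns)                               # species ID -> obs column name
    neigh_from_keys = {sid: True for sid in annotation_columns}   # species ID -> bool
    return keys, neigh_from_keys


keys, neigh_from_keys = build_samap_args(
    {"mo": "celltype.predicted", "ze": "ClusterName_short"}
)
print(keys)
print(neigh_from_keys)

# Non-empty strings are truthy, so a dict of column names passed as
# neigh_from_keys gives the same result as a dict of True values in a truth test.
as_strings = {"mo": "celltype.predicted", "ze": "ClusterName_short"}
print({sid: bool(v) for sid, v in as_strings.items()} == neigh_from_keys)
```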
If you're comfortable working with sparse adjacency matrices, you can always look at the graph in `sm.samap.adata.obsp['connectivities']` and, for each row (cell), see which other cells it is connected to (the nonzero columns).
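The row-by-row inspection suggested above can be turned into a simple neighbor-vote label transfer. This is a minimal sketch, not a SAMap function: `transfer_labels` is a hypothetical helper, and the toy matrix stands in for `sm.samap.adata.obsp['connectivities']`. In real data the `species` and `labels` arrays would come from `sm.samap.adata.obs`; which columns hold them in your object is an assumption you should check. Each source cell is assigned the target cell type that receives the largest share of its cross-species edge weight, with that share as a rough confidence; cells with no cross-species edges are reported as unmapped.

```python
import numpy as np
import scipy.sparse as sp


def transfer_labels(conn, species, labels, source, target):
    """Hypothetical neighbor-vote label transfer over a cell-cell graph.

    For each `source`-species cell, sum its edge weights to each cell type of
    the `target` species. Returns {cell index: (best type, weight fraction)};
    cells without edges into the target species get (None, 0.0).
    """
    conn = sp.csr_matrix(conn)
    target_mask = species == target
    target_types = np.unique(labels[target_mask])
    # One-hot columns: 1 where a cell belongs to the target species and that type.
    onehot = np.zeros((len(labels), len(target_types)))
    for j, t in enumerate(target_types):
        onehot[target_mask & (labels == t), j] = 1.0
    scores = np.asarray(conn @ onehot)  # (n_cells, n_target_types)
    result = {}
    for i in np.where(species == source)[0]:
        total = scores[i].sum()
        if total == 0:
            result[int(i)] = (None, 0.0)
        else:
            j = int(np.argmax(scores[i]))
            result[int(i)] = (str(target_types[j]), float(scores[i, j] / total))
    return result


# Toy joint graph (made-up data): 3 mouse ('mo') cells, 3 zebrafish ('ze') cells.
barcodes = np.array(["mo_AAAC", "mo_AAAG", "mo_AAAT", "ze_CCCA", "ze_CCCG", "ze_CCCT"])
species = np.array(["mo", "mo", "mo", "ze", "ze", "ze"])
labels = np.array(["A", "B", "B", "a1", "a2", "b"])
dense = np.zeros((6, 6))
for i, j, w in [(0, 3, 0.8), (0, 4, 0.5), (1, 5, 0.9), (1, 2, 0.7)]:
    dense[i, j] = dense[j, i] = w  # symmetric edge weights
conn = sp.csr_matrix(dense)

res = transfer_labels(conn, species, labels, "mo", "ze")
for i, (label, frac) in res.items():
    print(barcodes[i], label, round(frac, 2))
```

Barcodes with a low best share, or with no cross-species edges at all, are the natural candidates for "unmapped" cells; where to draw that cut-off is a judgement call, not something this sketch decides.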
Thanks for the clear response. I set both `keys` and `neigh_from_keys` to my annotation columns, using the code below:
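from samap.mapping import SAMAP
# filenames: dict mapping each species ID ('mo', 'ze') to its input data
names = {'mo': ENSMUST_array, 'ze': ENSDART_array}
sm = SAMAP(filenames, f_maps='./maps/', save_processed=False, names=names, keys={'mo': 'celltype.predicted', 'ze': 'ClusterName_short'})
# non-empty strings are truthy, so this acts like {'mo': True, 'ze': True}
sm.run(neigh_from_keys={'mo': 'celltype.predicted', 'ze': 'ClusterName_short'})
samap = sm.samap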
I then identified aligned cell types by calculating cell type mapping scores. Most cell types connected as expected with high mapping scores. However, a small portion of cell types showed either low mapping scores or incorrect connections, which I suspect may be due to inconsistencies in the granularity of the cell annotations.
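The granularity explanation can be illustrated on a toy graph. The scoring function below is a simplified, hypothetical proxy for illustration only; it is not SAMap's own mapping score. It aggregates cell-cell edge weights into a cell-type-by-cell-type matrix and averages the share of each type's cross-species weight that lands in its partner, in both directions.

```python
import numpy as np
import scipy.sparse as sp


def type_alignment(conn, species, labels, sp_a, sp_b):
    """Hypothetical cell-type alignment proxy (NOT SAMap's mapping score).

    W[a, b] = total edge weight between cells of type a (species sp_a) and
    cells of type b (species sp_b). The score averages the share of a's
    cross-species weight that lands in b and the share of b's that lands in a.
    """
    conn = sp.csr_matrix(conn)
    types_a = np.unique(labels[species == sp_a])
    types_b = np.unique(labels[species == sp_b])
    ind_a = np.stack([(species == sp_a) & (labels == t) for t in types_a], axis=1).astype(float)
    ind_b = np.stack([(species == sp_b) & (labels == t) for t in types_b], axis=1).astype(float)
    W = ind_a.T @ np.asarray(conn @ ind_b)  # (n_types_a, n_types_b)
    row_share = W / np.maximum(W.sum(axis=1, keepdims=True), 1e-12)
    col_share = W / np.maximum(W.sum(axis=0, keepdims=True), 1e-12)
    return types_a, types_b, (row_share + col_share) / 2


# Toy case (made-up data): mouse "T" is one coarse type, zebrafish splits it into T1/T2.
species = np.array(["mo", "mo", "mo", "ze", "ze", "ze"])
labels = np.array(["T", "T", "M", "T1", "T2", "m"])
dense = np.zeros((6, 6))
for i, j in [(0, 3), (1, 4), (2, 5)]:
    dense[i, j] = dense[j, i] = 1.0
types_a, types_b, S = type_alignment(sp.csr_matrix(dense), species, labels, "mo", "ze")
print(types_a, types_b)
print(S)
```

In this toy graph the one-to-one pair M/m scores 1.0, while the coarse mouse type T scores only 0.75 against each zebrafish subtype because its edges are split between them, even though every edge is "correct".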
I would like to inquire about the following:
- Is there a threshold above which a mapping score can be considered reliable? Is the score robust to the number of cells in a given cell type?
- After rerunning SAMap on a subset of cell types (without a one-to-one correspondence), I noticed that cells from the species with fewer cells were more scattered on the UMAP. Could this be due to over-integration?
[Figures: custom cluster annotation; leiden_cluster]
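One way to quantify the scattering described in the second question is to measure, for each cell type of the smaller species, how much of its cells' edge weight stays within the same species and label in the joint graph. This is a hypothetical diagnostic, not a SAMap feature, and it assumes the connectivity graph contains within-species as well as cross-species edges. A low value is consistent with over-integration but does not prove it, since a genuinely shared cell state would also pull cells toward the other species.

```python
import numpy as np
import scipy.sparse as sp


def type_cohesion(conn, species, labels, sid):
    """Hypothetical diagnostic: for each cell type of species `sid`, the mean
    fraction of a cell's edge weight that goes to cells of the same species and
    same label. Low values mean the type's cells are dispersed in the graph."""
    conn = sp.csr_matrix(conn)
    out = {}
    for t in np.unique(labels[species == sid]):
        idx = np.where((species == sid) & (labels == t))[0]
        sub = conn[idx]  # rows belonging to this cell type
        total = np.asarray(sub.sum(axis=1)).ravel()
        within = np.asarray(sub[:, idx].sum(axis=1)).ravel()
        frac = np.divide(within, total, out=np.zeros_like(total), where=total > 0)
        out[str(t)] = float(frac.mean())
    return out


# Toy graph (made-up data): zebrafish type "x" keeps internal edges, "y" only links to mouse cells.
species = np.array(["mo", "mo", "ze", "ze", "ze", "ze"])
labels = np.array(["X", "Y", "x", "x", "y", "y"])
dense = np.zeros((6, 6))
for i, j in [(2, 3), (2, 0), (4, 1), (5, 1)]:
    dense[i, j] = dense[j, i] = 1.0
coh = type_cohesion(sp.csr_matrix(dense), species, labels, "ze")
print(coh)
```

Comparing these values between the full run and the subset run, for the same species and labels, would show whether particular cell types became more dispersed after subsetting.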