scanpy
color UMAP with more clusters
The Allen Institute has published a nomenclature for brain cell types with 127 types. When I annotate my cells with those types, the UMAP plot doesn't color the cells:
[image]
Would it be possible to support coloring this number of clusters?
Hi @alexlenail,
The issue here is that colour maps with this many distinct colours don't really exist. They could be created, but even then you would not be able to distinguish the colours by eye. A solution that I have used in this scenario is to colour only a subset of the clusters in any single plot and label the rest as "Other". You can do that by adding new adata.obs columns with the new labels.
Something like (untested code):
import scanpy as sc

# Cell types to highlight; every other cell is labelled "Other"
list_of_cts = ['Exc L5-6 FEZF2 CFTR', 'Exc L5-6 FEZF2 FILIP1L', 'Exc L5-6 FEZF2 IFNG-AS1']
adata.obs['ct_group1'] = [ct if ct in list_of_cts else "Other" for ct in adata.obs['Spearman_BICCN_M1_classification']]
sc.pl.umap(adata, color='ct_group1')
You could just add a loop around that with the cell types you want to colour together. Hope that helps.
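A minimal sketch of that loop, under some assumptions: the helper names highlight_labels and chunked, the group size of 10, and the ct_groupN column names are all made up for illustration, and the scanpy part is left commented out because it needs a real adata object.

```python
def highlight_labels(labels, keep, other="Other"):
    """Return a list where every label not in `keep` is replaced by `other`."""
    keep = set(keep)
    return [label if label in keep else other for label in labels]


def chunked(items, size):
    """Split `items` into consecutive groups of at most `size` elements."""
    items = list(items)
    return [items[i:i + size] for i in range(0, len(items), size)]


# Hypothetical usage with an AnnData object `adata` (not run here):
# import scanpy as sc
# key = "Spearman_BICCN_M1_classification"
# cell_types = sorted(adata.obs[key].unique())
# for i, group in enumerate(chunked(cell_types, 10), start=1):
#     col = f"ct_group{i}"
#     adata.obs[col] = highlight_labels(adata.obs[key], group)
#     sc.pl.umap(adata, color=col)
```

Each pass colours one group of cell types and lumps everything else into "Other", so no single plot needs more colours than the group size plus one.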
Hi @LuckyMD I don't think the colors each need to be too distinguishable from one another, just so long as there aren't two contiguous clusters with indistinguishable colors. As an analogy, geographic maps often only use a few colors, but no two sides of a border are ever colored the same. I think scanpy should provide this feature, perhaps with a warning.
Some thoughts on implementation: This refers to the graph colouring problem. We could implement something like this in scanpy. We'd need a graph that represents "neighbouring" in the UMAP. An (imperfect) approximation of this could be a PAGA graph. I quickly found a few bits of code, but not really in a nice maintained library as far as I can tell:
- code snippet for greedy graph colouring: https://python.plainenglish.io/solve-graph-coloring-problem-with-greedy-algorithm-and-python-6661ab4154bd
- This is a map colouring algorithm, which would require graphs that can be put into a 2D layout (which should be the case for UMAPs, but not necessarily for PAGA graphs): blog intro: https://four-color-theorem.org/introduction/
Overall... ideally we'd have a well maintained library for this.
There is something if we want to make NetworkX a dependency.
igraph only seems to have graph colouring in its C library.
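To make the graph colouring idea concrete, here is a small dependency-free sketch of greedy colouring. It is not scanpy code: the function name greedy_colouring and the toy adjacency dict are invented for illustration, and in practice the "neighbouring clusters" graph would have to come from somewhere else (for example a thresholded PAGA graph, as suggested above).

```python
def greedy_colouring(adjacency):
    """Give each node the smallest colour index not used by its neighbours.

    adjacency: dict mapping each node to an iterable of neighbouring nodes.
    Returns a dict mapping node -> integer colour index.
    """
    # Largest-first heuristic: visit high-degree nodes before low-degree ones.
    order = sorted(adjacency, key=lambda n: len(adjacency[n]), reverse=True)
    colours = {}
    for node in order:
        taken = {colours[nb] for nb in adjacency[node] if nb in colours}
        colour = 0
        while colour in taken:
            colour += 1
        colours[node] = colour
    return colours


# Toy graph: nodes are clusters, edges join clusters that touch in the embedding.
adjacency = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
colours = greedy_colouring(adjacency)
```

The integer indices can then be mapped onto a small, well-separated palette, so that clusters sharing a colour are never adjacent in the graph. Greedy colouring does not guarantee the minimum number of colours. If NetworkX were acceptable as a dependency, its greedy_color function covers the same ground.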
I think it might be enough to leave it to random chance. As it is now, sc.pl.umap works with 70 colors, which aren't easily distinguishable -- but neighboring clusters seem to always be distinguishable. The easiest fix to this issue would just be to support a larger color palette -- maybe even kicking the can down the road up to ~150 colors.
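For illustration only, a larger palette can be generated with the standard library. This is a sketch, not what scanpy does internally: make_palette is a made-up helper, and the golden-ratio hue step and the fixed saturation and brightness values are arbitrary choices.

```python
import colorsys


def make_palette(n):
    """Return n hex colours, stepping the hue by the golden ratio conjugate."""
    golden = 0.618033988749895
    colours = []
    hue = 0.0
    for i in range(n):
        # Alternate brightness so consecutive entries differ in more than hue.
        value = 0.9 if i % 2 == 0 else 0.65
        r, g, b = colorsys.hsv_to_rgb(hue, 0.75, value)
        colours.append(
            "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))
        )
        hue = (hue + golden) % 1.0
    return colours


palette = make_palette(127)

# Hypothetical usage (not run here), assuming adata.obs[key] is categorical:
# adata.uns[f"{key}_colors"] = make_palette(len(adata.obs[key].cat.categories))
# sc.pl.umap(adata, color=key)
```

Many of the resulting colours will still look alike, so this only guarantees that every cluster gets some colour, not that every pair of clusters can be told apart.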
150 colours will not help imo. Typically very large colour maps make neighbouring clusters indistinguishable...
Hi @alexlenail
Re: "I think it might be enough to leave it to random chance. As it is now, sc.pl.umap works with 70 colors, which aren't easily distinguishable -- but neighboring clusters seem to always be distinguishable."
How can I access the 70 colors? I have a plot with 54 clusters but they're all colored gray.
@malonzm1 please ask questions like this at https://discourse.scverse.org/