scanpy

color UMAP with more clusters

Open alexlenail opened this issue 2 years ago • 6 comments

The Allen Institute has published a nomenclature for brain cell types with 127 types. When I annotate my cells with those types, the UMAP plot doesn't color the cells:

image

Would it be possible to support coloring this number of clusters?

alexlenail avatar Aug 26 '22 15:08 alexlenail

Hi @alexlenail,

The issue here is that colour maps with this many distinct colours don't really exist. They could be created, but even then you would not be able to tell the colours apart by eye. A solution that I have used in this scenario is to colour only a subset of the clusters in any single plot and label the rest as "Other". You can do that by adding new adata.obs columns that hold the reduced labels.

Something like (untested code):

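import scanpy as sc
# Cell types to highlight; every other cell is relabelled "Other"
list_of_cts = ['Exc L5-6 FEZF2 CFTR', 'Exc L5-6 FEZF2 FILIP1L', 'Exc L5-6 FEZF2 IFNG-AS1']
adata.obs['ct_group1'] = [ct if ct in list_of_cts else "Other" for ct in adata.obs['Spearman_BICCN_M1_classification']]
# color has to be passed by keyword
sc.pl.umap(adata, color='ct_group1')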

You could just add a loop around that with the cell types you want to colour together. Hope that helps.
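A minimal sketch of that loop, assuming the groups are kept in a dict. The helper names, the `ct_group2` group and its placeholder cell types are hypothetical. The relabelling is plain Python so it can be checked without an AnnData object:

```python
from typing import Dict, Iterable, List, Sequence


def highlight_labels(labels: Iterable[str], keep: Sequence[str], other: str = "Other") -> List[str]:
    """Keep the labels listed in `keep`; replace everything else with `other`."""
    keep_set = set(keep)
    return [label if label in keep_set else other for label in labels]


def build_group_columns(labels: Iterable[str], groups: Dict[str, Sequence[str]]) -> Dict[str, List[str]]:
    """Return one relabelled column per group, keyed by the new column name."""
    labels = list(labels)  # materialise once so the labels can be reused per group
    return {name: highlight_labels(labels, keep) for name, keep in groups.items()}


# Hypothetical grouping: ct_group1 reuses the cell types from the snippet above,
# ct_group2 uses placeholder names.
ct_groups = {
    "ct_group1": ["Exc L5-6 FEZF2 CFTR", "Exc L5-6 FEZF2 FILIP1L", "Exc L5-6 FEZF2 IFNG-AS1"],
    "ct_group2": ["hypothetical type A", "hypothetical type B"],
}

# With a real AnnData object `adata` (not constructed here), the loop would be:
# columns = build_group_columns(adata.obs["Spearman_BICCN_M1_classification"], ct_groups)
# for name, column in columns.items():
#     adata.obs[name] = column
#     sc.pl.umap(adata, color=name)
```

Each plot then only needs a handful of colours plus one for "Other".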

LuckyMD avatar Aug 27 '22 00:08 LuckyMD

Hi @LuckyMD, I don't think the colors each need to be very distinguishable from one another, so long as no two contiguous clusters have indistinguishable colors. As an analogy, geographic maps often use only a few colors, but the two sides of a border are never colored the same. I think scanpy should provide this feature, perhaps with a warning.

alexlenail avatar Sep 06 '22 17:09 alexlenail

Some thoughts on implementation: This refers to the graph colouring problem. We could implement something like this in scanpy. We'd need a graph that represents "neighbouring" in the UMAP. An (imperfect) approximation of this could be a PAGA graph. I quickly found a few bits of code, but not really in a nice maintained library as far as I can tell:

  • code snippet for greedy graph colouring: https://python.plainenglish.io/solve-graph-coloring-problem-with-greedy-algorithm-and-python-6661ab4154bd
  • This is a map colouring algorithm, which would require graphs that can be put into a 2D layout (which should be the case for UMAPs, but not necessarily for PAGA graphs): blog intro: https://four-color-theorem.org/introduction/

Overall... ideally we'd have a well maintained library for this.
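To make the idea concrete, here is a rough sketch of greedy graph colouring in plain Python. It is not taken from either link above and is not an existing scanpy feature. The cluster adjacency is a made-up toy graph; in practice it would have to be derived from something like PAGA connectivities:

```python
from typing import Dict, Hashable, Set


def greedy_colouring(adjacency: Dict[Hashable, Set[Hashable]]) -> Dict[Hashable, int]:
    """Assign each node the smallest colour index not used by an already coloured neighbour."""
    colours: Dict[Hashable, int] = {}
    # Visiting high-degree nodes first is a common heuristic; it does not guarantee
    # the minimum number of colours.
    for node in sorted(adjacency, key=lambda n: len(adjacency[n]), reverse=True):
        used = {colours[nb] for nb in adjacency[node] if nb in colours}
        colour = 0
        while colour in used:
            colour += 1
        colours[node] = colour
    return colours


# Hypothetical cluster adjacency: an edge means two clusters touch in the embedding.
cluster_graph = {
    "c0": {"c1", "c2"},
    "c1": {"c0", "c2"},
    "c2": {"c0", "c1", "c3"},
    "c3": {"c2"},
}

colour_index = greedy_colouring(cluster_graph)
```

The resulting indices could then be mapped onto a small palette, so that neighbouring clusters never share a colour even though non-neighbouring clusters may.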

LuckyMD avatar Sep 07 '22 13:09 LuckyMD

NetworkX has an implementation of this, if we want to make it a dependency.

igraph only seems to have graph colouring in its C library.

LuckyMD avatar Sep 07 '22 13:09 LuckyMD

I think it might be enough to leave it to random chance. As it is now, sc.pl.umap works with 70 colors, which aren't easily distinguishable -- but neighboring clusters seem to always be distinguishable. The easiest fix to this issue would just be to support a larger color palette -- maybe even kicking the can down the road up to ~150 colors.
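As a rough sketch of what a larger palette could look like on the user side: the helper below is hypothetical and not part of scanpy. It steps the hue by the golden-ratio conjugate so that consecutive colours land far apart on the colour wheel. The saturation and value defaults are arbitrary choices:

```python
import colorsys
from typing import List

GOLDEN_RATIO_CONJUGATE = 0.618033988749895


def make_palette(n_colours: int, saturation: float = 0.65, value: float = 0.9) -> List[str]:
    """Generate n_colours hex colours by stepping the hue with the golden-ratio conjugate."""
    palette = []
    hue = 0.0
    for _ in range(n_colours):
        r, g, b = colorsys.hsv_to_rgb(hue, saturation, value)
        palette.append("#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255)))
        hue = (hue + GOLDEN_RATIO_CONJUGATE) % 1.0
    return palette


palette_127 = make_palette(127)

# With a real AnnData object `adata` (not constructed here), the list can be passed
# through the `palette` argument of the plotting function:
# sc.pl.umap(adata, color="Spearman_BICCN_M1_classification", palette=palette_127)
```

This only guarantees distinct colour codes. Whether neighbouring clusters end up visually distinguishable is still left to chance.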

alexlenail avatar Sep 08 '22 00:09 alexlenail

150 colours will not help imo. Typically very large colour maps make neighbouring clusters indistinguishable...

LuckyMD avatar Sep 08 '22 10:09 LuckyMD

Hi @alexlenail

Re:

"I think it might be enough to leave it to random chance. As it is now, sc.pl.umap works with 70 colors, which aren't easily distinguishable -- but neighboring clusters seem to always be distinguishable."

How can I access the 70 colors? I have a plot with 54 clusters but they're all colored gray.

[image: umap_10x_gse_leiden_bc_0001]

malonzm1 avatar Dec 04 '23 01:12 malonzm1

@malonzm1 please ask questions like this at https://discourse.scverse.org/

flying-sheep avatar Dec 04 '23 08:12 flying-sheep