
sc.pl.paga: does not show all groups/nodes for color dpt_pseudotime

Open: c-westhoven opened this issue 2 years ago • 1 comment

  • [x] I have checked that this issue has not already been reported.
  • [x] I have confirmed this bug exists on the latest version of scanpy.
  • [ ] (optional) I have confirmed this bug exists on the master branch of scanpy.

When plotting the PAGA graph with sc.pl.paga, some nodes are missing from the graph, but only when coloring by color='dpt_pseudotime'. The same nodes are drawn when coloring by categorical variables or by other continuous variables. Even a copy of the column (color='dpt_pseudotime_copy') shows the same missing nodes.

Minimal code sample

import numpy as np
import scanpy as sc

# adata: the reporter's AnnData object (loading code not shown)

# preprocessing
sc.pp.recipe_zheng17(adata)
adata_wt = adata[adata.obs["genotype"].isin(["WT"])].copy()
adata_pca = sc.tl.pca(adata_wt, svd_solver='arpack', copy=True)
adata_n = sc.pp.neighbors(adata_pca, n_neighbors=4, n_pcs=20, copy=True)
adata_graph = sc.tl.draw_graph(adata_n, copy=True)
# paga
adata_full = sc.tl.paga(adata_graph, groups='final_bulk_labels', copy=True)
# dpt: use the 1001st HSC cell as the root
adata_full.uns['iroot'] = np.flatnonzero(adata_full.obs['final_bulk_labels'] == 'HSC')[1000]
adata_paga_dpt_nonan = sc.tl.diffmap(adata_full, copy=True, n_comps=10)
adata_paga_dpt_nonan = sc.tl.dpt(adata_paga_dpt_nonan, copy=True)

adata_paga_dpt_nonan.obs["dpt_pseudotime_copy"]=adata_paga_dpt_nonan.obs["dpt_pseudotime"]

sc.pl.paga(
    adata_paga_dpt_nonan,
    threshold=0.05,
    color=['dpt_pseudotime', 'final_bulk_labels', 'dpt_pseudotime_copy', 'total_counts'],
    single_component=True,
    solid_edges='connectivities',
    fontsize=5,
    fontweight='light',
    node_size_scale=3,
    node_size_power=1,
    normalize_to_color=False,
    frameon=False,
    add_pos=True,
    use_raw=True,
    save="/reg_label_full_nonan.pdf",
)

Plot output showing the missing nodes for dpt_pseudotime: reg_label_full_nonan.pdf

Versions

-----
anndata     0.8.0
scanpy      1.9.1

-----
PIL                 9.0.1
appnope             0.1.3
asttokens           NA
backcall            0.2.0
beta_ufunc          NA
binom_ufunc         NA
cffi                1.15.0
colorama            0.4.4
cycler              0.10.0
cython_runtime      NA
dateutil            2.8.2
debugpy             1.6.0
decorator           5.1.1
defusedxml          0.7.1
entrypoints         0.4
executing           0.8.3
fontTools           4.25.0
google              NA
h5py                3.6.0
hypergeom_ufunc     NA
igraph              0.9.10
ipykernel           6.13.1
ipython_genutils    0.2.0
jedi                0.18.1
joblib              1.1.0
jupyter_server      1.17.1
kiwisolver          1.3.2
leidenalg           0.8.10
llvmlite            0.38.0
louvain             0.7.1
matplotlib          3.5.1
matplotlib_inline   NA
mpl_toolkits        NA
natsort             7.1.1
nbinom_ufunc        NA
networkx            2.7.1
numba               0.55.1
numpy               1.21.6
packaging           21.3
pandas              1.4.2
parso               0.8.3
pexpect             4.8.0
pickleshare         0.7.5
pkg_resources       NA
prompt_toolkit      3.0.29
psutil              5.9.1
ptyprocess          0.7.0
pure_eval           0.2.2
pydev_ipython       NA
pydevconsole        NA
pydevd              2.8.0
pydevd_file_utils   NA
pydevd_plugins      NA
pydevd_tracing      NA
pygments            2.11.2
pyparsing           3.0.4
pytz                2021.3
scipy               1.8.0
session_info        1.0.0
setuptools          61.2.0
six                 1.16.0
sklearn             1.0.2
sphinxcontrib       NA
stack_data          0.2.0
texttable           1.6.4
threadpoolctl       2.2.0
tornado             6.1
traitlets           5.2.2
typing_extensions   NA
wcwidth             0.2.5
yaml                6.0
zipp                NA
zmq                 23.1.0
-----
IPython             8.4.0
jupyter_client      7.3.4
jupyter_core        4.10.0
jupyterlab          3.4.3
notebook            6.4.12
-----
Python 3.8.13 (default, Mar 28 2022, 06:16:26) [Clang 12.0.0 ]
macOS-10.16-x86_64-i386-64bit
-----
Session information updated at 2022-07-08 11:58

c-westhoven, Jul 08 '22 16:07

I have found the cause, or at least the reason why the nodes don't appear. Each cluster that does not show up contains at least one cell with a value of np.Inf in the "dpt_pseudotime" column. As a result, the mean "dpt_pseudotime" value across that cluster is also np.Inf.
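To illustrate the effect described above: a single infinite value is enough to make a group's mean infinite. The sketch below uses only pandas and NumPy on toy data (the function name and group labels are hypothetical). It reports which groups contain np.inf and shows one possible workaround, replacing inf with NaN so that pandas' mean skips it. Whether sc.pl.paga then draws the nodes correctly from such a column is not confirmed in this thread.

```python
import numpy as np
import pandas as pd


def group_means_with_inf_report(pseudotime, groups):
    """Per-group mean pseudotime, the groups containing inf, and inf-free means.

    pseudotime: 1-D array of per-cell pseudotime values (may contain np.inf)
    groups: 1-D array of per-cell group labels (e.g. cluster names)
    """
    df = pd.DataFrame({"pseudotime": pseudotime, "group": groups})

    # Plain per-group mean: one inf cell makes the whole group's mean inf
    means = df.groupby("group")["pseudotime"].mean()

    # Groups in which at least one cell has an infinite pseudotime
    is_inf = np.isinf(df["pseudotime"].to_numpy())
    has_inf = df.assign(is_inf=is_inf).groupby("group")["is_inf"].any()
    affected = sorted(has_inf[has_inf].index)

    # Hypothetical workaround: treat +/-inf as missing; pandas' mean skips NaN by default
    finite = df["pseudotime"].replace([np.inf, -np.inf], np.nan)
    finite_means = finite.groupby(df["group"]).mean()
    return means, affected, finite_means


if __name__ == "__main__":
    means, affected, finite_means = group_means_with_inf_report(
        np.array([0.1, 0.3, 0.2, np.inf, 0.5]),
        np.array(["HSC", "HSC", "Mono", "Mono", "Ery"]),
    )
    print(means)
    print(affected)
    print(finite_means)
```

The same check could be run on the report's data by passing adata_paga_dpt_nonan.obs["dpt_pseudotime"].to_numpy() and adata_paga_dpt_nonan.obs["final_bulk_labels"].to_numpy().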

A related question, then: is it normal/expected for the scanpy pseudotime analysis to produce np.Inf values?
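One possible explanation, offered as an assumption rather than something confirmed in this thread: with a small n_neighbors (4 in the code above), the kNN graph can split into several connected components, and cells that cannot be reached from the root cell may receive an infinite DPT distance. A minimal check with SciPy, assuming the graph is the sparse matrix scanpy stores in adata.obsp['connectivities'] (the helper name is hypothetical):

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components


def inf_cells_outside_root_component(connectivities, pseudotime, iroot):
    """Check whether the infinite-pseudotime cells are exactly those outside the root's component.

    connectivities: sparse (n_cells x n_cells) kNN graph
    pseudotime: 1-D array of per-cell pseudotime values
    iroot: integer index of the root cell
    """
    # Treat the graph as undirected when labelling connected components
    n_comp, labels = connected_components(connectivities, directed=False)
    inf_mask = np.isinf(np.asarray(pseudotime))
    outside_root = labels != labels[iroot]
    # True when the inf values coincide with the cells unreachable from the root
    return n_comp, bool(np.array_equal(inf_mask, outside_root))


if __name__ == "__main__":
    # Toy graph: cells 0-1-2 form one component, cells 3-4 another
    rows = [0, 1, 1, 2, 3, 4]
    cols = [1, 0, 2, 1, 4, 3]
    graph = csr_matrix((np.ones(6), (rows, cols)), shape=(5, 5))
    pt = np.array([0.0, 0.5, 1.0, np.inf, np.inf])
    print(inf_cells_outside_root_component(graph, pt, iroot=0))
```

On the report's objects this would be called roughly as inf_cells_outside_root_component(adata_paga_dpt_nonan.obsp['connectivities'], adata_paga_dpt_nonan.obs['dpt_pseudotime'].to_numpy(), adata_paga_dpt_nonan.uns['iroot']); a result of more than one component together with True would support the hypothesis.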

c-westhoven, Jul 08 '22 20:07