scanpy
sc.pl.paga: does not show all groups/nodes for color dpt_pseudotime
- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of scanpy.
- [ ] (optional) I have confirmed this bug exists on the master branch of scanpy.
When plotting the PAGA graph with sc.pl.paga, some of the nodes are missing from the plot. The nodes/clusters are missing only for color='dpt_pseudotime'; they are drawn correctly for categorical variables and for other continuous variables. Copying the dpt_pseudotime column does not help: color='dpt_pseudotime_copy' shows the same missing nodes.
Minimal code sample
import numpy as np
import scanpy as sc

# `adata` is the AnnData object under analysis (loading is not shown in this report)

# preprocessing
sc.pp.recipe_zheng17(adata)
adata_wt = adata[adata.obs["genotype"].isin(["WT"])]
adata_pca = sc.tl.pca(adata_wt, svd_solver='arpack', copy=True)
adata_n = sc.pp.neighbors(adata_pca, n_neighbors=4, n_pcs=20, copy=True)
adata_graph = sc.tl.draw_graph(adata_n, copy=True)
# paga
adata_full = sc.tl.paga(adata_graph, groups='final_bulk_labels', copy=True)
# dpt: use one of the HSC cells as the root
adata_full.uns['iroot'] = np.flatnonzero(adata_full.obs['final_bulk_labels'] == 'HSC')[1000]
adata_paga_dpt_nonan = sc.tl.diffmap(adata_full, copy=True, n_comps=10)
adata_paga_dpt_nonan = sc.tl.dpt(adata_paga_dpt_nonan, copy=True)
# plain copy of the pseudotime column, to check whether the column name matters
adata_paga_dpt_nonan.obs["dpt_pseudotime_copy"] = adata_paga_dpt_nonan.obs["dpt_pseudotime"]
sc.pl.paga(
    adata_paga_dpt_nonan,
    threshold=0.05,
    color=['dpt_pseudotime', 'final_bulk_labels', 'dpt_pseudotime_copy', 'total_counts'],
    # layout: Optional[_IGraphLayout] = None,
    # layout_kwds: Mapping[str, Any] = MappingProxyType({}),
    # init_pos: Optional[np.ndarray] = None,
    # root: Union[int, str, Sequence[int], None] = 0,
    # labels: Union[str, Sequence[str], Mapping[str, str], None] = None,
    single_component=True,
    solid_edges='connectivities',
    # dashed_edges: Optional[str] = None,
    # transitions: Optional[str] = None,
    fontsize=5,
    fontweight='light',
    # fontoutline=2,
    # text_kwds: Mapping[str, Any] = MappingProxyType({}),
    node_size_scale=3,
    node_size_power=1,
    # edge_width_scale: float = 1.0,
    # min_edge_width: Optional[float] = None,
    # max_edge_width: Optional[float] = None,
    # arrowsize: int = 30,
    # title: Optional[str] = None,
    # left_margin: float = 0.01,
    # random_state: Optional[int] = 0,
    # pos: Union[np.ndarray, str, Path, None] = None,
    normalize_to_color=False,
    # cmap: Union[str, Colormap] = None,
    # cax: Optional[Axes] = None,
    # colorbar=None, # TODO: this seems to be unused
    # cb_kwds: Mapping[str, Any] = MappingProxyType({}),
    frameon=False,
    add_pos=True,
    # export_to_gexf: bool = False,
    use_raw=True,
    # colors=None, # backwards compat
    # groups=None, # backwards compat
    # plot: bool = True,
    # show: Optional[bool] = None,
    save="/reg_label_full_nonan.pdf",
    # ax: Optional[Axes] = None,
)
Plot output showing lack of nodes with dpt_pseudotime:
reg_label_full_nonan.pdf
Versions
-----
anndata 0.8.0
scanpy 1.9.1
-----
PIL 9.0.1
appnope 0.1.3
asttokens NA
backcall 0.2.0
beta_ufunc NA
binom_ufunc NA
cffi 1.15.0
colorama 0.4.4
cycler 0.10.0
cython_runtime NA
dateutil 2.8.2
debugpy 1.6.0
decorator 5.1.1
defusedxml 0.7.1
entrypoints 0.4
executing 0.8.3
fontTools 4.25.0
google NA
h5py 3.6.0
hypergeom_ufunc NA
igraph 0.9.10
ipykernel 6.13.1
ipython_genutils 0.2.0
jedi 0.18.1
joblib 1.1.0
jupyter_server 1.17.1
kiwisolver 1.3.2
leidenalg 0.8.10
llvmlite 0.38.0
louvain 0.7.1
matplotlib 3.5.1
matplotlib_inline NA
mpl_toolkits NA
natsort 7.1.1
nbinom_ufunc NA
networkx 2.7.1
numba 0.55.1
numpy 1.21.6
packaging 21.3
pandas 1.4.2
parso 0.8.3
pexpect 4.8.0
pickleshare 0.7.5
pkg_resources NA
prompt_toolkit 3.0.29
psutil 5.9.1
ptyprocess 0.7.0
pure_eval 0.2.2
pydev_ipython NA
pydevconsole NA
pydevd 2.8.0
pydevd_file_utils NA
pydevd_plugins NA
pydevd_tracing NA
pygments 2.11.2
pyparsing 3.0.4
pytz 2021.3
scipy 1.8.0
session_info 1.0.0
setuptools 61.2.0
six 1.16.0
sklearn 1.0.2
sphinxcontrib NA
stack_data 0.2.0
texttable 1.6.4
threadpoolctl 2.2.0
tornado 6.1
traitlets 5.2.2
typing_extensions NA
wcwidth 0.2.5
yaml 6.0
zipp NA
zmq 23.1.0
-----
IPython 8.4.0
jupyter_client 7.3.4
jupyter_core 4.10.0
jupyterlab 3.4.3
notebook 6.4.12
-----
Python 3.8.13 (default, Mar 28 2022, 06:16:26) [Clang 12.0.0 ]
macOS-10.16-x86_64-i386-64bit
-----
Session information updated at 2022-07-08 11:58
I have found the issue, or at least the reason why the nodes don't appear. Each cluster that does not show up contains at least one cell whose "dpt_pseudotime" value is np.inf. As a result, the mean "dpt_pseudotime" across that cluster is also np.inf, so the node has no finite value to map to a color.
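The effect described above can be reproduced on a toy table. The following is a minimal sketch, not scanpy's plotting code: the `obs` DataFrame is a hypothetical stand-in for `adata.obs`, and masking non-finite values as NaN is only one possible workaround, shown here as an assumption about what a user might want.

```python
import numpy as np
import pandas as pd

# Toy stand-in for adata.obs: three clusters, one cell with infinite pseudotime.
obs = pd.DataFrame(
    {
        "final_bulk_labels": ["HSC", "HSC", "B", "B", "T", "T"],
        "dpt_pseudotime": [0.0, 0.1, 0.4, np.inf, 0.7, 0.9],
    }
)
groups = obs["final_bulk_labels"]
pseudotime = obs["dpt_pseudotime"]

# Per-cluster mean: a single infinite cell makes the whole cluster mean infinite.
cluster_mean = pseudotime.groupby(groups).mean()

# List the clusters that contain at least one non-finite value.
has_nonfinite = (~np.isfinite(pseudotime)).groupby(groups).any()
affected = sorted(has_nonfinite[has_nonfinite].index)

# Possible workaround (an assumption, not an official fix): replace non-finite
# values with NaN, which pandas' mean() skips by default.
masked = pseudotime.where(np.isfinite(pseudotime))
finite_mean = masked.groupby(groups).mean()

print(cluster_mean)
print("clusters with non-finite pseudotime:", affected)
print(finite_mean)
```

On real data, the same check could be run on `adata.obs` directly, and the masked column could be stored under a new name in `adata.obs` before plotting, so that the original `dpt_pseudotime` values are left untouched.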
So a related question would be: is it normal/expected to get np.inf values from the scanpy pseudotime analysis?
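One plausible explanation, which is an assumption and is not confirmed anywhere in this report, is that the affected cells lie in a different connected component of the neighbor graph than the root cell, so they cannot be reached from the root. A small n_neighbors value such as the 4 used above makes a disconnected graph more likely. The sketch below shows how such cells could be detected with SciPy on a hypothetical toy adjacency matrix; `root` stands in for `adata.uns['iroot']`.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

# Toy symmetric adjacency: cells 0-2 form one component, cells 3-4 another.
# On real data this would be the neighbor graph, e.g. adata.obsp["connectivities"].
rows = [0, 1, 1, 2, 3, 4]
cols = [1, 0, 2, 1, 4, 3]
adjacency = csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(5, 5))

# Label every cell with the index of its connected component.
n_components, labels = connected_components(adjacency, directed=False)

root = 0  # hypothetical root cell index, standing in for adata.uns["iroot"]

# Cells in any component other than the root's cannot be reached from the root.
unreachable = np.flatnonzero(labels != labels[root])

print("number of components:", n_components)
print("cells not connected to the root:", unreachable)
```

If the cells flagged this way match the cells with infinite pseudotime, increasing n_neighbors or restricting the analysis to the root's component would be natural things to try.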