Scanpy neighbors bug with identical/almost identical cells.
- [x] I have checked that this issue has not already been reported.
- [ ] I have confirmed this bug exists on the latest version of scanpy.
- [ ] (optional) I have confirmed this bug exists on the master branch of scanpy.
sc.pp.neighbors fails to identify neighbours when identical/almost identical cells exist.
I have a very sparse x matrix with only a few genes, in which some cells end up identical by coincidence (these ilocs: [1076, 2066, 3775, 1076, 3122, 3751]). Those cells are reported as having 0 neighbours, when they should be neighbours of each other.
import anndata
import scanpy as sc
# x: my count matrix (cells x genes), not shown here
adata = anndata.AnnData(X=x)
sc.pp.pca(adata, n_comps=20)
sc.pp.neighbors(adata, n_neighbors=5)
print(adata.obsp["distances"][1076, :].nonzero())
(array([], dtype=int32), array([], dtype=int32))
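To list every affected cell instead of checking one row at a time, something like the following rough sketch could be used. The helper name rows_without_neighbors is made up for illustration and is not part of scanpy; it only relies on the .nonzero() method that both dense NumPy arrays and SciPy sparse matrices provide. Note that .nonzero() skips entries whose value is exactly 0, so a stored distance of 0 is counted as "no neighbour" here as well.

```python
import numpy as np


def rows_without_neighbors(distances):
    """Return the indices of rows that contain no nonzero entry.

    Hypothetical helper: `distances` may be a dense NumPy array or a
    SciPy sparse matrix, since both expose `.nonzero()`.
    """
    n_rows = distances.shape[0]
    rows, _ = distances.nonzero()
    # Count the nonzero entries in each row, including rows with none.
    counts = np.bincount(rows, minlength=n_rows)
    return np.flatnonzero(counts == 0)


# Toy 4x4 distance matrix (made-up values): row 2 has no nonzero entry.
toy = np.array([
    [0.0, 1.5, 0.0, 2.0],
    [1.5, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [2.0, 0.0, 0.0, 0.0],
])
print(rows_without_neighbors(toy))
```

On my data this would be called as rows_without_neighbors(adata.obsp["distances"]).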
This happens if there are duplicate rows in adata (cells with exactly the same gene count profile). It even happened with two rows that are not exactly equal (though almost):
> adata.obs.iloc[[1662, 3578]]
| n_counts | n_genes
| 10.0 | 3
| 11.0 | 4
> adata.X[[1662, 3578]]
array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 8.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 8.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0.]], dtype=float32)
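To find out which cells share exactly the same profile, a minimal sketch using only NumPy is shown below. The function name duplicate_row_groups and the toy matrix are made up for illustration; the sketch assumes a dense array (a sparse adata.X would first need .toarray()), and it only catches exact duplicates, not the "almost identical" case.

```python
import numpy as np


def duplicate_row_groups(x):
    """Return lists of row indices whose rows are exactly identical.

    Hypothetical helper: expects a dense 2-D array.
    """
    x = np.asarray(x)
    _, inverse, counts = np.unique(
        x, axis=0, return_inverse=True, return_counts=True
    )
    # Flatten in case the installed NumPy returns a 2-D inverse for axis=0.
    inverse = inverse.ravel()
    groups = []
    for label in np.flatnonzero(counts > 1):
        groups.append(np.flatnonzero(inverse == label).tolist())
    return groups


# Toy count matrix (made-up values): rows 0 and 2 match, rows 1 and 4 match.
toy_x = np.array([
    [0, 1, 8],
    [1, 0, 0],
    [0, 1, 8],
    [2, 2, 2],
    [1, 0, 0],
], dtype=np.float32)
print(duplicate_row_groups(toy_x))
```

Each returned group is a set of cells that I would expect to be neighbours of each other at distance 0.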
Versions
anndata 0.7.6 scanpy 1.8.1 sinfo 0.3.4
PIL 8.3.2 SpaGCN 1.2.2 anyio NA apport_python_hook NA asciitree NA attr 21.2.0 babel 2.9.1 backcall 0.2.0 beta_ufunc NA binom_ufunc NA certifi 2019.11.28 cffi 1.14.6 chardet 3.0.4 charset_normalizer 2.0.7 cloudpickle 2.0.0 colorama 0.4.3 cycler 0.10.0 cython_runtime NA dask 2021.10.0 dateutil 2.8.0 debugpy 1.4.1 decorator 5.0.9 defusedxml 0.7.1 entrypoints 0.3 fasteners NA fsspec 2021.10.1 gi 3.36.0 gio NA glib NA gobject NA gtk NA h5py 3.5.0 idna 2.8 igraph 0.9.9 ipykernel 6.3.1 ipython_genutils 0.2.0 ipywidgets 7.6.5 jedi 0.18.0 jinja2 2.10.1 joblib 1.0.1 json5 NA jsonpointer 2.0 jsonschema 3.2.0 jupyter_server 1.10.2 jupyterlab_server 2.7.2 kiwisolver 1.3.2 leidenalg 0.8.8 llvmlite 0.37.0 louvain 0.7.1 markupsafe 1.1.0 matplotlib 3.4.3 matplotlib_inline NA mpl_toolkits NA natsort 7.1.1 nbclassic NA nbformat 5.1.3 nbinom_ufunc NA netifaces 0.10.4 numba 0.54.0 numcodecs 0.9.1 numexpr 2.7.3 numpy 1.20.3 packaging 21.0 pandas 1.3.2 parso 0.8.2 patsy 0.5.1 pexpect 4.6.0 pickleshare 0.7.5 pkg_resources NA prometheus_client NA prompt_toolkit 3.0.20 psutil 5.8.0 ptyprocess 0.7.0 pvectorc NA pycparser 2.20 pydev_ipython NA pydevconsole NA pydevd 2.4.1 pydevd_concurrency_analyser NA pydevd_file_utils NA pydevd_plugins NA pydevd_tracing NA pygments 2.10.0 pynndescent 0.5.5 pyparsing 2.4.7 pyrsistent NA pytz 2021.1 requests 2.26.0 roifile 2021.6.6 scipy 1.7.1 seaborn 0.11.2 send2trash NA shapely 1.7.1 simplejson 3.16.0 sitecustomize NA six 1.14.0 sklearn 1.0.2 sniffio 1.2.0 sparse 0.13.0 sphinxcontrib NA statsmodels 0.12.2 storemagic NA tables 3.6.1 terminado 0.11.1 texttable 1.6.4 threadpoolctl 2.2.0 tlz 0.11.1 toolz 0.11.1 torch 1.6.0 tornado 6.1 tqdm 4.62.2 traitlets 5.1.0 typing_extensions NA umap 0.5.1 urllib3 1.26.6 wcwidth 0.2.5 websocket 1.2.1 yaml 6.0 zarr 2.10.2 zmq 22.2.1 zope NA
IPython 7.27.0 jupyter_client 7.0.2 jupyter_core 4.7.1 jupyterlab 3.1.12 notebook 6.4.3
Python 3.8.10 (default, Sep 28 2021, 16:10:42) [GCC 9.3.0] Linux-4.4.0-19041-Microsoft-x86_64-with-glibc2.29 8 logical CPU cores, x86_64