scanpy
AxisError when calculating QC metrics on backed data
Please make sure these conditions are met
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of scanpy.
- [ ] (optional) I have confirmed this bug exists on the main branch of scanpy.
What happened?
When I load data in backed mode, I get an AxisError when calculating QC metrics. This has happened on three different datasets, but it does not happen when I read the data into memory.
Minimal code sample
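import scanpy as sc
# Download pbmc3k (cached as data/pbmc3k_raw.h5ad with the default settings)
sc.datasets.pbmc3k()
# Reopen the cached file in backed (on-disk) mode
pbmc = sc.read_h5ad('data/pbmc3k_raw.h5ad', backed='r+')
# Flag mitochondrial and ribosomal genes
pbmc.var['mt'] = pbmc.var_names.str.startswith('MT-')
pbmc.var['ribo'] = pbmc.var_names.str.startswith(("RPS", "RPL"))
# This call raises the AxisError shown below
sc.pp.calculate_qc_metrics(pbmc, qc_vars=['mt', 'ribo'], percent_top=None, log1p=False, inplace=True)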
Error output
---------------------------------------------------------------------------
AxisError Traceback (most recent call last)
Cell In[8], line 3
1 pbmc.var['mt'] = pbmc.var_names.str.startswith('MT-')
2 pbmc.var['ribo'] = pbmc.var_names.str.startswith(("RPS", "RPL"))
----> 3 sc.pp.calculate_qc_metrics(pbmc, qc_vars=['mt', 'ribo'], percent_top=None, log1p=False, inplace=True)
File ~/miniconda3/envs/parse_sepsis/lib/python3.12/site-packages/scanpy/preprocessing/_qc.py:315, in calculate_qc_metrics(adata, expr_type, var_type, qc_vars, percent_top, layer, use_raw, inplace, log1p, parallel)
312 if isinstance(qc_vars, str):
313 qc_vars = [qc_vars]
--> 315 obs_metrics = describe_obs(
316 adata,
317 expr_type=expr_type,
318 var_type=var_type,
319 qc_vars=qc_vars,
320 percent_top=percent_top,
321 inplace=inplace,
322 X=X,
323 log1p=log1p,
324 )
325 var_metrics = describe_var(
326 adata,
327 expr_type=expr_type,
(...)
331 log1p=log1p,
332 )
334 if not inplace:
File ~/miniconda3/envs/parse_sepsis/lib/python3.12/site-packages/scanpy/preprocessing/_qc.py:109, in describe_obs(adata, expr_type, var_type, qc_vars, percent_top, layer, use_raw, log1p, inplace, X, parallel)
107 obs_metrics[f"n_{var_type}_by_{expr_type}"] = X.getnnz(axis=1)
108 else:
--> 109 obs_metrics[f"n_{var_type}_by_{expr_type}"] = np.count_nonzero(X, axis=1)
110 if log1p:
111 obs_metrics[f"log1p_n_{var_type}_by_{expr_type}"] = np.log1p(
112 obs_metrics[f"n_{var_type}_by_{expr_type}"]
113 )
File ~/miniconda3/envs/parse_sepsis/lib/python3.12/site-packages/numpy/core/numeric.py:486, in count_nonzero(a, axis, keepdims)
483 else:
484 a_bool = a.astype(np.bool_, copy=False)
--> 486 return a_bool.sum(axis=axis, dtype=np.intp, keepdims=keepdims)
File ~/miniconda3/envs/parse_sepsis/lib/python3.12/site-packages/numpy/core/_methods.py:49, in _sum(a, axis, dtype, out, keepdims, initial, where)
47 def _sum(a, axis=None, dtype=None, out=None, keepdims=False,
48 initial=_NoValue, where=True):
---> 49 return umr_sum(a, axis, dtype, out, keepdims, initial, where)
AxisError: axis 1 is out of bounds for array of dimension 0
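The final frame is consistent with NumPy not recognising the backed matrix as an array: `np.count_nonzero` converts its argument with `asanyarray`, and an object that exposes no array interface becomes a zero-dimensional object array, which has no axis 1. The sketch below reproduces the same message with NumPy alone. `OpaqueMatrix` is a hypothetical stand-in, and the assumption that the backed `X` behaves like it is inferred from the traceback, not confirmed against the anndata source.

```python
import numpy as np


class OpaqueMatrix:
    """Hypothetical stand-in for an on-disk matrix that NumPy cannot interpret."""


def nonzero_per_row(x):
    """Count non-zero entries per row, mirroring the non-sparse branch in describe_obs."""
    return np.count_nonzero(x, axis=1)


# An in-memory array works as expected.
dense = np.array([[0, 1, 2], [0, 0, 3]])
dense_counts = nonzero_per_row(dense)

# The opaque object is wrapped into a 0-d object array, so axis 1 does not exist.
wrapped_ndim = np.asanyarray(OpaqueMatrix()).ndim

try:
    nonzero_per_row(OpaqueMatrix())
    message = None
except ValueError as err:  # AxisError subclasses both ValueError and IndexError
    message = str(err)
```

If this assumption holds, the failure comes from how the backed object is handed to NumPy and does not depend on the dataset, which would match the report that three different datasets fail in the same way.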
Versions
-----
anndata 0.10.7
scanpy 1.10.1
-----
PIL 10.3.0
anyio NA
arrow 1.3.0
asttokens NA
attr 23.2.0
attrs 23.2.0
babel 2.14.0
certifi 2024.02.02
cffi 1.16.0
charset_normalizer 3.3.2
colorama 0.4.6
comm 0.2.2
cycler 0.12.1
cython_runtime NA
dateutil 2.9.0
debugpy 1.8.1
decorator 5.1.1
defusedxml 0.7.1
executing 2.0.1
fastjsonschema NA
fqdn NA
h5py 3.11.0
idna 3.7
igraph 0.11.4
ipykernel 6.29.4
isoduration NA
jedi 0.19.1
jinja2 3.1.3
joblib 1.4.0
json5 0.9.24
jsonpointer 2.4
jsonschema 4.21.1
jsonschema_specifications NA
jupyter_events 0.10.0
jupyter_server 2.14.0
jupyterlab_server 2.26.0
kiwisolver 1.4.5
legacy_api_wrap NA
leidenalg 0.10.2
llvmlite 0.42.0
markupsafe 2.1.5
matplotlib 3.8.4
matplotlib_inline 0.1.6
mpl_toolkits NA
natsort 8.4.0
nbformat 5.10.4
numba 0.59.1
numpy 1.26.4
overrides NA
packaging 24.0
pandas 2.2.2
parso 0.8.4
patsy 0.5.6
platformdirs 4.2.0
prometheus_client NA
prompt_toolkit 3.0.43
psutil 5.9.8
pure_eval 0.2.2
pydev_ipython NA
pydevconsole NA
pydevd 2.9.5
pydevd_file_utils NA
pydevd_plugins NA
pydevd_tracing NA
pygments 2.17.2
pyparsing 3.1.2
pythonjsonlogger NA
pytz 2024.1
referencing NA
requests 2.31.0
rfc3339_validator 0.1.4
rfc3986_validator 0.1.1
rpds NA
scipy 1.13.0
seaborn 0.13.2
send2trash NA
session_info 1.0.0
six 1.16.0
sklearn 1.4.1.post1
sniffio 1.3.1
stack_data 0.6.3
statsmodels 0.14.1
texttable 1.7.0
threadpoolctl 3.4.0
tornado 6.4
traitlets 5.14.2
uri_template NA
urllib3 2.2.1
wcwidth 0.2.13
webcolors 1.13
websocket 1.7.0
yaml 6.0.1
zmq 25.1.2
-----
IPython 8.23.0
jupyter_client 8.6.1
jupyter_core 5.7.2
jupyterlab 4.1.6
-----
Python 3.12.2 | packaged by conda-forge | (main, Feb 16 2024, 20:50:58) [GCC 12.3.0]
Linux-5.14.0-362.8.1.el9_3.x86_64-x86_64-with-glibc2.34
-----
Session information updated at 2024-04-12 13:17
We will start raising helpful errors for things we don't support, and let anything that currently passes continue to work as it does now.
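One way such a helpful error could look is a guard that runs before any NumPy call. The sketch below is not scanpy's actual implementation: `require_in_memory` is a hypothetical helper, and it relies only on the `isbacked` attribute that AnnData objects expose, read through `getattr` so the example runs without anndata installed.

```python
from types import SimpleNamespace


def require_in_memory(adata, func_name):
    """Raise a descriptive error if `adata` is backed by a file on disk.

    Hypothetical helper for illustration; not part of scanpy.
    """
    if getattr(adata, "isbacked", False):
        raise NotImplementedError(
            f"{func_name} does not support backed data. "
            "Load the object into memory first, e.g. with adata.to_memory()."
        )


# Stand-ins for AnnData objects; only the `isbacked` attribute matters here.
in_memory = SimpleNamespace(isbacked=False)
backed = SimpleNamespace(isbacked=True)

# In-memory input passes through without any change in behaviour.
require_in_memory(in_memory, "calculate_qc_metrics")

try:
    require_in_memory(backed, "calculate_qc_metrics")
    error_text = None
except NotImplementedError as err:
    error_text = str(err)
```

Raising before the computation starts means unsupported input fails with an actionable message, while in-memory objects that already work follow the same code path as before.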