AxisError when calculating QC metrics on backed data

Open dn-ra opened this issue 10 months ago • 1 comment

Please make sure these conditions are met

  • [X] I have checked that this issue has not already been reported.
  • [X] I have confirmed this bug exists on the latest version of scanpy.
  • [ ] (optional) I have confirmed this bug exists on the main branch of scanpy.

What happened?

When I load data in backed mode, I get an AxisError from sc.pp.calculate_qc_metrics. The problem has occurred on three different datasets, but it does not occur when I read the data into memory.

Minimal code sample

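import scanpy as sc

# Download the dataset so that the raw .h5ad file exists on disk
sc.datasets.pbmc3k()
# Reopen the file in backed mode; this is what triggers the error below
pbmc = sc.read_h5ad('data/pbmc3k_raw.h5ad', backed='r+')
pbmc.var['mt'] = pbmc.var_names.str.startswith('MT-')
pbmc.var['ribo'] = pbmc.var_names.str.startswith(("RPS", "RPL"))
sc.pp.calculate_qc_metrics(pbmc, qc_vars=['mt', 'ribo'], percent_top=None, log1p=False, inplace=True)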

Error output

---------------------------------------------------------------------------
AxisError                                 Traceback (most recent call last)
Cell In[8], line 3
      1 pbmc.var['mt'] = pbmc.var_names.str.startswith('MT-')
      2 pbmc.var['ribo'] = pbmc.var_names.str.startswith(("RPS", "RPL"))
----> 3 sc.pp.calculate_qc_metrics(pbmc, qc_vars=['mt', 'ribo'], percent_top=None, log1p=False, inplace=True)

File ~/miniconda3/envs/parse_sepsis/lib/python3.12/site-packages/scanpy/preprocessing/_qc.py:315, in calculate_qc_metrics(adata, expr_type, var_type, qc_vars, percent_top, layer, use_raw, inplace, log1p, parallel)
    312 if isinstance(qc_vars, str):
    313     qc_vars = [qc_vars]
--> 315 obs_metrics = describe_obs(
    316     adata,
    317     expr_type=expr_type,
    318     var_type=var_type,
    319     qc_vars=qc_vars,
    320     percent_top=percent_top,
    321     inplace=inplace,
    322     X=X,
    323     log1p=log1p,
    324 )
    325 var_metrics = describe_var(
    326     adata,
    327     expr_type=expr_type,
   (...)
    331     log1p=log1p,
    332 )
    334 if not inplace:

File ~/miniconda3/envs/parse_sepsis/lib/python3.12/site-packages/scanpy/preprocessing/_qc.py:109, in describe_obs(adata, expr_type, var_type, qc_vars, percent_top, layer, use_raw, log1p, inplace, X, parallel)
    107     obs_metrics[f"n_{var_type}_by_{expr_type}"] = X.getnnz(axis=1)
    108 else:
--> 109     obs_metrics[f"n_{var_type}_by_{expr_type}"] = np.count_nonzero(X, axis=1)
    110 if log1p:
    111     obs_metrics[f"log1p_n_{var_type}_by_{expr_type}"] = np.log1p(
    112         obs_metrics[f"n_{var_type}_by_{expr_type}"]
    113     )

File ~/miniconda3/envs/parse_sepsis/lib/python3.12/site-packages/numpy/core/numeric.py:486, in count_nonzero(a, axis, keepdims)
    483 else:
    484     a_bool = a.astype(np.bool_, copy=False)
--> 486 return a_bool.sum(axis=axis, dtype=np.intp, keepdims=keepdims)

File ~/miniconda3/envs/parse_sepsis/lib/python3.12/site-packages/numpy/core/_methods.py:49, in _sum(a, axis, dtype, out, keepdims, initial, where)
     47 def _sum(a, axis=None, dtype=None, out=None, keepdims=False,
     48          initial=_NoValue, where=True):
---> 49     return umr_sum(a, axis, dtype, out, keepdims, initial, where)

AxisError: axis 1 is out of bounds for array of dimension 0
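
One plausible reading of the traceback, not confirmed in this thread: the backed matrix is not recognised as a SciPy sparse matrix, so describe_obs falls through to np.count_nonzero, and NumPy wraps the unfamiliar object in a 0-dimensional object array that has no axis 1. The sketch below reproduces that mechanism with NumPy alone. OpaqueMatrix and count_nonzero_error are hypothetical stand-ins written for this illustration; they are not anndata or scanpy classes.

```python
import numpy as np


class OpaqueMatrix:
    """Hypothetical stand-in for a matrix-like object NumPy cannot convert."""

    def __init__(self, shape):
        # A bare `shape` attribute is not enough for NumPy to see a 2-D array.
        self.shape = shape


def count_nonzero_error(x):
    """Return the message raised by np.count_nonzero(x, axis=1), or None."""
    try:
        np.count_nonzero(x, axis=1)
    except (ValueError, IndexError) as err:  # AxisError subclasses both
        return str(err)
    return None


# NumPy wraps the unknown object as a 0-dimensional object array.
wrapped = np.asanyarray(OpaqueMatrix((3, 4)))
print(wrapped.ndim, wrapped.dtype)

# Reducing along axis 1 of that 0-dimensional array fails.
message = count_nonzero_error(OpaqueMatrix((3, 4)))
print(message)

# A real in-memory 2-D array goes through the same call without trouble.
dense_counts = np.count_nonzero(np.eye(3), axis=1)
print(dense_counts)
```

If this reading is right, it would also explain why the same call works once the data has been read into memory, where X is an ordinary array or SciPy sparse matrix.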

Versions

-----
anndata     0.10.7
scanpy      1.10.1
-----
PIL                         10.3.0
anyio                       NA
arrow                       1.3.0
asttokens                   NA
attr                        23.2.0
attrs                       23.2.0
babel                       2.14.0
certifi                     2024.02.02
cffi                        1.16.0
charset_normalizer          3.3.2
colorama                    0.4.6
comm                        0.2.2
cycler                      0.12.1
cython_runtime              NA
dateutil                    2.9.0
debugpy                     1.8.1
decorator                   5.1.1
defusedxml                  0.7.1
executing                   2.0.1
fastjsonschema              NA
fqdn                        NA
h5py                        3.11.0
idna                        3.7
igraph                      0.11.4
ipykernel                   6.29.4
isoduration                 NA
jedi                        0.19.1
jinja2                      3.1.3
joblib                      1.4.0
json5                       0.9.24
jsonpointer                 2.4
jsonschema                  4.21.1
jsonschema_specifications   NA
jupyter_events              0.10.0
jupyter_server              2.14.0
jupyterlab_server           2.26.0
kiwisolver                  1.4.5
legacy_api_wrap             NA
leidenalg                   0.10.2
llvmlite                    0.42.0
markupsafe                  2.1.5
matplotlib                  3.8.4
matplotlib_inline           0.1.6
mpl_toolkits                NA
natsort                     8.4.0
nbformat                    5.10.4
numba                       0.59.1
numpy                       1.26.4
overrides                   NA
packaging                   24.0
pandas                      2.2.2
parso                       0.8.4
patsy                       0.5.6
platformdirs                4.2.0
prometheus_client           NA
prompt_toolkit              3.0.43
psutil                      5.9.8
pure_eval                   0.2.2
pydev_ipython               NA
pydevconsole                NA
pydevd                      2.9.5
pydevd_file_utils           NA
pydevd_plugins              NA
pydevd_tracing              NA
pygments                    2.17.2
pyparsing                   3.1.2
pythonjsonlogger            NA
pytz                        2024.1
referencing                 NA
requests                    2.31.0
rfc3339_validator           0.1.4
rfc3986_validator           0.1.1
rpds                        NA
scipy                       1.13.0
seaborn                     0.13.2
send2trash                  NA
session_info                1.0.0
six                         1.16.0
sklearn                     1.4.1.post1
sniffio                     1.3.1
stack_data                  0.6.3
statsmodels                 0.14.1
texttable                   1.7.0
threadpoolctl               3.4.0
tornado                     6.4
traitlets                   5.14.2
uri_template                NA
urllib3                     2.2.1
wcwidth                     0.2.13
webcolors                   1.13
websocket                   1.7.0
yaml                        6.0.1
zmq                         25.1.2
-----
IPython             8.23.0
jupyter_client      8.6.1
jupyter_core        5.7.2
jupyterlab          4.1.6
-----
Python 3.12.2 | packaged by conda-forge | (main, Feb 16 2024, 20:50:58) [GCC 12.3.0]
Linux-5.14.0-362.8.1.el9_3.x86_64-x86_64-with-glibc2.34
-----
Session information updated at 2024-04-12 13:17

dn-ra, Apr 12 '24 03:04

We will start returning helpful errors for things we don't support, while allowing whatever currently passes to keep working as it does now.

ilan-gold, Apr 18 '24 14:04
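
As a rough illustration of the "helpful error" idea in the reply above, a guard along the following lines could fail early with an actionable message instead of a NumPy AxisError. This is only a sketch under assumptions: require_in_memory is a hypothetical helper, not part of scanpy, and FakeAnnData is a stand-in used so the example runs without any data on disk. The only AnnData features it relies on are the isbacked attribute and, in the message, the to_memory() method.

```python
def require_in_memory(adata, func_name):
    """Hypothetical guard: reject backed objects with a readable message."""
    # AnnData exposes `isbacked`; objects without it are treated as in-memory.
    if getattr(adata, "isbacked", False):
        raise NotImplementedError(
            f"{func_name} does not support backed AnnData objects; "
            "load the data into memory first, e.g. with adata.to_memory()."
        )
    return adata


class FakeAnnData:
    """Stand-in for illustration only; not the real anndata.AnnData."""

    def __init__(self, isbacked):
        self.isbacked = isbacked


# An in-memory object passes through unchanged.
in_memory = FakeAnnData(isbacked=False)
assert require_in_memory(in_memory, "calculate_qc_metrics") is in_memory

# A backed object triggers the explicit error.
try:
    require_in_memory(FakeAnnData(isbacked=True), "calculate_qc_metrics")
except NotImplementedError as err:
    guard_message = str(err)
    print(guard_message)
```

In the meantime, consistent with the original report that the problem does not occur in memory, loading the data into memory before calling sc.pp.calculate_qc_metrics avoids the error.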