anndata Seeming incompatibility with the numpy matrix subclass

Open adrian-valente opened this issue 1 year ago • 2 comments

Please make sure these conditions are met

[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of scanpy.
[ ] (optional) I have confirmed this bug exists on the master branch of scanpy.

What happened?

For reasons I could not explain, sc.pp.filter_genes raises a ValueError when adata.X is a numpy.matrix. It is easy to work around by converting X to a plain ndarray, but I wanted to file it here for reference: some methods still return matrix objects by default (such as the todense() method of a scipy sparse matrix), and because matrix is a subclass of ndarray, the problem is hard to recognize as a type error. Here is a minimal code sample:
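A minimal sketch of the workaround mentioned above, assuming the goal is simply to hand scanpy a base-class ndarray. Because np.matrix subclasses np.ndarray, an isinstance(X, np.ndarray) check cannot tell the two apart, so the sketch checks for np.matrix explicitly. The helper name as_plain_ndarray is hypothetical, not part of scanpy or anndata.

```python
import numpy as np


def as_plain_ndarray(X):
    """Hypothetical helper: return X as a base-class ndarray if it is an np.matrix.

    np.matrix is a subclass of np.ndarray, so isinstance(X, np.ndarray) is True
    for both; test for np.matrix explicitly instead.
    """
    if isinstance(X, np.matrix):
        # np.asarray converts the matrix subclass to a plain ndarray
        return np.asarray(X)
    return X


X = np.matrix([[1, 2], [3, 0]])
print(isinstance(X, np.ndarray))  # True: matrix is an ndarray subclass
print(isinstance(X, np.matrix))   # True
Y = as_plain_ndarray(X)
print(type(Y).__name__)           # ndarray
```

In the reproduction below, this would amount to converting X before building the AnnData object. For scipy sparse matrices, calling toarray() instead of todense() yields a plain ndarray directly.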

Minimal code sample

import scanpy as sc
import anndata
import numpy as np
import pandas as pd
X = np.matrix([[1, 2], [3, 0]])
print(isinstance(X, np.ndarray))  # True: np.matrix is an ndarray subclass
ad = anndata.AnnData(X=X, obs={'obs_names': ['a', 'b']}, var={'var_names': ['v1', 'v2']})
sc.pp.filter_genes(ad, min_cells=2)  # raises ValueError

Error output

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[47], line 9
      7 print(isinstance(X, np.ndarray))
      8 ad = anndata.AnnData(X=X, obs={'obs_names': ['a', 'b']}, var={'vars_names': ['v1', 'v2']})
----> 9 sc.pp.filter_genes(ad, min_cells=2)

File ~/miniconda3/envs/bio/lib/python3.10/site-packages/scanpy/preprocessing/_simple.py:250, in filter_genes(data, min_counts, min_cells, max_counts, max_cells, inplace, copy)
    248     adata.var['n_counts'] = number
    249 else:
--> 250     adata.var['n_cells'] = number
    251 adata._inplace_subset_var(gene_subset)
    252 return adata if copy else None

File ~/miniconda3/envs/bio/lib/python3.10/site-packages/pandas/core/frame.py:3950, in DataFrame.__setitem__(self, key, value)
   3947     self._setitem_array([key], value)
   3948 else:
   3949     # set column
-> 3950     self._set_item(key, value)

File ~/miniconda3/envs/bio/lib/python3.10/site-packages/pandas/core/frame.py:4143, in DataFrame._set_item(self, key, value)
   4133 def _set_item(self, key, value) -> None:
   4134     """
   4135     Add series to DataFrame in specified column.
   4136 
   (...)
   4141     ensure homogeneity.
   4142     """
-> 4143     value = self._sanitize_column(value)
   4145     if (
   4146         key in self.columns
   4147         and value.ndim == 1
   4148         and not is_extension_array_dtype(value)
   4149     ):
   4150         # broadcast across multiple columns if necessary
   4151         if not self.columns.is_unique or isinstance(self.columns, MultiIndex):

File ~/miniconda3/envs/bio/lib/python3.10/site-packages/pandas/core/frame.py:4870, in DataFrame._sanitize_column(self, value)
   4867     return _reindex_for_setitem(Series(value), self.index)
   4869 if is_list_like(value):
-> 4870     com.require_length_match(value, self.index)
   4871 return sanitize_array(value, self.index, copy=True, allow_2d=True)

File ~/miniconda3/envs/bio/lib/python3.10/site-packages/pandas/core/common.py:576, in require_length_match(data, index)
    572 """
    573 Check the length of data matches the length of the index.
    574 """
    575 if len(data) != len(index):
--> 576     raise ValueError(
    577         "Length of values "
    578         f"({len(data)}) "
    579         "does not match length of index "
    580         f"({len(index)})"
    581     )

ValueError: Length of values (1) does not match length of index (2)
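A plausible explanation, inferred from the error message rather than confirmed against scanpy's source: if filter_genes counts cells per gene with a column-wise sum, then on an np.matrix that sum keeps the matrix's 2-D shape (1, n_genes). len() of such a matrix is 1, not n_genes, so pandas rejects the assignment to adata.var['n_cells']. The sketch below reproduces this with numpy and pandas alone; the DataFrame named var stands in for adata.var and is an illustrative assumption.

```python
import numpy as np
import pandas as pd

X = np.matrix([[1, 2], [3, 0]])
var = pd.DataFrame(index=["v1", "v2"])  # stand-in for adata.var: one row per gene

# Number of cells with nonzero expression per gene (column-wise sum).
counts = np.sum(X > 0, axis=0)
print(type(counts).__name__, counts.shape, len(counts))  # matrix (1, 2) 1

message = None
try:
    var["n_cells"] = counts  # len(counts) == 1, but var has 2 rows
except ValueError as err:
    message = str(err)
    print(message)  # Length of values (1) does not match length of index (2)

# Flattening to 1-D gives one value per gene, and the assignment succeeds.
flat = np.asarray(counts).ravel()
var["n_cells"] = flat
print(var["n_cells"].tolist())  # [2, 1]
```

Reductions on a plain ndarray drop the summed axis, which is why converting X first avoids the error.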

Versions

-----
anndata     0.10.2
scanpy      1.9.6
-----
PIL                 9.4.0
appnope             0.1.3
asttokens           NA
backcall            0.2.0
bottleneck          1.3.5
cffi                1.15.1
comm                0.1.3
cycler              0.10.0
cython_runtime      NA
dateutil            2.8.2
debugpy             1.6.7
decorator           5.1.1
exceptiongroup      1.1.3
executing           1.2.0
google              NA
h5py                3.9.0
ipykernel           6.23.1
jedi                0.18.2
joblib              1.3.2
kiwisolver          1.4.4
llvmlite            0.41.1
matplotlib          3.7.2
mkl                 2.4.0
mpl_toolkits        NA
natsort             8.4.0
numba               0.58.1
numexpr             2.8.4
numpy               1.25.2
packaging           23.1
pandas              2.0.3
parso               0.8.3
pexpect             4.8.0
pickleshare         0.7.5
pkg_resources       NA
platformdirs        3.5.1
prompt_toolkit      3.0.38
psutil              5.9.0
ptyprocess          0.7.0
pure_eval           0.2.2
pydev_ipython       NA
pydevconsole        NA
pydevd              2.9.5
pydevd_file_utils   NA
pydevd_plugins      NA
pydevd_tracing      NA
pygments            2.15.1
pyparsing           3.0.9
pytz                2022.7
scipy               1.11.1
session_info        1.0.0
six                 1.16.0
sklearn             1.3.0
stack_data          0.6.2
threadpoolctl       3.2.0
tornado             6.3.2
traitlets           5.9.0
typing_extensions   NA
wcwidth             0.2.6
yaml                6.0.1
zmq                 25.1.0
zoneinfo            NA
-----
IPython             8.14.0
jupyter_client      8.2.0
jupyter_core        5.3.0
-----
Python 3.10.10 (main, Mar 21 2023, 13:41:39) [Clang 14.0.6 ]
macOS-10.16-x86_64-i386-64bit
-----
Session information updated at 2024-01-24 17:43

Jan 24 '24 16:01 adrian-valente
