pandas
pandas copied to clipboard
BUG: iloc not possible for sparse DataFrame
Pandas version checks
-
[X] I have checked that this issue has not already been reported.
-
[X] I have confirmed this bug exists on the latest version of pandas.
-
[X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Two different errors depending on dtype.
df = pd.DataFrame([[1., 0., 1.5], [0., 2., 0.]], dtype=pd.SparseDtype(float))
df.iloc[0]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/devin/anaconda3/envs/sw310/lib/python3.10/site-packages/pandas/core/indexing.py", line 967, in __getitem__
return self._getitem_axis(maybe_callable, axis=axis)
File "/Users/devin/anaconda3/envs/sw310/lib/python3.10/site-packages/pandas/core/indexing.py", line 1522, in _getitem_axis
return self.obj._ixs(key, axis=axis)
File "/Users/devin/anaconda3/envs/sw310/lib/python3.10/site-packages/pandas/core/frame.py", line 3424, in _ixs
new_values = self._mgr.fast_xs(i)
File "/Users/devin/anaconda3/envs/sw310/lib/python3.10/site-packages/pandas/core/internals/managers.py", line 1012, in fast_xs
result[rl] = blk.iget((i, loc))
File "/Users/devin/anaconda3/envs/sw310/lib/python3.10/site-packages/pandas/core/arrays/sparse/array.py", line 610, in __setitem__
raise TypeError(msg)
TypeError: SparseArray does not support item assignment via setitem
df = pd.DataFrame([[1, 0, 1], [0, 2, 0]], dtype=pd.SparseDtype(int))
df.iloc[0]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/devin/anaconda3/envs/sw310/lib/python3.10/site-packages/pandas/core/indexing.py", line 967, in __getitem__
return self._getitem_axis(maybe_callable, axis=axis)
File "/Users/devin/anaconda3/envs/sw310/lib/python3.10/site-packages/pandas/core/indexing.py", line 1522, in _getitem_axis
return self.obj._ixs(key, axis=axis)
File "/Users/devin/anaconda3/envs/sw310/lib/python3.10/site-packages/pandas/core/frame.py", line 3424, in _ixs
new_values = self._mgr.fast_xs(i)
File "/Users/devin/anaconda3/envs/sw310/lib/python3.10/site-packages/pandas/core/internals/managers.py", line 1003, in fast_xs
result = cls._empty((n,), dtype=dtype)
File "/Users/devin/anaconda3/envs/sw310/lib/python3.10/site-packages/pandas/core/arrays/base.py", line 1545, in _empty
raise NotImplementedError(
NotImplementedError: Default 'empty' implementation is invalid for dtype='Sparse[int64, 0]'
Issue Description
Getting exceptions from fast_xs
when trying to use iloc
on a sparse DataFrame. For a float
dtype, the exception is for setitem but I am only looking to get an item. It looks like fast_xs
constructs an empty SparseArray
and then uses setitem, which is causing the error. For an int
dtype, the empty object cannot be constructed.
Expected Behavior
I would expect a sparse Series
to be returned.
Installed Versions
INSTALLED VERSIONS
commit : 06d230151e6f18fdb8139d09abf539867a8cd481 python : 3.10.0.final.0 python-bits : 64 OS : Darwin OS-release : 21.2.0 Version : Darwin Kernel Version 21.2.0: Sun Nov 28 20:28:54 PST 2021; root:xnu-8019.61.5~1/RELEASE_X86_64 machine : x86_64 processor : i386 byteorder : little LC_ALL : None LANG : en_US.UTF-8 LOCALE : en_US.UTF-8
pandas : 1.4.1 numpy : 1.21.5 pytz : 2021.3 dateutil : 2.8.2 pip : 21.2.4 setuptools : 58.0.4 Cython : 0.29.25 pytest : 6.2.4 hypothesis : None sphinx : 4.4.0 blosc : None feather : None xlsxwriter : None lxml.etree : None html5lib : None pymysql : None psycopg2 : None jinja2 : 3.0.2 IPython : None pandas_datareader: None bs4 : None bottleneck : 1.3.2 fastparquet : None fsspec : None gcsfs : None matplotlib : 3.5.1 numba : None numexpr : 2.8.0 odfpy : None openpyxl : None pandas_gbq : None pyarrow : None pyreadstat : None pyxlsb : None s3fs : None scipy : 1.8.0 sqlalchemy : None tables : None tabulate : None xarray : 2022.3.0 xlrd : None xlwt : None zstandard : None
I tried above example with dtype=pd.SparseDtype(bool) and get similar error
df = pd.DataFrame([[1, 0, 1], [0, 1, 0]], dtype=pd.SparseDtype(bool))
df.iloc[0]
NotImplementedError: Default 'empty' implementation is invalid for dtype='Sparse[bool, False]'
take
@adamjliang any progress on this?
Looks like this error occurs with pandas 1.4.0 and 1.4.2 but not 1.3.5
Looks like this error occurs with pandas 1.4.0 and 1.4.2 but not 1.3.5
first bad commit: [9bf09adcdab1a5babaed9f3315f2572cebf9f1f2] REF: fast_xs avoid object dtype (#43203)
cc @jbrockmendel
any workarounds to this?
any workarounds to this?
Not really. We'll have to re-write something; not yet clear to me where the appropriate fix is. If you're in desperate need of a patch right now, reverting #43203 locally should do the trick.
We'll have to re-write something; not yet clear to me where the appropriate fix is.
@jbrockmendel i've opened a PR https://github.com/pandas-dev/pandas/pull/48246 that effectively restores the 1.3.5 code path, but just for SparseDtype.
We maybe need to identify any immutable EAs and either special case as in the PR or not take the fast path?