Saving .h5ad with pd.Series in .uns results in IORegistryError
Please make sure these conditions are met
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of anndata.
- [ ] (optional) I have confirmed this bug exists on the master branch of anndata.
Report
Hi all,
anndata objects don't serialize as h5ad if .uns contains a pandas Series. This is related to some comments in #797.
Code:
import anndata as ad
import pandas as pd
adata_uns_series = ad.AnnData()
adata_uns_series.uns['series'] = pd.Series({'a':1,'b':2,'c':3})
adata_uns_series.write('adata_uns_series.h5ad') # error
Traceback:
Traceback (most recent call last):
File "scratch/anndata_error.py", line 5, in <module>
adata_uns_series.write('adata_uns_series.h5ad') # error
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/scratch/envs/anndata_0.10.6/lib/python3.12/site-packages/anndata/_core/anndata.py", line 1929, in write_h5ad
write_h5ad(
File "/scratch/envs/anndata_0.10.6/lib/python3.12/site-packages/anndata/_io/h5ad.py", line 111, in write_h5ad
write_elem(f, "uns", dict(adata.uns), dataset_kwargs=dataset_kwargs)
File "/scratch/envs/anndata_0.10.6/lib/python3.12/site-packages/anndata/_io/specs/registry.py", line 359, in write_elem
Writer(_REGISTRY).write_elem(store, k, elem, dataset_kwargs=dataset_kwargs)
File "/scratch/envs/anndata_0.10.6/lib/python3.12/site-packages/anndata/_io/utils.py", line 243, in func_wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/scratch/envs/anndata_0.10.6/lib/python3.12/site-packages/anndata/_io/specs/registry.py", line 309, in write_elem
return write_func(store, k, elem, dataset_kwargs=dataset_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/scratch/envs/anndata_0.10.6/lib/python3.12/site-packages/anndata/_io/specs/registry.py", line 57, in wrapper
result = func(g, k, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/scratch/envs/anndata_0.10.6/lib/python3.12/site-packages/anndata/_io/specs/methods.py", line 312, in write_mapping
_writer.write_elem(g, sub_k, sub_v, dataset_kwargs=dataset_kwargs)
File "/scratch/envs/anndata_0.10.6/lib/python3.12/site-packages/anndata/_io/utils.py", line 243, in func_wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/scratch/envs/anndata_0.10.6/lib/python3.12/site-packages/anndata/_io/specs/registry.py", line 304, in write_elem
self.find_writer(dest_type, elem, modifiers),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/scratch/anndata_0.10.6/lib/python3.12/site-packages/anndata/_io/specs/registry.py", line 269, in find_writer
return self.registry.get_writer(dest_type, type(elem), modifiers)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/scratch/envs/anndata_0.10.6/lib/python3.12/site-packages/anndata/_io/specs/registry.py", line 117, in get_writer
raise IORegistryError._from_write_parts(dest_type, src_type, modifiers)
anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'h5py._hl.group.Group'>
Error raised while writing key 'series' of <class 'h5py._hl.group.Group'> to /uns
Versions
-----
anndata 0.10.6
session_info 1.0.0
-----
cython_runtime NA
dateutil 2.9.0
h5py 3.10.0
natsort 8.4.0
numpy 1.26.4
packaging 24.0
pandas 2.2.1
pytz 2024.1
scipy 1.12.0
six 1.16.0
-----
Python 3.12.2 | packaged by conda-forge | (main, Feb 16 2024, 20:50:58) [GCC 12.3.0]
Linux-5.4.0-144-generic-x86_64-with-glibc2.31
-----
Session information updated at 2024-03-20 16:53
Could you share your use case for this?
To me, storing a pandas Series is basically the same thing as storing a 1d xarray DataArray. I want to support storing xarray objects, and I don't want to have two ways to store the same thing.
Would making this a single column dataframe on your end work here?
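A minimal sketch of that workaround, assuming a string column name is acceptable on the user's side. The helper names `series_to_frame` and `frame_to_series` and the column name `"value"` are hypothetical and not part of anndata or pandas. The anndata calls are shown only as comments, so the runnable part covers the pandas conversion alone.

```python
import pandas as pd


# Hypothetical helpers for illustration; not part of anndata or pandas.
def series_to_frame(s: pd.Series, column: str = "value") -> pd.DataFrame:
    """Wrap a Series in a single-column DataFrame with a string column name."""
    return s.to_frame(name=column)


def frame_to_series(df: pd.DataFrame, column: str = "value", name=None) -> pd.Series:
    """Recover the Series from the single-column DataFrame."""
    out = df[column].copy()
    out.name = name
    return out


series = pd.Series({"a": 1, "b": 2, "c": 3})
frame = series_to_frame(series)
restored = frame_to_series(frame)

# With anndata installed, the frame (not the Series) would go into .uns:
#   adata.uns["series"] = frame
#   adata.write("adata_uns_series.h5ad")
#   restored = frame_to_series(ad.read_h5ad("adata_uns_series.h5ad").uns["series"])
```

The cost is that the original Series name is not carried by the frame and has to be passed back in by hand when converting.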
Thanks for the quick response! I am gonna argue a bit for the pandas series :)
My immediate use case is serializing a mapping between coarse and fine categories which are also columns of .obsm dataframes. A pandas series is the natural object to use for this.
In principle, using a single column dataframe would work, as would using a plain dictionary. But both would be hacky. The cleaner way is using a pandas series. As far as I know pandas is the de facto standard for storing labeled 1d and 2d arrays. The labeled 2d arrays are already supported as pandas dataframe.
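For comparison, the plain-dictionary route could look like the sketch below. The category labels are made up for illustration, and only the pandas side is shown; whether every value type survives an h5ad round trip unchanged is not checked here.

```python
import pandas as pd

# Hypothetical fine -> coarse category labels, for illustration only.
fine_to_coarse = pd.Series(
    {"CD4 T": "T cell", "CD8 T": "T cell", "naive B": "B cell"},
    name="coarse",
)

# Plain-dict form: string keys, native Python values.
as_dict = fine_to_coarse.to_dict()

# Rebuild the Series on the way back; the name has to be supplied again.
rebuilt = pd.Series(as_dict, name="coarse")
```

The dictionary drops the Series name and dtype, so both have to be restored manually after reading.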
I do understand that it is somewhat redundant to have two ways to store the same thing. Then again, if you support xarray objects, you introduce a second way of storing 2d labeled arrays, since pandas dataframes are already supported, right?
Except for serialization, a pandas series in .uns works just fine. One could restrict the types that can be written to .uns, or construct xarrays whenever pandas series or dataframes (or python dicts) are written into .uns. Personally, I would prefer the flexibility and symmetry of supporting both pandas dataframes and pandas series.
Just some thoughts. What do you think?
It would be good to revisit this once xarray support lands.