anndata TypeError: Object dtype dtype('O') has no native HDF5 equivalent

Hi there,

I am running into this error when I try to write an h5ad file from my AnnData object. I downloaded a dataset from the Allen Brain Atlas [1] and loaded it using this code:

import pandas as pd
import anndata

M1_matrix = pd.read_csv('/path/matrix.csv', index_col=0)
M1_rows = pd.read_csv('/path/human_MTG_2018-06-14_genes-rows.csv')
M1_rows.index = M1_rows['gene']
M1_columns = pd.read_csv('/path/Human_M1_data/metadata.csv')
M1_columns.index = M1_columns['sample_name']
adata = anndata.AnnData(X=M1_matrix.to_numpy(), obs=M1_columns, var=M1_rows)

Then I ran the following to try to convert object columns to strings:

adata.obs.columns = adata.obs.columns.astype(str)
adata.var.columns = adata.var.columns.astype(str)

adata.var=adata.var.convert_dtypes()
adata.obs=adata.obs.convert_dtypes()

And then when I tried to write it with:

adata.write('path/M1.h5ad')

Then I got the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~/utils/miniconda3/envs/scanpy/lib/python3.9/site-packages/anndata/_io/utils.py in func_wrapper(elem, key, val, *args, **kwargs)
    208         try:
--> 209             return func(elem, key, val, *args, **kwargs)
    210         except Exception as e:

~/utils/miniconda3/envs/scanpy/lib/python3.9/site-packages/anndata/_io/h5ad.py in write_array(f, key, value, dataset_kwargs)
    184         value = _to_hdf5_vlen_strings(value)
--> 185     f.create_dataset(key, data=value, **dataset_kwargs)
    186 

~/utils/miniconda3/envs/scanpy/lib/python3.9/site-packages/h5py/_hl/group.py in create_dataset(self, name, shape, dtype, data, **kwds)
    148 
--> 149             dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)
    150             dset = dataset.Dataset(dsid)

~/utils/miniconda3/envs/scanpy/lib/python3.9/site-packages/h5py/_hl/dataset.py in make_new_dset(parent, shape, dtype, data, name, chunks, compression, shuffle, fletcher32, maxshape, compression_opts, fillvalue, scaleoffset, track_times, external, track_order, dcpl, allow_unknown_filter)
     88             dtype = numpy.dtype(dtype)
---> 89         tid = h5t.py_create(dtype, logical=1)
     90 

h5py/h5t.pyx in h5py.h5t.py_create()

h5py/h5t.pyx in h5py.h5t.py_create()

h5py/h5t.pyx in h5py.h5t.py_create()

TypeError: Object dtype dtype('O') has no native HDF5 equivalent

The above exception was the direct cause of the following exception:

TypeError                                 Traceback (most recent call last)
~/utils/miniconda3/envs/scanpy/lib/python3.9/site-packages/anndata/_io/utils.py in func_wrapper(elem, key, val, *args, **kwargs)
    208         try:
--> 209             return func(elem, key, val, *args, **kwargs)
    210         except Exception as e:

~/utils/miniconda3/envs/scanpy/lib/python3.9/site-packages/anndata/_io/h5ad.py in write_series(group, key, series, dataset_kwargs)
    288     else:
--> 289         write_array(group, key, series.values, dataset_kwargs=dataset_kwargs)
    290 

~/utils/miniconda3/envs/scanpy/lib/python3.9/site-packages/anndata/_io/utils.py in func_wrapper(elem, key, val, *args, **kwargs)
    211             parent = _get_parent(elem)
--> 212             raise type(e)(
    213                 f"{e}\n\n"

TypeError: Object dtype dtype('O') has no native HDF5 equivalent

Above error raised while writing key 'cluster_order' of <class 'h5py._hl.group.Group'> from /.

The above exception was the direct cause of the following exception:

TypeError                                 Traceback (most recent call last)
~/utils/miniconda3/envs/scanpy/lib/python3.9/site-packages/anndata/_io/utils.py in func_wrapper(elem, key, val, *args, **kwargs)
    208         try:
--> 209             return func(elem, key, val, *args, **kwargs)
    210         except Exception as e:

~/utils/miniconda3/envs/scanpy/lib/python3.9/site-packages/anndata/_io/h5ad.py in write_dataframe(f, key, df, dataset_kwargs)
    262     for col_name, (_, series) in zip(col_names, df.items()):
--> 263         write_series(group, col_name, series, dataset_kwargs=dataset_kwargs)
    264 

~/utils/miniconda3/envs/scanpy/lib/python3.9/site-packages/anndata/_io/utils.py in func_wrapper(elem, key, val, *args, **kwargs)
    211             parent = _get_parent(elem)
--> 212             raise type(e)(
    213                 f"{e}\n\n"

TypeError: Object dtype dtype('O') has no native HDF5 equivalent

Above error raised while writing key 'cluster_order' of <class 'h5py._hl.group.Group'> from /.

Above error raised while writing key 'cluster_order' of <class 'h5py._hl.group.Group'> from /.

The above exception was the direct cause of the following exception:

TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_29925/3214377814.py in <module>
----> 1 adata.write('/wynton/group/pollen/arnar/Scanpy/Scanpy/data/Human_M1_data/Human_M1_data.h5ad')

~/utils/miniconda3/envs/scanpy/lib/python3.9/site-packages/anndata/_core/anndata.py in write_h5ad(self, filename, compression, compression_opts, force_dense, as_dense)
   1903             filename = self.filename
   1904 
-> 1905         _write_h5ad(
   1906             Path(filename),
   1907             self,

~/utils/miniconda3/envs/scanpy/lib/python3.9/site-packages/anndata/_io/h5ad.py in write_h5ad(filepath, adata, force_dense, as_dense, dataset_kwargs, **kwargs)
    109         else:
    110             write_attribute(f, "raw", adata.raw, dataset_kwargs=dataset_kwargs)
--> 111         write_attribute(f, "obs", adata.obs, dataset_kwargs=dataset_kwargs)
    112         write_attribute(f, "var", adata.var, dataset_kwargs=dataset_kwargs)
    113         write_attribute(f, "obsm", adata.obsm, dataset_kwargs=dataset_kwargs)

~/utils/miniconda3/envs/scanpy/lib/python3.9/functools.py in wrapper(*args, **kw)
    875                             '1 positional argument')
    876 
--> 877         return dispatch(args[0].__class__)(*args, **kw)
    878 
    879     funcname = getattr(func, '__name__', 'singledispatch function')

~/utils/miniconda3/envs/scanpy/lib/python3.9/site-packages/anndata/_io/h5ad.py in write_attribute_h5ad(f, key, value, *args, **kwargs)
    128     if key in f:
    129         del f[key]
--> 130     _write_method(type(value))(f, key, value, *args, **kwargs)
    131 
    132 

~/utils/miniconda3/envs/scanpy/lib/python3.9/site-packages/anndata/_io/utils.py in func_wrapper(elem, key, val, *args, **kwargs)
    210         except Exception as e:
    211             parent = _get_parent(elem)
--> 212             raise type(e)(
    213                 f"{e}\n\n"
    214                 f"Above error raised while writing key {key!r} of {type(elem)}"

TypeError: Object dtype dtype('O') has no native HDF5 equivalent

Above error raised while writing key 'cluster_order' of <class 'h5py._hl.group.Group'> from /.

Above error raised while writing key 'cluster_order' of <class 'h5py._hl.group.Group'> from /.

Above error raised while writing key 'obs' of <class 'h5py._hl.files.File'> from /.

Thank you so much for your help!

Dataset [1] https://portal.brain-map.org/atlases-and-data/rnaseq/human-m1-10x

Nov 01 '21 22:11 AB1995UCSF

Just FYI, here is the output from logging.print_versions():

anndata     0.7.6
scanpy      1.8.1
sinfo       0.3.4
-----
PIL                 8.4.0
beta_ufunc          NA
binom_ufunc         NA
bottleneck          1.3.2
cycler              0.10.0
cython_runtime      NA
dateutil            2.8.2
h5py                3.5.0
igraph              0.9.7
joblib              1.0.1
kiwisolver          1.3.1
leidenalg           0.8.8
llvmlite            0.37.0
matplotlib          3.4.3
mkl                 2.4.0
mpl_toolkits        NA
natsort             7.1.1
nbinom_ufunc        NA
numba               0.54.1
numexpr             2.7.3
numpy               1.20.1
packaging           21.0
pandas              1.3.3
pkg_resources       NA
pyexpat             NA
pyparsing           2.4.7
pytz                2021.3
scipy               1.7.1
six                 1.16.0
sklearn             1.0.1
tables              3.6.1
texttable           1.6.4
threadpoolctl       2.2.0
wcwidth             0.2.5
-----
Python 3.9.7 (default, Sep 16 2021, 13:09:58) [GCC 7.5.0]
Linux-3.10.0-1160.36.2.el7.x86_64-x86_64-with-glibc2.17
32 logical CPU cores, x86_64

Nov 01 '21 22:11 AB1995UCSF

Hey! You can check which columns are causing this issue by running adata.obs.dtypes and adata.var.dtypes and looking for the columns whose dtype is 'object'. You can cast those to integers or strings using adata.obs[col_name] = adata.obs[col_name].astype(int) or .astype(str).
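
A minimal sketch of that check, in case it helps. It uses a small made-up DataFrame in place of adata.obs, so the column names and values are hypothetical and not taken from the Allen dataset. Which columns come out as 'object' can also depend on your pandas version.

```python
import pandas as pd

# Hypothetical stand-in for adata.obs; column names are made up for illustration.
obs = pd.DataFrame(
    {
        "cluster_label": ["a", "b", "c"],
        "mixed": [1, "x", 2.5],      # mixed Python objects -> object dtype
        "n_genes": [10, 20, 30],     # plain integers -> int64
    },
    index=["s1", "s2", "s3"],
)

# Columns stored with the generic Python-object dtype
object_cols = [col for col in obs.columns if obs[col].dtype == object]
print(object_cols)

# Cast each of them to strings (use .astype(int) instead for integer data)
for col in object_cols:
    obs[col] = obs[col].astype(str)

print(obs.dtypes)
```

With a real AnnData object you would run the same loop on adata.obs and adata.var before calling adata.write(...).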

Nov 02 '21 08:11 LuckyMD

I think this is related to #504, but is a bit different because I think the column giving the error is pd.Int64Dtype but doesn't have any null values. We could either:

- Check these don't have null values, convert to np.int64, write these, and call it a bugfix
- Write them as nullable integers once #504 is implemented
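
To illustrate the first option on the user side, here is a rough sketch. It is not how anndata implements or plans to implement this; the helper name downcast_nullable_ints and the example columns are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for adata.obs with nullable integer columns,
# the dtype that .convert_dtypes() produces for integer data.
obs = pd.DataFrame(
    {
        "cluster_order": pd.array([1, 2, 3], dtype="Int64"),
        "with_missing": pd.array([1, None, 3], dtype="Int64"),
    },
    index=["s1", "s2", "s3"],
)

def downcast_nullable_ints(df):
    """Return a copy where nullable-integer columns without nulls become np.int64."""
    out = df.copy()
    for col in out.columns:
        dtype = out[col].dtype
        is_nullable_int = (
            pd.api.types.is_extension_array_dtype(dtype)
            and pd.api.types.is_integer_dtype(dtype)
        )
        # Only convert when no value would be lost
        if is_nullable_int and not out[col].isna().any():
            out[col] = out[col].astype(np.int64)
    return out

fixed = downcast_nullable_ints(obs)
print(fixed.dtypes)
```

Columns that do contain missing values are left untouched here, so they would still need separate handling before writing.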

@AB1995UCSF, you should be fine to write this if you just don't call .convert_dtypes(). E.g. just:

adata = anndata.AnnData(
    ...,
    obs=pd.read_csv('/path/Human_M1_data/metadata.csv').set_index("sample_name"),
    ...,
)

Nov 09 '21 15:11 ivirshup

This issue has been automatically marked as stale because it has not had recent activity. Please add a comment if you want to keep the issue open. Thank you for your contributions!

Jun 23 '23 02:06 github-actions[bot]

@ivirshup any idea what solution we want to go with?

Jun 23 '23 09:06 flying-sheep

This issue has been automatically marked as stale because it has not had recent activity. Please add a comment if you want to keep the issue open. Thank you for your contributions!

Aug 25 '23 02:08 github-actions[bot]