anndata
`.write` does not save `None` values
Description
When saving an AnnData object to disk, keys of a dictionary whose value is None
seem not to be saved.
import numpy as np
import scanpy as sc
from anndata import AnnData
adata = AnnData(X=np.eye(3), uns={'key_1': 0, 'key_2': None})
adata.write('adata.h5ad')
_adata = sc.read('adata.h5ad')
gives
>>> _adata
AnnData object with n_obs × n_vars = 3 × 3
uns: 'key_1'
Did this ever work? I recall thinking about it when I implemented the write_none
function, but was probably going for backwards compat then.
Do you have a suggested way to save these? I think hdf5 may have an appropriate type, but I'm not sure zarr does.
I'm not super happy with or convinced by this, but how about saving it as the string 'None' and converting it back to None when reading the file? We would have to make sure that actual strings 'None' are not converted to None.
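To make the concern with the string approach concrete, here is a minimal sketch in plain Python. The `SENTINEL`, `encode`, and `decode` names are hypothetical and only illustrate the idea; they are not part of anndata.

```python
# Hypothetical sentinel-based encoding, for illustration only.
SENTINEL = "None"


def encode(value):
    """Replace None with the sentinel string before writing."""
    return SENTINEL if value is None else value


def decode(value):
    """Turn the sentinel string back into None after reading."""
    return None if value == SENTINEL else value


original = {"key_1": 0, "key_2": None, "key_3": "None"}
stored = {k: encode(v) for k, v in original.items()}
restored = {k: decode(v) for k, v in stored.items()}

print(restored["key_2"])         # None, as intended
print(repr(restored["key_3"]))   # None, although the original was the string 'None'
```

Without extra metadata that marks which entries were really None, a genuine 'None' string cannot be told apart from an encoded null after the round trip.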
BTW, this is also an issue if you have None in one of your columns:
import numpy as np
import pandas as pd  # needed for the DataFrame stored in uns
import scanpy as sc
from anndata import AnnData
adata = AnnData(X=np.eye(3), uns={'key_1': 0, 'key_2': None, 'key_3': pd.DataFrame({'col_0': ['string', None]})})
# Alternative failure
# adata = AnnData(X=np.eye(3), uns={'key_1': 0, 'key_2': None}, obs={'col_0': [None]})
adata.write('adata.h5ad')
Traceback
Traceback (most recent call last):
File "/miniconda3/envs/anndata_bug/lib/python3.8/site-packages/anndata/_io/utils.py", line 209, in func_wrapper
return func(elem, key, val, *args, **kwargs)
File "/miniconda3/envs/anndata_bug/lib/python3.8/site-packages/anndata/_io/h5ad.py", line 270, in write_series
group.create_dataset(
File "/miniconda3/envs/anndata_bug/lib/python3.8/site-packages/h5py/_hl/group.py", line 148, in create_dataset
dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)
File "/miniconda3/envs/anndata_bug/lib/python3.8/site-packages/h5py/_hl/dataset.py", line 140, in make_new_dset
dset_id.write(h5s.ALL, h5s.ALL, data)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5d.pyx", line 232, in h5py.h5d.DatasetID.write
File "h5py/_proxy.pyx", line 145, in h5py._proxy.dset_rw
File "h5py/_conv.pyx", line 444, in h5py._conv.str2vlen
File "h5py/_conv.pyx", line 95, in h5py._conv.generic_converter
File "h5py/_conv.pyx", line 249, in h5py._conv.conv_str2vlen
TypeError: Can't implicitly convert non-string objects to strings
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/miniconda3/envs/anndata_bug/lib/python3.8/site-packages/anndata/_io/utils.py", line 209, in func_wrapper
return func(elem, key, val, *args, **kwargs)
File "/miniconda3/envs/anndata_bug/lib/python3.8/site-packages/anndata/_io/h5ad.py", line 263, in write_dataframe
write_series(group, col_name, series, dataset_kwargs=dataset_kwargs)
File "/miniconda3/envs/anndata_bug/lib/python3.8/site-packages/anndata/_io/utils.py", line 212, in func_wrapper
raise type(e)(
TypeError: Can't implicitly convert non-string objects to strings
Above error raised while writing key 'col_0' of <class 'h5py._hl.group.Group'> from /.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/miniconda3/envs/anndata_bug/lib/python3.8/site-packages/anndata/_core/anndata.py", line 1912, in write_h5ad
_write_h5ad(
File "/miniconda3/envs/anndata_bug/lib/python3.8/site-packages/anndata/_io/h5ad.py", line 118, in write_h5ad
write_attribute(f, "uns", adata.uns, dataset_kwargs=dataset_kwargs)
File "/miniconda3/envs/anndata_bug/lib/python3.8/functools.py", line 875, in wrapper
return dispatch(args[0].__class__)(*args, **kw)
File "/miniconda3/envs/anndata_bug/lib/python3.8/site-packages/anndata/_io/h5ad.py", line 130, in write_attribute_h5ad
_write_method(type(value))(f, key, value, *args, **kwargs)
File "/miniconda3/envs/anndata_bug/lib/python3.8/site-packages/anndata/_io/h5ad.py", line 294, in write_mapping
write_attribute(f, f"{key}/{sub_key}", sub_value, dataset_kwargs=dataset_kwargs)
File "/miniconda3/envs/anndata_bug/lib/python3.8/functools.py", line 875, in wrapper
return dispatch(args[0].__class__)(*args, **kw)
File "/miniconda3/envs/anndata_bug/lib/python3.8/site-packages/anndata/_io/h5ad.py", line 130, in write_attribute_h5ad
_write_method(type(value))(f, key, value, *args, **kwargs)
File "/miniconda3/envs/anndata_bug/lib/python3.8/site-packages/anndata/_io/utils.py", line 212, in func_wrapper
raise type(e)(
TypeError: Can't implicitly convert non-string objects to strings
Above error raised while writing key 'col_0' of <class 'h5py._hl.group.Group'> from /.
Above error raised while writing key 'uns/key_3' of <class 'h5py._hl.files.File'> from /.
Though it does work for
adata = AnnData(X=np.eye(3), uns={'key_1': 0, 'key_2': None}, obs={'col_0': ['string', 'string', None]})
I wouldn't like a string None, but we could encode a null type. E.g. missing_el.attrs["encoding_type"] = "null".
For now, I would say the typical way we handle this in scanpy is just adata.uns.get("maybe_none_key", None) for any parameter that could be None.
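The workaround can be sketched with a plain dict standing in for adata.uns. The dict comprehension below only simulates the reported behaviour (keys with None values missing after a write/read round trip); it is an assumption for illustration, not anndata's actual I/O code.

```python
# A plain dict stands in for adata.uns before writing.
uns_before = {"key_1": 0, "maybe_none_key": None}

# Simulated round trip: keys whose value is None are not saved.
uns_after = {k: v for k, v in uns_before.items() if v is not None}

# Direct indexing fails once the key is gone.
try:
    uns_after["maybe_none_key"]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# .get with a default treats "missing" and "None" the same way.
value = uns_after.get("maybe_none_key", None)

print(lookup_failed)  # True
print(value)          # None
```

The trade-off is that a reader using `.get` can no longer distinguish a parameter that was explicitly None from one that was never set.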
The cases for columns in a dataframe are a bit different, since those have to be values in an array.
obs={'col_0': [None]}
This fails because none of us, numpy, or pandas can infer what type that array is beyond object.
pd.DataFrame({'col_0': ['string', None]})
We could potentially infer this to be a string array, and then add support for nullable string arrays. See #504 and #669. I'm not sure pandas' string representation is mature enough to do this yet.
obs={'col_0': ['string', 'string', None]}
This works since we cast the column to a categorical, which we support null values for.
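The difference between these cases can be seen directly in numpy and pandas. This is a small sketch of how the two libraries represent the values; it says nothing about how anndata lays out categoricals on disk.

```python
import numpy as np
import pandas as pd

# A column holding only None gives numpy nothing to infer beyond object.
only_none = np.array([None])
print(only_none.dtype)  # object

# A categorical stores integer codes plus a list of categories,
# and pandas reserves the code -1 for missing values.
cat = pd.Categorical(["string", "string", None])
print(list(cat.categories))  # ['string']
print(cat.codes.tolist())    # [0, 0, -1]
```

Because the missing entry lives in the integer codes rather than in the string values, the string categories themselves never contain a None that would need to be written.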
@WeilerP if you wanted to look into this, I would appreciate some info on how other systems handle this. For instance json has null, but I'm not so sure about zarr, hdf5, or arrow.
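For reference, the json behaviour mentioned above looks like this with the standard library. The dict mirrors the uns example from the original report, plus a literal 'None' string for contrast.

```python
import json

uns = {"key_1": 0, "key_2": None, "key_3": "None"}

# None is written as the dedicated JSON literal null, not as a string.
text = json.dumps(uns)
print(text)  # {"key_1": 0, "key_2": null, "key_3": "None"}

# Reading restores None and leaves the string 'None' untouched.
restored = json.loads(text)
print(restored == uns)  # True
```

A dedicated null type in the format is what keeps the key present and avoids any collision with real strings.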
Just for the sake of documenting this somewhere: I ran into this issue when I used the log1p function, which as a default writes {"base": None} to uns. However, after saving and reloading the object, an error was thrown by rank_genes_groups (code), because it is looking for the base key, which is not present anymore.
I met this issue too. Please refer to https://github.com/aristoteleo/dynamo-release/issues/440
People are running into this in the wild, so I’ll see if I can implement this: https://github.com/scverse/scanpy/issues/2497, scverse/scanpy-tutorials#65
> @WeilerP if you wanted to look into this, I would appreciate some info on how other systems handle this. For instance json has null, but I'm not so sure about zarr, hdf5, or arrow.
hdf5 has null attributes and null datasets, zarr doesn’t seem to have anything. #999 seems to work well.
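As a rough illustration of the hdf5 side, h5py exposes null datasets through h5py.Empty. The sketch below combines that with the encoding_type attribute suggested earlier in the thread. The helper names write_null and read_maybe_null are hypothetical, and this is not necessarily how #999 implements it.

```python
import h5py


def write_null(group, key):
    """Hypothetical helper: store None as an empty (null) dataset."""
    ds = group.create_dataset(key, data=h5py.Empty("f"))
    ds.attrs["encoding_type"] = "null"


def read_maybe_null(group, key):
    """Hypothetical helper: map the null encoding back to None."""
    ds = group[key]
    if ds.attrs.get("encoding_type") == "null":
        return None
    return ds[()]


# In-memory file, so nothing is written to disk.
with h5py.File("null_demo.h5", "w", driver="core", backing_store=False) as f:
    uns = f.create_group("uns")
    uns.create_dataset("key_1", data=0)
    write_null(uns, "key_2")

    key_1 = read_maybe_null(uns, "key_1")
    key_2 = read_maybe_null(uns, "key_2")
    stored_shape = uns["key_2"].shape  # null datasets have shape None

print(key_1, key_2, stored_shape)
```

With this layout the key stays present in the file, so a reader can tell an explicit None apart from a key that was never written.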