
`.write` does not save `None` values

Open WeilerP opened this issue 3 years ago • 10 comments

Description

When saving an AnnData object to disk, keys of a dictionary whose value is None seem not to be saved.

import numpy as np
import scanpy as sc
from anndata import AnnData

adata = AnnData(X=np.eye(3), uns={'key_1': 0, 'key_2': None})
adata.write('adata.h5ad')

_adata = sc.read('adata.h5ad')

gives

>>> _adata
AnnData object with n_obs × n_vars = 3 × 3
    uns: 'key_1'

WeilerP avatar Jan 06 '22 10:01 WeilerP

Did this ever work? I recall thinking about it when I implemented the write_none function, but was probably going for backwards compat then.

Do you have a suggested way to save these? I think hdf5 may have an appropriate type, but I'm not sure zarr does.

ivirshup avatar Jan 10 '22 15:01 ivirshup

I'm not fully convinced by this, but how about saving it as the string 'None' and converting it back to None when reading the file? We would have to make sure that actual 'None' strings are not converted to None.
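To make the concern concrete, here is a minimal sketch of the string-sentinel idea using plain dicts in place of a real file. The names `SENTINEL`, `encode`, and `decode` are hypothetical and not part of anndata; the point is only that a naive reader cannot tell a stored None from a genuine 'None' string.

```python
# Hypothetical sketch: plain dicts stand in for the on-disk file.
SENTINEL = "None"  # assumed marker written in place of a missing value


def encode(value):
    """Replace None with the string sentinel before 'writing'."""
    return SENTINEL if value is None else value


def decode(value):
    """Naive reader: turn the sentinel string back into None."""
    return None if value == SENTINEL else value


uns = {"key_1": 0, "key_2": None, "key_3": "None"}
stored = {k: encode(v) for k, v in uns.items()}
loaded = {k: decode(v) for k, v in stored.items()}

# 'key_2' comes back as None, but so does 'key_3', which was a real string.
print(loaded)
```

Avoiding the collision would need extra metadata stored next to the value, which is what makes a plain string sentinel unattractive.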

WeilerP avatar Jan 11 '22 09:01 WeilerP

BTW, this is also an issue if you have None in one of your columns:

import numpy as np
import pandas as pd
import scanpy as sc
from anndata import AnnData

adata = AnnData(X=np.eye(3), uns={'key_1': 0, 'key_2': None, 'key_3': pd.DataFrame({'col_0': ['string', None]})})
# Alternative failure
# adata = AnnData(X=np.eye(3), uns={'key_1': 0, 'key_2': None}, obs={'col_0': [None]})
adata.write('adata.h5ad')
Traceback (most recent call last):
  File "/miniconda3/envs/anndata_bug/lib/python3.8/site-packages/anndata/_io/utils.py", line 209, in func_wrapper
    return func(elem, key, val, *args, **kwargs)
  File "/miniconda3/envs/anndata_bug/lib/python3.8/site-packages/anndata/_io/h5ad.py", line 270, in write_series
    group.create_dataset(
  File "/miniconda3/envs/anndata_bug/lib/python3.8/site-packages/h5py/_hl/group.py", line 148, in create_dataset
    dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)
  File "/miniconda3/envs/anndata_bug/lib/python3.8/site-packages/h5py/_hl/dataset.py", line 140, in make_new_dset
    dset_id.write(h5s.ALL, h5s.ALL, data)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5d.pyx", line 232, in h5py.h5d.DatasetID.write
  File "h5py/_proxy.pyx", line 145, in h5py._proxy.dset_rw
  File "h5py/_conv.pyx", line 444, in h5py._conv.str2vlen
  File "h5py/_conv.pyx", line 95, in h5py._conv.generic_converter
  File "h5py/_conv.pyx", line 249, in h5py._conv.conv_str2vlen
TypeError: Can't implicitly convert non-string objects to strings

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/miniconda3/envs/anndata_bug/lib/python3.8/site-packages/anndata/_io/utils.py", line 209, in func_wrapper
    return func(elem, key, val, *args, **kwargs)
  File "/miniconda3/envs/anndata_bug/lib/python3.8/site-packages/anndata/_io/h5ad.py", line 263, in write_dataframe
    write_series(group, col_name, series, dataset_kwargs=dataset_kwargs)
  File "/miniconda3/envs/anndata_bug/lib/python3.8/site-packages/anndata/_io/utils.py", line 212, in func_wrapper
    raise type(e)(
TypeError: Can't implicitly convert non-string objects to strings

Above error raised while writing key 'col_0' of <class 'h5py._hl.group.Group'> from /.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/miniconda3/envs/anndata_bug/lib/python3.8/site-packages/anndata/_core/anndata.py", line 1912, in write_h5ad
    _write_h5ad(
  File "/miniconda3/envs/anndata_bug/lib/python3.8/site-packages/anndata/_io/h5ad.py", line 118, in write_h5ad
    write_attribute(f, "uns", adata.uns, dataset_kwargs=dataset_kwargs)
  File "/miniconda3/envs/anndata_bug/lib/python3.8/functools.py", line 875, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/miniconda3/envs/anndata_bug/lib/python3.8/site-packages/anndata/_io/h5ad.py", line 130, in write_attribute_h5ad
    _write_method(type(value))(f, key, value, *args, **kwargs)
  File "/miniconda3/envs/anndata_bug/lib/python3.8/site-packages/anndata/_io/h5ad.py", line 294, in write_mapping
    write_attribute(f, f"{key}/{sub_key}", sub_value, dataset_kwargs=dataset_kwargs)
  File "/miniconda3/envs/anndata_bug/lib/python3.8/functools.py", line 875, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/miniconda3/envs/anndata_bug/lib/python3.8/site-packages/anndata/_io/h5ad.py", line 130, in write_attribute_h5ad
    _write_method(type(value))(f, key, value, *args, **kwargs)
  File "/miniconda3/envs/anndata_bug/lib/python3.8/site-packages/anndata/_io/utils.py", line 212, in func_wrapper
    raise type(e)(
TypeError: Can't implicitly convert non-string objects to strings

Above error raised while writing key 'col_0' of <class 'h5py._hl.group.Group'> from /.

Above error raised while writing key 'uns/key_3' of <class 'h5py._hl.files.File'> from /.

Though it does work for

adata = AnnData(X=np.eye(3), uns={'key_1': 0, 'key_2': None}, obs={'col_0': ['string', 'string', None]})

WeilerP avatar Jan 11 '22 09:01 WeilerP

I wouldn't like a string None, but we could encode a null type. E.g. missing_el.attrs["encoding_type"] = "null".
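As a rough illustration of that idea, the sketch below tags an element as null through its attributes instead of through its value. `FakeElem`, `write_elem`, and `read_elem` are hypothetical stand-ins for an HDF5/zarr element with an `attrs` mapping; only the `"encoding_type"` / `"null"` pair comes from the suggestion above, and the `"value"` tag is a placeholder.

```python
class FakeElem:
    """Hypothetical stand-in for a stored element with an ``attrs`` mapping."""

    def __init__(self, data=None):
        self.data = data
        self.attrs = {}


def write_elem(group, key, value):
    """Store ``value`` under ``key``; None is marked via attrs, not via data."""
    elem = FakeElem()
    if value is None:
        elem.attrs["encoding_type"] = "null"
    else:
        elem.attrs["encoding_type"] = "value"  # placeholder tag for this sketch
        elem.data = value
    group[key] = elem


def read_elem(group, key):
    """Return None only for elements explicitly tagged as null."""
    elem = group[key]
    if elem.attrs["encoding_type"] == "null":
        return None
    return elem.data


group = {}  # a dict standing in for the file's 'uns' group
for k, v in {"key_1": 0, "key_2": None, "key_3": "None"}.items():
    write_elem(group, k, v)

roundtrip = {k: read_elem(group, k) for k in group}
print(roundtrip)
```

Because the marker lives in the attributes, a genuine 'None' string survives the round trip unchanged.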

For now, I would say the typical way we handle this in scanpy is just adata.uns.get("maybe_none_key", None) for any parameter that could be None.


The cases for columns in a dataframe are a bit different, since those have to be values in an array.

obs={'col_0': [None]}

This fails because neither anndata, numpy, nor pandas can infer a type for that array more specific than object.

pd.DataFrame({'col_0': ['string', None]})

We could potentially infer this to be a string array, and then add support for nullable string arrays. See #504 and #669. I'm not sure pandas' string representation is mature enough to do this yet.

obs={'col_0': ['string', 'string', None]}

This works since we cast the column to a categorical, which we support null values for.

ivirshup avatar Jan 11 '22 10:01 ivirshup

@WeilerP if you wanted to look into this, I would appreciate some info on how other systems handle this. For instance json has null, but I'm not so sure about zarr, hdf5, or arrow.

ivirshup avatar Jan 11 '22 14:01 ivirshup

Just for the sake of documenting this somewhere: I ran into this issue when I used the log1p function, which by default writes {"base": None} to uns. After saving and reloading the object, rank_genes_groups threw an error because it looks for the base key, which is no longer present.
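The failure mode can be modelled without anndata. In the sketch below, `simulate_roundtrip` is a hypothetical helper that mimics the reported behaviour (keys with a None value are dropped on write); it is not the library's actual I/O code. The final `setdefault` call is one possible workaround after reloading, assuming None is the intended default for `base`.

```python
def simulate_roundtrip(uns):
    """Hypothetical model of the report: drop keys whose value is None."""
    out = {}
    for key, value in uns.items():
        if value is None:
            continue
        out[key] = simulate_roundtrip(value) if isinstance(value, dict) else value
    return out


uns = {"log1p": {"base": None}}
reloaded = simulate_roundtrip(uns)

# Direct indexing fails because 'base' did not survive the round trip.
try:
    reloaded["log1p"]["base"]
    failed = False
except KeyError:
    failed = True

# Possible workaround: restore the missing key before downstream code reads it.
reloaded.setdefault("log1p", {}).setdefault("base", None)
base = reloaded["log1p"]["base"]
print(failed, base)
```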

LustigePerson avatar Mar 28 '22 12:03 LustigePerson

I ran into this issue too. Please refer to https://github.com/aristoteleo/dynamo-release/issues/440

wangjiawen2013 avatar Feb 04 '23 18:02 wangjiawen2013

People are running into this in the wild (https://github.com/scverse/scanpy/issues/2497, scverse/scanpy-tutorials#65), so I'll see if I can implement this.

flying-sheep avatar Jun 07 '23 11:06 flying-sheep

> @WeilerP if you wanted to look into this, I would appreciate some info on how other systems handle this. For instance json has null, but I'm not so sure about zarr, hdf5, or arrow.

hdf5 has null attributes and null datasets, zarr doesn’t seem to have anything. #999 seems to work well.

flying-sheep avatar Jun 07 '23 13:06 flying-sheep