
Anndata write/read error

Open gabrielarapozo opened this issue 2 years ago • 4 comments

Hello! I'm having several problems reading and writing AnnData files.

Scenario 1: I'm trying to read an anndata file generated by scanpy==1.7.2 with anndata==0.7.6 (and h5py==3.1.0). I normally work with pickle, but there are some old models that I want to read and can't; I get the error below. Note: Geistlinger_Tumor_stage, named in the KeyError "Unable to open object (object 'Geistlinger_Tumor_stage' doesn't exist)", is a column in adata.obs. When I wrote this object a month ago, I didn't receive any warning.

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/data04/projects04/lbbc_members/lib/conda_envs/sc_breast/lib/python3.8/site-packages/anndata/_io/utils.py in func_wrapper(elem, *args, **kwargs)
    176         try:
--> 177             return func(elem, *args, **kwargs)
    178         except Exception as e:

/data04/projects04/lbbc_members/lib/conda_envs/sc_breast/lib/python3.8/site-packages/anndata/_io/h5ad.py in read_dataframe(group)
    480     df = pd.DataFrame(
--> 481         {k: read_series(group[k]) for k in columns},
    482         index=read_series(group[idx_key]),

/data04/projects04/lbbc_members/lib/conda_envs/sc_breast/lib/python3.8/site-packages/anndata/_io/h5ad.py in <dictcomp>(.0)
    480     df = pd.DataFrame(
--> 481         {k: read_series(group[k]) for k in columns},
    482         index=read_series(group[idx_key]),

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

/data04/projects04/lbbc_members/lib/conda_envs/sc_breast/lib/python3.8/site-packages/h5py/_hl/group.py in __getitem__(self, name)
    263         else:
--> 264             oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
    265 

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/h5o.pyx in h5py.h5o.open()

KeyError: "Unable to open object (object 'Geistlinger_Tumor_stage' doesn't exist)"

During handling of the above exception, another exception occurred:

AnnDataReadError                          Traceback (most recent call last)
<ipython-input-4-148b21238bfb> in <module>
----> 1 adata = sc.read('/home/gabrielarapozo/macrophages_sc/results/scanpy/version7/second_level_anno/adata_neutrophil.h5ad')

/data04/projects04/lbbc_members/lib/conda_envs/sc_breast/lib/python3.8/site-packages/scanpy/readwrite.py in read(filename, backed, sheet, ext, delimiter, first_column_names, backup_url, cache, cache_compression, **kwargs)
    110     filename = Path(filename)  # allow passing strings
    111     if is_valid_filename(filename):
--> 112         return _read(
    113             filename,
    114             backed=backed,

/data04/projects04/lbbc_members/lib/conda_envs/sc_breast/lib/python3.8/site-packages/scanpy/readwrite.py in _read(filename, backed, sheet, ext, delimiter, first_column_names, backup_url, cache, cache_compression, suppress_cache_warning, **kwargs)
    711     if ext in {'h5', 'h5ad'}:
    712         if sheet is None:
--> 713             return read_h5ad(filename, backed=backed)
    714         else:
    715             logg.debug(f'reading sheet {sheet} from file {filename}')

/data04/projects04/lbbc_members/lib/conda_envs/sc_breast/lib/python3.8/site-packages/anndata/_io/h5ad.py in read_h5ad(filename, backed, as_sparse, as_sparse_fmt, chunk_size)
    417                 assert False, "unexpected raw format"
    418             elif k in {"obs", "var"}:
--> 419                 d[k] = read_dataframe(f[k])
    420             else:  # Base case
    421                 d[k] = read_attribute(f[k])

/data04/projects04/lbbc_members/lib/conda_envs/sc_breast/lib/python3.8/site-packages/anndata/_io/utils.py in func_wrapper(elem, *args, **kwargs)
    181             else:
    182                 parent = _get_parent(elem)
--> 183                 raise AnnDataReadError(
    184                     f"Above error raised while reading key {elem.name!r} of "
    185                     f"type {type(elem)} from {parent}."

AnnDataReadError: Above error raised while reading key '/obs' of type <class 'h5py._hl.group.Group'> from /.
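
One way to narrow down a failure like this without going through `read_h5ad` is to open the file directly with h5py and compare the columns that the `obs` group declares with the datasets it actually contains. The sketch below assumes the file follows the anndata 0.7 on-disk layout, in which the dataframe group lists its columns in a `column-order` attribute (the `columns` variable in the traceback above appears to come from there). The helper name `missing_dataframe_columns` is made up for illustration and is not part of anndata.

```python
import h5py


def missing_dataframe_columns(path, group_name="obs"):
    """Return columns declared on an .h5ad dataframe group but absent from it.

    Assumes the anndata 0.7 layout, where the group stores its column names
    in a "column-order" attribute (an assumption, not verified for this file).
    """
    with h5py.File(path, "r") as f:
        group = f[group_name]
        declared = [
            # Older h5py versions may return bytes instead of str
            c.decode() if isinstance(c, bytes) else str(c)
            for c in group.attrs.get("column-order", [])
        ]
        present = set(group.keys())
    # Keep the declared order so the report matches the dataframe layout
    return [c for c in declared if c not in present]
```

If a column such as `Geistlinger_Tumor_stage` shows up in the returned list, the file itself is missing that dataset, which would point to a problem at write time rather than in the reader.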

Scenario 2: In another environment, I'm trying to write an anndata file generated by scanpy==1.7.2 with anndata==0.7.6 (and h5py==3.4.0), and I get the following error:

Traceback (most recent call last):
  File "/data04/projects04/lbbc_members/lib/conda_envs/scvi-env/lib/python3.7/site-packages/anndata/_io/utils.py", line 209, in func_wrapper
    return func(elem, key, val, *args, **kwargs)
  File "/data04/projects04/lbbc_members/lib/conda_envs/scvi-env/lib/python3.7/site-packages/anndata/_io/h5ad.py", line 274, in write_series
    **dataset_kwargs,
  File "/data04/projects04/lbbc_members/lib/conda_envs/scvi-env/lib/python3.7/site-packages/h5py/_hl/group.py", line 149, in create_dataset
    dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)
  File "/data04/projects04/lbbc_members/lib/conda_envs/scvi-env/lib/python3.7/site-packages/h5py/_hl/dataset.py", line 140, in make_new_dset
    dset_id.write(h5s.ALL, h5s.ALL, data)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5d.pyx", line 232, in h5py.h5d.DatasetID.write
  File "h5py/_proxy.pyx", line 145, in h5py._proxy.dset_rw
  File "h5py/_conv.pyx", line 444, in h5py._conv.str2vlen
  File "h5py/_conv.pyx", line 95, in h5py._conv.generic_converter
  File "h5py/_conv.pyx", line 249, in h5py._conv.conv_str2vlen
TypeError: Can't implicitly convert non-string objects to strings

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/data04/projects04/lbbc_members/lib/conda_envs/scvi-env/lib/python3.7/site-packages/anndata/_io/utils.py", line 209, in func_wrapper
    return func(elem, key, val, *args, **kwargs)
  File "/data04/projects04/lbbc_members/lib/conda_envs/scvi-env/lib/python3.7/site-packages/anndata/_io/h5ad.py", line 263, in write_dataframe
    write_series(group, col_name, series, dataset_kwargs=dataset_kwargs)
  File "/data04/projects04/lbbc_members/lib/conda_envs/scvi-env/lib/python3.7/site-packages/anndata/_io/utils.py", line 216, in func_wrapper
    ) from e
TypeError: Can't implicitly convert non-string objects to strings

Above error raised while writing key 'dataset' of <class 'h5py._hl.group.Group'> from /.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/giomaklouf/sc_ovc_tme/bin/masters.project/secondLevelAnno#4/06-HGVinSubsets/06-scVI_in_mast.py", line 176, in <module>
    adata.write(out_path + 'adata_mast.h5ad')
  File "/data04/projects04/lbbc_members/lib/conda_envs/scvi-env/lib/python3.7/site-packages/anndata/_core/anndata.py", line 1911, in write_h5ad
    as_dense=as_dense,
  File "/data04/projects04/lbbc_members/lib/conda_envs/scvi-env/lib/python3.7/site-packages/anndata/_io/h5ad.py", line 111, in write_h5ad
    write_attribute(f, "obs", adata.obs, dataset_kwargs=dataset_kwargs)
  File "/data04/projects04/lbbc_members/lib/conda_envs/scvi-env/lib/python3.7/functools.py", line 840, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/data04/projects04/lbbc_members/lib/conda_envs/scvi-env/lib/python3.7/site-packages/anndata/_io/h5ad.py", line 130, in write_attribute_h5ad
    _write_method(type(value))(f, key, value, *args, **kwargs)
  File "/data04/projects04/lbbc_members/lib/conda_envs/scvi-env/lib/python3.7/site-packages/anndata/_io/utils.py", line 216, in func_wrapper
    ) from e
TypeError: Can't implicitly convert non-string objects to strings

Above error raised while writing key 'dataset' of <class 'h5py._hl.group.Group'> from /.

Above error raised while writing key 'obs' of <class 'h5py._hl.files.File'> from /.

These are just some examples; I have had other problems as well. I have been working around them by saving with pickle, or by saving obs separately and adding it back to the object afterwards, but this is not very practical, and I was wondering whether there is a better way.

Thanks again!

gabrielarapozo avatar Sep 22 '21 13:09 gabrielarapozo

I am also experiencing a variant of Scenario 2. I think this might actually be a scanpy bug, but I'm not entirely sure.

/opt/conda/lib/python3.8/site-packages/anndata/_core/anndata.py:1220: FutureWarning: The `inplace` parameter in pandas.Categorical.reorder_categories is deprecated and will be removed in a future version. Removing unused categories will always return a new Categorical object.
  c.reorder_categories(natsorted(c.categories), inplace=True)
... storing 'cell_type' as categorical
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/anndata/_io/utils.py", line 209, in func_wrapper
    return func(elem, key, val, *args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/anndata/_io/h5ad.py", line 185, in write_array
    f.create_dataset(key, data=value, **dataset_kwargs)
  File "/opt/conda/lib/python3.8/site-packages/h5py/_hl/group.py", line 149, in create_dataset
    dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)
  File "/opt/conda/lib/python3.8/site-packages/h5py/_hl/dataset.py", line 137, in make_new_dset
    dset_id = h5d.create(parent.id, name, tid, sid, dcpl=dcpl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5d.pyx", line 87, in h5py.h5d.create
ValueError: Unable to create dataset (name already exists)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/anndata/_io/utils.py", line 209, in func_wrapper
    return func(elem, key, val, *args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/anndata/_io/h5ad.py", line 289, in write_series
    write_array(group, key, series.values, dataset_kwargs=dataset_kwargs)
  File "/opt/conda/lib/python3.8/site-packages/anndata/_io/utils.py", line 212, in func_wrapper
    raise type(e)(
ValueError: Unable to create dataset (name already exists)

Above error raised while writing key 'n_counts' of <class 'h5py._hl.group.Group'> from /.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/anndata/_io/utils.py", line 209, in func_wrapper
    return func(elem, key, val, *args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/anndata/_io/h5ad.py", line 263, in write_dataframe
    write_series(group, col_name, series, dataset_kwargs=dataset_kwargs)
  File "/opt/conda/lib/python3.8/site-packages/anndata/_io/utils.py", line 212, in func_wrapper
    raise type(e)(
ValueError: Unable to create dataset (name already exists)

Above error raised while writing key 'n_counts' of <class 'h5py._hl.group.Group'> from /.

Above error raised while writing key 'n_counts' of <class 'h5py._hl.group.Group'> from /.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/build/generate_clusters_for_normalization.py", line 19, in <module>
    adata.write("temp_clustered_for_scran.h5ad", compression='gzip')#, compression_opts=1)
  File "/opt/conda/lib/python3.8/site-packages/anndata/_core/anndata.py", line 1905, in write_h5ad
    _write_h5ad(
  File "/opt/conda/lib/python3.8/site-packages/anndata/_io/h5ad.py", line 111, in write_h5ad
    write_attribute(f, "obs", adata.obs, dataset_kwargs=dataset_kwargs)
  File "/opt/conda/lib/python3.8/functools.py", line 875, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/opt/conda/lib/python3.8/site-packages/anndata/_io/h5ad.py", line 130, in write_attribute_h5ad
    _write_method(type(value))(f, key, value, *args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/anndata/_io/utils.py", line 212, in func_wrapper
    raise type(e)(
ValueError: Unable to create dataset (name already exists)

Above error raised while writing key 'n_counts' of <class 'h5py._hl.group.Group'> from /.

Above error raised while writing key 'n_counts' of <class 'h5py._hl.group.Group'> from /.

Above error raised while writing key 'obs' of <class 'h5py._hl.files.File'> from /.

The commands were fairly simple:

import sys

import scanpy as sc

# Load the input AnnData file given on the command line
adata = sc.read(sys.argv[1])

# Normalize and log-transform the counts
sc.pp.normalize_per_cell(adata, counts_per_cell_after=1e6)
sc.pp.log1p(adata)
print("running pca")
sc.pp.pca(adata, n_comps=15)
print("computing neighbors")
sc.pp.neighbors(adata)
print("running louvain clustering")
sc.tl.louvain(adata, key_added='groups', resolution=0.5)

# This is the call that fails
adata.write("temp_clustered_for_scran.h5ad", compression='gzip')#, compression_opts=1)

The obvious thing I noticed was that the dataset appears to already have some of these columns (for example, n_counts). For some reason anndata set that column as the index, and the counting command then created a second column named n_counts instead of overwriting the old one, presumably because it couldn't see the old one once it was the index?

image

ACastanza avatar Sep 23 '21 00:09 ACastanza

@gabrielarapozo, for your first issue, would you be able to provide examples of the files you're having trouble reading?

For the second, do you know what the values are that are causing the error? My first guess is this would be a numpy array with an object dtype. These tend to cause problems, mostly when the data stored in them isn't strings.
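
To check that guess, something along these lines could list the `obs` columns that have an object dtype and hold values other than strings. It is only a sketch using pandas; the helper name `find_non_string_object_columns` is hypothetical and not an anndata function.

```python
import pandas as pd


def find_non_string_object_columns(df):
    """Map each object-dtype column to the non-string value types it contains."""
    offenders = {}
    for name, col in df.items():
        if not pd.api.types.is_object_dtype(col):
            continue  # numeric, categorical, etc. are not the suspects here
        kinds = {type(v).__name__ for v in col if not isinstance(v, str)}
        if kinds:
            offenders[name] = sorted(kinds)
    return offenders
```

Calling `find_non_string_object_columns(adata.obs)` would return something like `{"some_column": ["float"]}` for a mixed column. Such columns could then be cast with `adata.obs[name] = adata.obs[name].astype(str)` before writing, keeping in mind that missing values may turn into literal strings such as `'None'` or `'nan'`.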

@ACastanza, I think the issue you're seeing is similar to https://github.com/theislab/anndata/issues/452

ivirshup avatar Oct 20 '21 17:10 ivirshup

@ivirshup Yes, that is the issue. The cause seems to have been that .obs was somehow written backwards. We ultimately resolved this by opening the anndata file and manually restructuring the offending columns (i.e., we took .obs, moved n_counts off the index, and flipped things so that the proper index became the index again). We were using an anndata file written by a (very) old version, so I suspect this is the result of an incompatibility between a historical version and the current one, or that the old pipeline was creating something incorrectly.
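
For anyone hitting the same thing, the restructuring can be sketched roughly as below. This is an illustration rather than the exact code that was run: it assumes the real cell identifiers ended up in an ordinary column, and both the helper name `restore_obs_index` and the column name passed to it are hypothetical.

```python
import pandas as pd


def restore_obs_index(obs, id_column):
    """Return a copy of obs indexed by id_column, with the stale index removed.

    id_column is assumed to hold the real observation names.
    """
    fixed = obs.copy()
    old_name = fixed.index.name
    if old_name is not None and old_name in fixed.columns:
        # A column with the same name already exists, so the old index
        # values are dropped instead of being kept as a duplicate column.
        fixed = fixed.reset_index(drop=True)
    else:
        # Otherwise keep the old index values as a regular column.
        fixed = fixed.reset_index()
    fixed = fixed.set_index(id_column)
    # An unnamed index cannot clash with a column name on write.
    fixed.index.name = None
    return fixed
```

The result could then be assigned back with `adata.obs = restore_obs_index(adata.obs, "cell_id")`, where `"cell_id"` stands in for whichever column holds the real identifiers.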

ACastanza avatar Oct 20 '21 17:10 ACastanza

@ivirshup Isn't that something that can be solved? @ACastanza Any workarounds? It's hard to edit a file you can't open.

dsm-72 avatar Jun 30 '22 15:06 dsm-72

This issue has been automatically marked as stale because it has not had recent activity. Please add a comment if you want to keep the issue open. Thank you for your contributions!

github-actions[bot] avatar Sep 07 '23 02:09 github-actions[bot]

I'm going to close this, as it seems unclear what the actual issue is, and the other topics that have been brought up seem to be covered by existing issues.

ivirshup avatar Sep 07 '23 13:09 ivirshup