anndata
Anndata write/read error
Hello! I'm having multiple issues reading/writing anndata files.
Scenario 1: I'm trying to read an anndata file generated with scanpy==1.7.2 and anndata==0.7.6 (and h5py==3.1.0). I'm working with pickle for now, but there are some older objects I want to read, and I can't: I get the following error. Note: Geistlinger_Tumor_stage, the object named in KeyError: "Unable to open object (object 'Geistlinger_Tumor_stage' doesn't exist)", is a column in adata.obs. When I wrote this object a month ago, I didn't receive any warning.
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
/data04/projects04/lbbc_members/lib/conda_envs/sc_breast/lib/python3.8/site-packages/anndata/_io/utils.py in func_wrapper(elem, *args, **kwargs)
176 try:
--> 177 return func(elem, *args, **kwargs)
178 except Exception as e:
/data04/projects04/lbbc_members/lib/conda_envs/sc_breast/lib/python3.8/site-packages/anndata/_io/h5ad.py in read_dataframe(group)
480 df = pd.DataFrame(
--> 481 {k: read_series(group[k]) for k in columns},
482 index=read_series(group[idx_key]),
/data04/projects04/lbbc_members/lib/conda_envs/sc_breast/lib/python3.8/site-packages/anndata/_io/h5ad.py in <dictcomp>(.0)
480 df = pd.DataFrame(
--> 481 {k: read_series(group[k]) for k in columns},
482 index=read_series(group[idx_key]),
h5py/_objects.pyx in h5py._objects.with_phil.wrapper()
h5py/_objects.pyx in h5py._objects.with_phil.wrapper()
/data04/projects04/lbbc_members/lib/conda_envs/sc_breast/lib/python3.8/site-packages/h5py/_hl/group.py in __getitem__(self, name)
263 else:
--> 264 oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
265
h5py/_objects.pyx in h5py._objects.with_phil.wrapper()
h5py/_objects.pyx in h5py._objects.with_phil.wrapper()
h5py/h5o.pyx in h5py.h5o.open()
KeyError: "Unable to open object (object 'Geistlinger_Tumor_stage' doesn't exist)"
During handling of the above exception, another exception occurred:
AnnDataReadError Traceback (most recent call last)
<ipython-input-4-148b21238bfb> in <module>
----> 1 adata = sc.read('/home/gabrielarapozo/macrophages_sc/results/scanpy/version7/second_level_anno/adata_neutrophil.h5ad')
/data04/projects04/lbbc_members/lib/conda_envs/sc_breast/lib/python3.8/site-packages/scanpy/readwrite.py in read(filename, backed, sheet, ext, delimiter, first_column_names, backup_url, cache, cache_compression, **kwargs)
110 filename = Path(filename) # allow passing strings
111 if is_valid_filename(filename):
--> 112 return _read(
113 filename,
114 backed=backed,
/data04/projects04/lbbc_members/lib/conda_envs/sc_breast/lib/python3.8/site-packages/scanpy/readwrite.py in _read(filename, backed, sheet, ext, delimiter, first_column_names, backup_url, cache, cache_compression, suppress_cache_warning, **kwargs)
711 if ext in {'h5', 'h5ad'}:
712 if sheet is None:
--> 713 return read_h5ad(filename, backed=backed)
714 else:
715 logg.debug(f'reading sheet {sheet} from file {filename}')
/data04/projects04/lbbc_members/lib/conda_envs/sc_breast/lib/python3.8/site-packages/anndata/_io/h5ad.py in read_h5ad(filename, backed, as_sparse, as_sparse_fmt, chunk_size)
417 assert False, "unexpected raw format"
418 elif k in {"obs", "var"}:
--> 419 d[k] = read_dataframe(f[k])
420 else: # Base case
421 d[k] = read_attribute(f[k])
/data04/projects04/lbbc_members/lib/conda_envs/sc_breast/lib/python3.8/site-packages/anndata/_io/utils.py in func_wrapper(elem, *args, **kwargs)
181 else:
182 parent = _get_parent(elem)
--> 183 raise AnnDataReadError(
184 f"Above error raised while reading key {elem.name!r} of "
185 f"type {type(elem)} from {parent}."
AnnDataReadError: Above error raised while reading key '/obs' of type <class 'h5py._hl.group.Group'> from /.
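The traceback shows read_dataframe building the DataFrame with `{k: read_series(group[k]) for k in columns}`, so the KeyError means the obs group lists Geistlinger_Tumor_stage as a column but contains no object with that name. A minimal sketch for confirming this with h5py directly, without going through anndata, assuming the anndata 0.7.x layout in which the column names are stored in a `column-order` attribute on the obs group. The helper name `missing_obs_columns` is hypothetical, not part of anndata:

```python
import h5py


def missing_obs_columns(path, group_name="obs"):
    """Return names listed in the group's "column-order" attribute that have
    no matching dataset or subgroup (assumes the anndata 0.7.x h5ad layout)."""
    with h5py.File(path, "r") as f:
        group = f[group_name]
        # Depending on the h5py version, names may come back as bytes or str.
        declared = [
            c.decode() if isinstance(c, bytes) else str(c)
            for c in group.attrs.get("column-order", [])
        ]
        # Keep only the declared names that are not members of the group.
        return [c for c in declared if c not in group]
```

If that reading of the error is right, calling it on adata_neutrophil.h5ad should list Geistlinger_Tumor_stage, which would point to a file written incompletely rather than a version incompatibility.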
Scenario 2: I'm trying to write an anndata object generated with scanpy==1.7.2 and anndata==0.7.6 (and h5py==3.4.0) in another environment, and I get the following error:
Traceback (most recent call last):
File "/data04/projects04/lbbc_members/lib/conda_envs/scvi-env/lib/python3.7/site-packages/anndata/_io/utils.py", line 209, in func_wrapper
return func(elem, key, val, *args, **kwargs)
File "/data04/projects04/lbbc_members/lib/conda_envs/scvi-env/lib/python3.7/site-packages/anndata/_io/h5ad.py", line 274, in write_series
**dataset_kwargs,
File "/data04/projects04/lbbc_members/lib/conda_envs/scvi-env/lib/python3.7/site-packages/h5py/_hl/group.py", line 149, in create_dataset
dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)
File "/data04/projects04/lbbc_members/lib/conda_envs/scvi-env/lib/python3.7/site-packages/h5py/_hl/dataset.py", line 140, in make_new_dset
dset_id.write(h5s.ALL, h5s.ALL, data)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5d.pyx", line 232, in h5py.h5d.DatasetID.write
File "h5py/_proxy.pyx", line 145, in h5py._proxy.dset_rw
File "h5py/_conv.pyx", line 444, in h5py._conv.str2vlen
File "h5py/_conv.pyx", line 95, in h5py._conv.generic_converter
File "h5py/_conv.pyx", line 249, in h5py._conv.conv_str2vlen
TypeError: Can't implicitly convert non-string objects to strings
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/data04/projects04/lbbc_members/lib/conda_envs/scvi-env/lib/python3.7/site-packages/anndata/_io/utils.py", line 209, in func_wrapper
return func(elem, key, val, *args, **kwargs)
File "/data04/projects04/lbbc_members/lib/conda_envs/scvi-env/lib/python3.7/site-packages/anndata/_io/h5ad.py", line 263, in write_dataframe
write_series(group, col_name, series, dataset_kwargs=dataset_kwargs)
File "/data04/projects04/lbbc_members/lib/conda_envs/scvi-env/lib/python3.7/site-packages/anndata/_io/utils.py", line 216, in func_wrapper
) from e
TypeError: Can't implicitly convert non-string objects to strings
Above error raised while writing key 'dataset' of <class 'h5py._hl.group.Group'> from /.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/giomaklouf/sc_ovc_tme/bin/masters.project/secondLevelAnno#4/06-HGVinSubsets/06-scVI_in_mast.py", line 176, in <module>
adata.write(out_path + 'adata_mast.h5ad')
File "/data04/projects04/lbbc_members/lib/conda_envs/scvi-env/lib/python3.7/site-packages/anndata/_core/anndata.py", line 1911, in write_h5ad
as_dense=as_dense,
File "/data04/projects04/lbbc_members/lib/conda_envs/scvi-env/lib/python3.7/site-packages/anndata/_io/h5ad.py", line 111, in write_h5ad
write_attribute(f, "obs", adata.obs, dataset_kwargs=dataset_kwargs)
File "/data04/projects04/lbbc_members/lib/conda_envs/scvi-env/lib/python3.7/functools.py", line 840, in wrapper
return dispatch(args[0].__class__)(*args, **kw)
File "/data04/projects04/lbbc_members/lib/conda_envs/scvi-env/lib/python3.7/site-packages/anndata/_io/h5ad.py", line 130, in write_attribute_h5ad
_write_method(type(value))(f, key, value, *args, **kwargs)
File "/data04/projects04/lbbc_members/lib/conda_envs/scvi-env/lib/python3.7/site-packages/anndata/_io/utils.py", line 216, in func_wrapper
) from e
TypeError: Can't implicitly convert non-string objects to strings
Above error raised while writing key 'dataset' of <class 'h5py._hl.group.Group'> from /.
Above error raised while writing key 'obs' of <class 'h5py._hl.files.File'> from /.
These are just some examples; I've had other problems as well. As workarounds I've been saving with pickle, or saving obs separately and then adding it back to the object, but this isn't very practical, and I was wondering if there's a way around it.
Thanks again!
I am also experiencing a variant of Scenario 2. I think this might actually be a scanpy bug, but I'm not entirely sure:
/opt/conda/lib/python3.8/site-packages/anndata/_core/anndata.py:1220: FutureWarning: The `inplace` parameter in pandas.Categorical.reorder_categories is deprecated and will be removed in a future version. Removing unused categories will always return a new Categorical object.
c.reorder_categories(natsorted(c.categories), inplace=True)
... storing 'cell_type' as categorical
Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/anndata/_io/utils.py", line 209, in func_wrapper
return func(elem, key, val, *args, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/anndata/_io/h5ad.py", line 185, in write_array
f.create_dataset(key, data=value, **dataset_kwargs)
File "/opt/conda/lib/python3.8/site-packages/h5py/_hl/group.py", line 149, in create_dataset
dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)
File "/opt/conda/lib/python3.8/site-packages/h5py/_hl/dataset.py", line 137, in make_new_dset
dset_id = h5d.create(parent.id, name, tid, sid, dcpl=dcpl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5d.pyx", line 87, in h5py.h5d.create
ValueError: Unable to create dataset (name already exists)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/anndata/_io/utils.py", line 209, in func_wrapper
return func(elem, key, val, *args, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/anndata/_io/h5ad.py", line 289, in write_series
write_array(group, key, series.values, dataset_kwargs=dataset_kwargs)
File "/opt/conda/lib/python3.8/site-packages/anndata/_io/utils.py", line 212, in func_wrapper
raise type(e)(
ValueError: Unable to create dataset (name already exists)
Above error raised while writing key 'n_counts' of <class 'h5py._hl.group.Group'> from /.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/anndata/_io/utils.py", line 209, in func_wrapper
return func(elem, key, val, *args, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/anndata/_io/h5ad.py", line 263, in write_dataframe
write_series(group, col_name, series, dataset_kwargs=dataset_kwargs)
File "/opt/conda/lib/python3.8/site-packages/anndata/_io/utils.py", line 212, in func_wrapper
raise type(e)(
ValueError: Unable to create dataset (name already exists)
Above error raised while writing key 'n_counts' of <class 'h5py._hl.group.Group'> from /.
Above error raised while writing key 'n_counts' of <class 'h5py._hl.group.Group'> from /.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/build/generate_clusters_for_normalization.py", line 19, in <module>
adata.write("temp_clustered_for_scran.h5ad", compression='gzip')#, compression_opts=1)
File "/opt/conda/lib/python3.8/site-packages/anndata/_core/anndata.py", line 1905, in write_h5ad
_write_h5ad(
File "/opt/conda/lib/python3.8/site-packages/anndata/_io/h5ad.py", line 111, in write_h5ad
write_attribute(f, "obs", adata.obs, dataset_kwargs=dataset_kwargs)
File "/opt/conda/lib/python3.8/functools.py", line 875, in wrapper
return dispatch(args[0].__class__)(*args, **kw)
File "/opt/conda/lib/python3.8/site-packages/anndata/_io/h5ad.py", line 130, in write_attribute_h5ad
_write_method(type(value))(f, key, value, *args, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/anndata/_io/utils.py", line 212, in func_wrapper
raise type(e)(
ValueError: Unable to create dataset (name already exists)
Above error raised while writing key 'n_counts' of <class 'h5py._hl.group.Group'> from /.
Above error raised while writing key 'n_counts' of <class 'h5py._hl.group.Group'> from /.
Above error raised while writing key 'obs' of <class 'h5py._hl.files.File'> from /.
The commands were fairly simple:
import sys
import scanpy as sc
adata = sc.read(sys.argv[1])
sc.pp.normalize_per_cell(adata, counts_per_cell_after=1e6)
sc.pp.log1p(adata)
print("running pca")
sc.pp.pca(adata, n_comps=15)
print("computing neighbors")
sc.pp.neighbors(adata)
print("running louvain clustering")
sc.tl.louvain(adata, key_added='groups', resolution=0.5)
adata.write("temp_clustered_for_scran.h5ad", compression='gzip')#, compression_opts=1)
The obvious thing I noticed is that the dataset already appears to have some of these columns (for example, n_counts). For some reason anndata set that column as the index, and then the normalization step (sc.pp.normalize_per_cell, which stores n_counts in .obs) created a second n_counts column with the same name instead of overwriting the old one, presumably because it couldn't see the old one while it was set as the index?
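A likely mechanism for the "name already exists" error: as far as I can tell from anndata 0.7.x's write_dataframe, the obs index is written as a dataset named after `obs.index.name` (or `_index` when the index is unnamed), in the same HDF5 group as one dataset per column. An index named n_counts next to a column named n_counts would then be written twice. The plain-pandas sketch below, with a hypothetical helper `colliding_keys`, reports such clashes before calling adata.write; the naming rule it encodes is my reading of that version and may differ in others:

```python
import pandas as pd


def colliding_keys(df: pd.DataFrame) -> list:
    """Names that would be written twice into the same HDF5 group.

    Assumes the anndata 0.7.x layout: the index is stored as a dataset named
    after df.index.name (or "_index" when unnamed), next to one dataset per
    column.
    """
    index_key = df.index.name if df.index.name is not None else "_index"
    keys = [str(index_key)] + [str(c) for c in df.columns]
    seen, dupes = set(), []
    for key in keys:
        # Record each repeated name once, in order of first repetition.
        if key in seen and key not in dupes:
            dupes.append(key)
        seen.add(key)
    return dupes
```

For example, `colliding_keys(adata.obs)` returning `['n_counts']` would match the traceback above.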
@gabrielarapozo, for your first issue, would you be able to provide examples of the files you're having trouble reading?
For the second, do you know which values are causing the error? My first guess is a numpy array with an object dtype. These tend to cause problems, mostly when the data stored in them isn't strings.
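To check the object-dtype guess: the Scenario 2 traceback names the obs column 'dataset', so that column is the first one to inspect. A sketch in plain pandas that lists object-dtype columns containing any non-str value and, as one possible fix, casts them to str. Both helper names are hypothetical, and casting is lossy: numbers become text, and missing values may become literal strings such as "nan" or "None".

```python
import pandas as pd


def non_string_object_columns(df: pd.DataFrame) -> list:
    """Object-dtype columns holding at least one value that is not a str."""
    bad = []
    for name in df.columns:
        col = df[name]
        if col.dtype == object and not col.map(lambda v: isinstance(v, str)).all():
            bad.append(name)
    return bad


def stringify_object_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy in which those columns are cast to str."""
    out = df.copy()
    for name in non_string_object_columns(out):
        out[name] = out[name].astype(str)
    return out
```

Usage would be `adata.obs = stringify_object_columns(adata.obs)` before `adata.write(...)`. Converting such a column to a categorical, or fixing its values at the source, may be preferable when the mixed types are themselves a bug.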
@ACastanza, I think the issue you're seeing is similar to https://github.com/theislab/anndata/issues/452
@ivirshup Yes, that's the issue; the cause seems to have been that .obs was somehow written backwards. We were ultimately able to resolve this by opening the anndata file and manually restructuring the offending columns (i.e., we took .obs, moved n_counts off the index, and flipped it back so the proper index became the index again). We were using an anndata file written by a (very) old version, so I suspect this results from an incompatibility between a historical version and the current one, or the old pipeline was creating something incorrectly.
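The manual repair described above can be approximated in memory before writing. This sketch assumes the clash is between the obs index name and a column name. `repair_obs_index` and its `cell_id_column` parameter are hypothetical. The parameter only covers the case where the real cell names ended up in an ordinary column, which the thread does not confirm:

```python
import pandas as pd


def repair_obs_index(obs: pd.DataFrame, cell_id_column=None) -> pd.DataFrame:
    """Return a copy of obs whose index name no longer clashes with a column."""
    out = obs.copy()
    if cell_id_column is not None:
        # Hypothetical case: restore the cell IDs from an ordinary column.
        out = out.set_index(cell_id_column)
    if out.index.name is not None and out.index.name in out.columns:
        # An unnamed index should make anndata 0.7.x fall back to "_index".
        out = out.rename_axis(None)
    return out
```

Usage would be `adata.obs = repair_obs_index(adata.obs)` (optionally with the name of the column holding the cell IDs) before `adata.write(...)`.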
@ivirshup, isn't that something that can be solved? @ACastanza, any workarounds? It's hard to edit a file you can't open.
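When anndata cannot load the file at all, one option is to edit it at the HDF5 level instead. The sketch below makes the same assumption about the anndata 0.7.x layout (column names in the obs group's `column-order` attribute). It works on a copy and removes names that have no matching member. It cannot recover the missing column's data. `drop_missing_obs_columns` is a hypothetical helper, not part of anndata:

```python
import shutil

import h5py


def drop_missing_obs_columns(src, dst, group_name="obs"):
    """Copy src to dst, then remove names from the group's "column-order"
    attribute that have no matching member. Returns the dropped names."""
    shutil.copyfile(src, dst)  # never modify the original file in place
    with h5py.File(dst, "r+") as f:
        group = f[group_name]
        declared = [
            c.decode() if isinstance(c, bytes) else str(c)
            for c in group.attrs.get("column-order", [])
        ]
        kept = [c for c in declared if c in group]
        dropped = [c for c in declared if c not in group]
        # attrs.create overwrites the existing attribute; store vlen strings.
        group.attrs.create("column-order", kept, dtype=h5py.string_dtype())
    return dropped
```

Afterwards, `sc.read(dst)` may succeed. Whether it does depends on what else in the file is inconsistent.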
This issue has been automatically marked as stale because it has not had recent activity. Please add a comment if you want to keep the issue open. Thank you for your contributions!
I'm going to close this, as it's a bit unclear what the actual issue is, and the other topics brought up here seem to be covered by existing issues.