anndata
"ValueError: value.index does not match parent’s axis 0 names" error when trying to read h5ad that was processed through scVI
Hi!
I am new to the Scanpy/AnnData ecosystem and am trying to use scVI for data integration.
I trained an scVI model on an AnnData object, saved the object as .h5ad, and now I get an error when trying to read the file back.
The error occurs with both scanpy.read_h5ad and scvi.data.read_h5ad. Someone else has reported this error in this repo before, so I am posting the issue here. Kindly let me know if I should post it somewhere else.
adipo_all = scvi.data.read_h5ad("/home/yyyyy/analysis/anndata_working/adipo_sn_01112021_trained_v1.h5ad")
/home/xxxxx/pyenv_custom/py_scanalysis_env/lib/python3.7/site-packages/anndata/_core/anndata.py:120: ImplicitModificationWarning: Transforming to str index.
warnings.warn("Transforming to str index.", ImplicitModificationWarning)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/scratch/33545574/ipykernel_5065/2292619582.py in <module>
----> 1 adipo_all = scvi.data.read_h5ad("/home/yyyyy/analysis/anndata_working/adipo_sn_01112021_trained_v1.h5ad")
/home/xxxxx/pyenv_custom/py_scanalysis_env/lib/python3.7/site-packages/anndata/_io/h5ad.py in read_h5ad(filename, backed, as_sparse, as_sparse_fmt, chunk_size)
435 _clean_uns(d) # backwards compat
436
--> 437 return AnnData(**d)
438
439
/home/xxxxx/pyenv_custom/py_scanalysis_env/lib/python3.7/site-packages/anndata/_core/anndata.py in __init__(self, X, obs, var, uns, obsm, varm, layers, raw, dtype, shape, filename, filemode, asview, obsp, varp, oidx, vidx)
320 varp=varp,
321 filename=filename,
--> 322 filemode=filemode,
323 )
324
/home/xxxxx/pyenv_custom/py_scanalysis_env/lib/python3.7/site-packages/anndata/_core/anndata.py in _init_as_actual(self, X, obs, var, uns, obsm, varm, varp, obsp, raw, layers, dtype, shape, filename, filemode)
508
509 # TODO: Think about consequences of making obsm a group in hdf
--> 510 self._obsm = AxisArrays(self, 0, vals=convert_to_dict(obsm))
511 self._varm = AxisArrays(self, 1, vals=convert_to_dict(varm))
512
/home/xxxxx/pyenv_custom/py_scanalysis_env/lib/python3.7/site-packages/anndata/_core/aligned_mapping.py in __init__(self, parent, axis, vals)
233 self._data = dict()
234 if vals is not None:
--> 235 self.update(vals)
236
237
/home/xxxxx/pyenv_custom/py_scanalysis_env/lib/python3.7/_collections_abc.py in update(*args, **kwds)
839 if isinstance(other, Mapping):
840 for key in other:
--> 841 self[key] = other[key]
842 elif hasattr(other, "keys"):
843 for key in other.keys():
/home/xxxxx/pyenv_custom/py_scanalysis_env/lib/python3.7/site-packages/anndata/_core/aligned_mapping.py in __setitem__(self, key, value)
149
150 def __setitem__(self, key: str, value: V):
--> 151 value = self._validate_value(value, key)
152 self._data[key] = value
153
/home/xxxxx/pyenv_custom/py_scanalysis_env/lib/python3.7/site-packages/anndata/_core/aligned_mapping.py in _validate_value(self, val, key)
211 # Could probably also re-order index if it’s contained
212 raise ValueError(
--> 213 f"value.index does not match parent’s axis {self.axes[0]} names"
214 )
215 return super()._validate_value(val, key)
ValueError: value.index does not match parent’s axis 0 names
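The last frames show where loading fails: while rebuilding obsm, AnnData validates every value that carries an index (that is, a DataFrame) against the parent's obs names, and the source comment "Could probably also re-order index if it's contained" suggests that even a reordered index is rejected. The pandas-only sketch below roughly mirrors that condition. `obsm_frame_matches` is a hypothetical helper, not an anndata function, and the scenario (obs names converted to strings on read, as the "Transforming to str index" warning hints, while an obsm DataFrame kept integer labels) is an assumed illustration rather than a confirmed diagnosis of this file.

```python
import pandas as pd


def obsm_frame_matches(obs_names: pd.Index, frame: pd.DataFrame) -> bool:
    """Hypothetical helper approximating the check in the traceback:
    a DataFrame stored in obsm must carry exactly the parent's obs names,
    in the same order."""
    if len(frame.index) != len(obs_names):
        return False
    # Elementwise comparison; int labels never equal their str counterparts.
    return bool((frame.index == obs_names).all())


# Assumed scenario: obs names became strings on read,
# but the obsm DataFrame still has integer labels.
obs_names = pd.Index(["0", "1", "2"])
int_labelled = pd.DataFrame({"batch": ["a", "b", "a"]}, index=[0, 1, 2])
str_labelled = int_labelled.set_axis(obs_names, axis=0)

print(obsm_frame_matches(obs_names, int_labelled))  # False
print(obsm_frame_matches(obs_names, str_labelled))  # True
```

Note that the labels "look" identical when printed, which is why this kind of mismatch is easy to miss by eye.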
This is the offending anndata object.
AnnData object with n_obs × n_vars = 123472 × 36795
obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'nCount_spliced', 'nFeature_spliced', 'nCount_unspliced', 'nFeature_unspliced', 'nCount_HTO', 'nFeature_HTO', 'HTO_maxID', 'HTO_secondID', 'HTO_margin', 'HTO_classification', 'HTO_classification.global', 'hash.ID', 'nCount_SCT', 'nFeature_SCT', 'batch', 'n_genes', 'annot', 'sample_origin', 'day', 'depot', 'tissue', 'timepoint', 'n_genes_by_counts', 'total_counts', 'total_counts_percent_ribo', 'pct_counts_percent_ribo', '_scvi_batch', '_scvi_labels'
var: 'features', 'spliced_features', 'unspliced_features', 'n_cells', 'highly_variable', 'highly_variable_rank', 'means', 'variances', 'variances_norm', 'highly_variable_nbatches', 'percent_ribo', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts'
uns: 'hvg', 'sample_origin_colors', 'depot_colors', '_scvi'
obsm: '_scvi_extra_categoricals', '_scvi_extra_continuous', 'X_scVI'
layers: 'counts', 'spliced', 'unspliced'
By contrast, the earlier version below can be read without problems. To get from it to the object above, I ran scanpy.pp.highly_variable_genes and sc.pp.calculate_qc_metrics, followed by scVI model training.
AnnData object with n_obs × n_vars = 123472 × 36795
obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'nCount_spliced', 'nFeature_spliced', 'nCount_unspliced', 'nFeature_unspliced', 'nCount_HTO', 'nFeature_HTO', 'HTO_maxID', 'HTO_secondID', 'HTO_margin', 'HTO_classification', 'HTO_classification.global', 'hash.ID', 'nCount_SCT', 'nFeature_SCT', 'batch', 'n_genes', 'annot', 'sample_origin', 'day', 'depot', 'tissue', 'timepoint'
var: 'features', 'spliced_features', 'unspliced_features', 'n_cells'
layers: 'counts', 'spliced', 'unspliced'
Libraries: scanpy==1.8.1 anndata==0.7.6 umap==0.5.1 numpy==1.20.3 scipy==1.7.1 pandas==1.3.3 scikit-learn==0.24.2 statsmodels==0.13.0rc0 python-igraph==0.9.6 pynndescent==0.5.4
I had a similar error, and the following resolved it: make sure every obsm entry that is a DataFrame has the same index as the obs DataFrame.
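A minimal sketch of that fix, assuming the rows of each obsm DataFrame are already in the same order as obs (AnnData aligns obsm rows to obs by position), so only the labels need replacing. `align_obsm_frames` is a hypothetical helper written for illustration; it is not part of anndata or scvi-tools. Entries that are not DataFrames (for example the NumPy array in X_scVI) are skipped, since they carry no index that could mismatch.

```python
import pandas as pd


def align_obsm_frames(obs_names, obsm):
    """Hypothetical helper: return {key: DataFrame} for every DataFrame in
    `obsm`, with its index replaced by `obs_names`.

    Assumption: rows are already in obs order, so labels are replaced
    positionally rather than matched by value."""
    target = pd.Index(obs_names)
    fixed = {}
    for key, value in obsm.items():
        if not isinstance(value, pd.DataFrame):
            continue  # arrays have no index, so they cannot trigger this error
        if len(value) != len(target):
            raise ValueError(
                f"obsm[{key!r}] has {len(value)} rows, expected {len(target)}"
            )
        fixed[key] = value.set_axis(target, axis=0)
    return fixed


# Illustrative data mimicking the failing object's obsm keys.
obs_names = pd.Index(["cell_1", "cell_2", "cell_3"])
obsm = {
    "_scvi_extra_categoricals": pd.DataFrame(
        {"depot": ["a", "b", "a"]}, index=[0, 1, 2]
    ),
    "X_scVI": [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],  # stands in for an array
}
fixed = align_obsm_frames(obs_names, obsm)
print(list(fixed))  # ['_scvi_extra_categoricals']
print(fixed["_scvi_extra_categoricals"].index.equals(obs_names))  # True
```

With an in-memory AnnData object `adata`, the returned frames would be written back with `adata.obsm[key] = frame` for each item before calling `adata.write_h5ad(...)`. This only helps before saving; it does not repair a file that already fails to load.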
This issue has been automatically marked as stale because it has not had recent activity. Please add a comment if you want to keep the issue open. Thank you for your contributions!
It seems this is solved, and the discussion continues in #311.
Please tell us if you need anything.