problem creating anndata file from MERSCOPE vpt output
Hi,
I'm trying to create a SpatialData object for a MERSCOPE experiment from the vpt output files, but I get an index mismatch error when the AnnData object is created.
Here is the error message:
allen/programs/celltypes/workgroups/rnaseqanalysis/mFISH/michaelkunst/github_projects/spatialdata/src/spatialdata/models/models.py:620: FutureWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, CategoricalDtype) instead
  if is_categorical_dtype(data[c]) and not data[c].cat.known:
/allen/programs/celltypes/workgroups/rnaseqanalysis/mFISH/michaelkunst/miniconda3/envs/SpatialData/lib/python3.10/site-packages/anndata/_core/anndata.py:522: FutureWarning: The dtype argument is deprecated and will be removed in late 2024.
  warnings.warn(
/allen/programs/celltypes/workgroups/rnaseqanalysis/mFISH/michaelkunst/miniconda3/envs/SpatialData/lib/python3.10/site-packages/anndata/_core/anndata.py:183: ImplicitModificationWarning: Transforming to str index.
  warnings.warn("Transforming to str index.", ImplicitModificationWarning)
ValueError                                Traceback (most recent call last)
Cell In[17], line 1
----> 1 sdata = sdio.merscope("/allen/programs/celltypes/workgroups/rnaseqanalysis/mFISH/michaelkunst/test_data/merfish_output/202202221441_60988207_VMSC01001/region_0/",
      2                       vpt_outputs="/allen/programs/celltypes/workgroups/rnaseqanalysis/mFISH/michaelkunst/test_data/merfish_output/202202221441_60988207_VMSC01001/region_0/cellpose_cyto2_nuclei/")
File /allen/programs/celltypes/workgroups/rnaseqanalysis/mFISH/michaelkunst/github_projects/spatialdata-io/src/spatialdata_io/readers/merscope.py:204, in merscope(path, vpt_outputs, z_layers, region_name, slide_name, imread_kwargs, image_models_kwargs)
    201 obs = pd.read_csv(obs_path, index_col=0, dtype={MerscopeKeys.METADATA_CELL_KEY: str})
    203 is_gene = ~data.columns.str.lower().str.contains("blank")
--> 204 adata = anndata.AnnData(data.loc[:, is_gene], dtype=data.values.dtype, obs=obs)
    206 adata.obsm["blank"] = data.loc[:, ~is_gene]  # blank fields are excluded from adata.X
    207 adata.obsm["spatial"] = adata.obs[[MerscopeKeys.CELL_X, MerscopeKeys.CELL_Y]].values
File /allen/programs/celltypes/workgroups/rnaseqanalysis/mFISH/michaelkunst/miniconda3/envs/SpatialData/lib/python3.10/site-packages/anndata/_core/anndata.py:362, in AnnData.__init__(self, X, obs, var, uns, obsm, varm, layers, raw, dtype, shape, filename, filemode, asview, obsp, varp, oidx, vidx)
    360     self._init_as_view(X, oidx, vidx)
    361 else:
--> 362     self._init_as_actual(
    363         X=X,
    364         obs=obs,
    365         var=var,
    366         uns=uns,
    367         obsm=obsm,
    368         varm=varm,
    369         raw=raw,
    370         layers=layers,
    371         dtype=dtype,
    372         shape=shape,
    373         obsp=obsp,
    374         varp=varp,
    375         filename=filename,
    376         filemode=filemode,
    377     )
File /allen/programs/celltypes/workgroups/rnaseqanalysis/mFISH/michaelkunst/miniconda3/envs/SpatialData/lib/python3.10/site-packages/anndata/_core/anndata.py:558, in AnnData._init_as_actual(self, X, obs, var, uns, obsm, varm, varp, obsp, raw, layers, dtype, shape, filename, filemode)
    556         attr.index = idx
    557     elif not idx.equals(attr.index):
--> 558         raise ValueError(f"Index of {attr_name} must match {x_name} of X.")
    560 # unstructured annotations
    561 self.uns = uns or OrderedDict()
ValueError: Index of obs must match index of X.
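For readers landing here with the same error: the check that raises (`idx.equals(attr.index)` in the traceback) uses pandas `Index.equals`, which compares both the elements and their order. The sketch below is pandas-only and uses invented cell IDs and column names, not real MERSCOPE output. It shows one way two tables covering the same cells can still fail that check; whether row order is the actual cause in this dataset is an assumption, not something the traceback confirms.

```python
import pandas as pd

# Hypothetical toy tables: the same three cells, but in a different row order.
counts = pd.DataFrame({"GeneA": [1, 0, 3]}, index=["10", "11", "12"])
metadata = pd.DataFrame({"center_x": [5.0, 1.0, 3.0]}, index=["12", "10", "11"])

# Index.equals is order-sensitive, so this mirrors the failing comparison.
same_order = counts.index.equals(metadata.index)

# A set comparison ignores order and tells us whether the cells themselves match.
same_cells = set(counts.index) == set(metadata.index)

print("indices equal (order-sensitive):", same_order)
print("same set of cell IDs:", same_cells)
```

Running the same two comparisons on the real cell-by-gene and metadata tables should help distinguish an ordering problem from genuinely missing or extra cells.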
Hi, as described here https://github.com/scverse/spatialdata-io/issues/89, could you please provide a small dataset to reproduce this bug? Thank you.
Hi @LucaMarconato, yes the data is available on BIL (https://doi.brainimagelibrary.org/doi/10.35077/g.610).
In the meantime, I found a fix: adding a line (the last one below) that reindexes the metadata so its index matches the index of the cell-by-gene table:
    data = pd.read_csv(count_path, index_col=0, dtype={MerscopeKeys.COUNTS_CELL_KEY: str})
    obs = pd.read_csv(obs_path, index_col=0, dtype={MerscopeKeys.METADATA_CELL_KEY: str})
    obs = obs.reindex(data.index)
I can open a PR later.
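A minimal sketch of what the `reindex` workaround does, using toy data with invented cell IDs and column names rather than real MERSCOPE output. It also illustrates one caveat: `reindex` fills rows with NaN for any cell ID that is present in the counts table but absent from the metadata, so a check for missing IDs before reindexing may be worthwhile. The check shown here is a suggestion, not part of the reader.

```python
import pandas as pd

# Hypothetical toy tables standing in for the cell-by-gene and metadata CSVs.
counts = pd.DataFrame({"GeneA": [1, 0, 3]}, index=["10", "11", "12"])
metadata = pd.DataFrame({"center_x": [5.0, 1.0, 3.0]}, index=["12", "10", "11"])

# Cell IDs in the counts table that have no metadata row.
# reindex would silently create all-NaN rows for these.
missing = counts.index.difference(metadata.index)
if len(missing) > 0:
    raise ValueError(f"{len(missing)} cells have counts but no metadata")

# Reorder (and subset) the metadata rows to follow the counts table.
aligned = metadata.reindex(counts.index)

# The order-sensitive comparison used by AnnData now passes.
print(aligned.index.equals(counts.index))
```

If the metadata contains extra cells that are not in the counts table, `reindex` drops them, which may or may not be the desired behavior.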
Hi, thanks. I answered you in the linked issue.
Following up in https://github.com/scverse/spatialdata-io/issues/89.