How to best read proseg output in sdata format?
Hello and thanks for this great tool!
I am using proseg 2.0.4. I ran proseg and then ran proseg-to-baysor and xeniumranger import-segmentation to generate Xenium Explorer-compatible output.
I tried to use the xenium function from spatialdata-io to read the xeniumranger-formatted output, but it raised the error pasted below.
Would you be able to advise on this error or would you have a recommended code snippet to read the proseg output into an sdata format?
Cheers, Anastasia
KeyError Traceback (most recent call last)
Cell In[18], line 9
7 for proseg_path in proseg_io:
8 print(str(proseg_path)+'/proseg-xenium/outs/')
----> 9 sdata = xenium(str(proseg_path)+'/proseg-xenium/outs/', n_jobs=30)
10 #zarr_path = str(proseg_path)+'/proseg-xenium/outs/' + ".zarr"
11 #sdata.write(zarr_path)
File /lib/python3.11/site-packages/spatialdata_io/_utils.py:47, in deprecation_alias.
File /lib/python3.11/site-packages/spatialdata_io/readers/xenium.py:228, in xenium(path, cells_boundaries, nucleus_boundaries, cells_as_circles, cells_labels, nucleus_labels, transcripts, morphology_mip, morphology_focus, aligned_images, cells_table, n_jobs, imread_kwargs, image_models_kwargs, labels_models_kwargs)
219 labels["nucleus_labels"], _ = _get_labels_and_indices_mapping(
220 path,
221 XeniumKeys.CELLS_ZARR,
(...)
225 labels_models_kwargs=labels_models_kwargs,
226 )
227 if cells_labels:
--> 228 labels["cell_labels"], cell_labels_indices_mapping = _get_labels_and_indices_mapping(
229 path,
230 XeniumKeys.CELLS_ZARR,
231 specs,
232 mask_index=1,
233 labels_name="cell_labels",
234 labels_models_kwargs=labels_models_kwargs,
235 )
236 if cell_labels_indices_mapping is not None and table is not None:
237 if not pd.DataFrame.equals(cell_labels_indices_mapping["cell_id"], table.obs[str(XeniumKeys.CELL_ID)]):
File /lib/python3.11/site-packages/spatialdata_io/readers/xenium.py:420, in _get_labels_and_indices_mapping(path, file, specs, mask_index, labels_name, labels_models_kwargs)
416 zip_ref.extractall(tmpdir)
418 with zarr.open(str(tmpdir), mode="r") as z:
419 # get the labels
--> 420 masks = z["masks"][f"{mask_index}"][...]
421 labels = Labels2DModel.parse(
422 masks, dims=("y", "x"), transformations={"global": Identity()}, **labels_models_kwargs
423 )
425 # build the matching table
File /lib/python3.11/site-packages/zarr/hierarchy.py:511, in Group.__getitem__(self, item)
509 raise KeyError(item)
510 else:
--> 511 raise KeyError(item)
KeyError: '1'
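The traceback shows the reader looking up masks/1 inside the cells Zarr archive and not finding it. A minimal sketch for checking which mask indices the archive actually contains is given here. It assumes the archive is named cells.zarr.zip inside the outs directory and that masks/0 holds the nucleus labels; neither is confirmed in this thread (only mask_index=1 for the cell labels is visible in the traceback). The helper names are hypothetical.

```python
import zipfile
from pathlib import Path


def available_mask_indices(outs_dir):
    """Return the set of integer mask indices found under masks/ in the archive."""
    # Assumption: the archive is called cells.zarr.zip and sits in the outs directory.
    archive = Path(outs_dir) / "cells.zarr.zip"
    indices = set()
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            parts = name.strip("/").split("/")
            # Entries are expected to look like masks/<index>/<metadata or chunk file>.
            if len(parts) >= 2 and parts[0] == "masks" and parts[1].isdigit():
                indices.add(int(parts[1]))
    return indices


def label_flags(outs_dir):
    """Suggest values for the reader's label arguments based on the archive contents."""
    indices = available_mask_indices(outs_dir)
    # Assumption: index 0 is the nucleus mask; index 1 is the cell mask (per the traceback).
    return {"nucleus_labels": 0 in indices, "cells_labels": 1 in indices}
```

If label_flags reports cells_labels as False, the KeyError above is expected, and the flags can be passed to the xenium reader as keyword arguments.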
Actually, passing the following extra arguments to the xenium function resolved the issue.
sdata = xenium(str(proseg_path)+'/proseg-xenium/outs/', n_jobs=30, cells_labels=False, nucleus_boundaries=False)
I would still be interested to know if you have a suggested way to read in the proseg output in an sdata format.
Thanks again! Anastasia
This is a spatialdata-io issue (I just created an issue there; see the mention above), so they'll have to add support for this. In the meantime, I suspect the best you can do is what you've already done (i.e. turn off reading the cell labels and nucleus boundaries).
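For reference, the workaround can be wrapped up roughly as follows. This is only a sketch: the directory layout (proseg-xenium/outs under each run directory) is taken from the snippet in the traceback, the helper names are hypothetical, and the Zarr store is written next to the outs directory as outs.zarr, which is a choice made here rather than anything spatialdata-io requires.

```python
from pathlib import Path


def outs_and_zarr_paths(proseg_path):
    """Return the xeniumranger outs directory and a sibling .zarr target path."""
    outs_dir = Path(proseg_path) / "proseg-xenium" / "outs"
    zarr_path = outs_dir.with_name("outs.zarr")
    return outs_dir, zarr_path


def read_without_labels(proseg_path, n_jobs=30):
    """Read the imported segmentation while skipping the elements that fail to load."""
    # Imported lazily so the path helper works without spatialdata-io installed.
    from spatialdata_io import xenium

    outs_dir, zarr_path = outs_and_zarr_paths(proseg_path)
    sdata = xenium(
        str(outs_dir),
        n_jobs=n_jobs,
        cells_labels=False,        # the cell mask (masks/1) is missing from the archive
        nucleus_boundaries=False,  # also disabled in the workaround reported above
    )
    sdata.write(zarr_path)
    return sdata
```

Calling read_without_labels on each run directory returns the SpatialData object and leaves a Zarr copy on disk for later use.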
@kousaa, I used a package called SOPA (https://gustaveroussy.github.io/sopa/), which provides a convenient way of running multiple cell segmentation algorithms. The package is built on spatialdata standards, so the output is an sdata object. It might be worth giving it a try.