spatialdata-io Verify compatibility with Xenium Onboard Analysis 3.0

A new version of XOA has been released, which supports the new Xenium Prime 5K panels; from the changelog I believe that no modifications are required for the xenium() reader to support the new format.

[ ] Still we should verify this.
[x] Also, we should add the new small test datasets to the CI.

Jul 31 '24 13:07 LucaMarconato

The small dataset "Xenium_Prime_MultiCellSeg_Mouse_Ileum_tiny" seems to be invalid. I added the other one in the GitHub workflow that prepares the test data.

[x] tests need to be added.

Dec 10 '24 15:12 LucaMarconato

Hi there, I'd like to chime in on this - I'm working with a dataset from XOA v3.2.1.2

I'm noticing that the cell_id field in my table does not align with the index in my cell_boundaries. Also, the size of my cell_boundaries is modestly different from that of my cell_labels.

I'm running the latest versions of spatialdata and spatialdata-io.

Feb 10 '25 18:02 benemead

Hi @benemead thanks for reaching out. Are you working on a public dataset/can you reproduce on a public dataset? Happy to assist.

Feb 10 '25 19:02 LucaMarconato

@timtreis are you using the same XOA version?

Feb 10 '25 19:02 LucaMarconato

Hi @LucaMarconato, appreciate the prompt reply!

Unfortunately not public, and our data is coming from a 3rd party who runs the instrument.

If there's a way to abbreviate or anonymize my current data, I'd gladly share (also xenium slides are massive - this one's clocking in at ~800k labels).

From inspection of some of the outputs in the xenium dir (cell .csv.gz files) I can see that what's been loaded for 'cell_id' does not match - rather looks like the hex conversion is starting from 0 and increasing one by one.

Happy to take a pass at it myself and report back if you all had pointers - I'm pretty unfamiliar with this codebase.

Feb 11 '25 11:02 benemead

I checked the changelog for XOA 3.2.1 https://www.10xgenomics.com/support/software/xenium-onboard-analysis/latest/release-notes/release-notes-for-xoa and I don't think the problem is tied to that version. It could instead be due to the fact that from XOA 3.0.0 one could have cells with no nuclei, or cells with multiple nuclei.

A way to share anonymized data could be the following:

extract indices from cell labels and nuclei using spatialdata.get_element_instances(), save as a series into 2 .csv files
save indices of cell boundaries and nucleus boundaries into 2 .csv files
given _, region_key, instance_key = spatialdata.models.get_table_keys(sdata['table']), extract the region_key and instance_key columns from sdata['table'].obs into a .csv file

If you could share the above it would be great.
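The three steps above could be sketched roughly as follows. This is only an illustration, not tested against your data: the element names (cell_labels, nucleus_labels, cell_boundaries, nucleus_boundaries, table) are assumed to be the ones produced by the xenium() reader, the table keys helper is assumed to be spatialdata.models.get_table_keys, and write_ids / export_ids are hypothetical helper names.

```python
import csv
from pathlib import Path


def write_ids(path, ids, column="id"):
    """Write one identifier per row to a single-column CSV file."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow([column])
        writer.writerows([i] for i in ids)


def export_ids(sdata, out_dir):
    """Dump identifiers only (no coordinates, no expression values)."""
    # Imported here so that write_ids works without spatialdata installed.
    from spatialdata import get_element_instances
    from spatialdata.models import get_table_keys

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Step 1: integer label values present in the raster labels elements.
    # Element names are assumptions; adjust them to your SpatialData object.
    for name in ("cell_labels", "nucleus_labels"):
        write_ids(out / f"{name}.csv", get_element_instances(sdata[name]), name)

    # Step 2: index of the polygon (shapes) elements.
    for name in ("cell_boundaries", "nucleus_boundaries"):
        write_ids(out / f"{name}.csv", sdata[name].index, name)

    # Step 3: the columns linking each table row to a region and an instance.
    table = sdata["table"]
    _, region_key, instance_key = get_table_keys(table)
    table.obs[[region_key, instance_key]].to_csv(out / "table_metadata.csv")
```

Calling export_ids(sdata, "to_share") should leave five small .csv files in the to_share folder, none of which contain coordinates or counts.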

Also, you could check what you can share from the file cells.zarr.zip. This contains some important metadata used to link the nuclei with the table.

Finally, please note that for Xenium data, all the code for parsing is contained in a single file (<800 lines of code) https://github.com/scverse/spatialdata-io/blob/main/src/spatialdata_io/readers/xenium.py, so if you could try to debug it and share more information on where you get the error, or which value the variables have when you get the error, this could help a lot!

Feb 11 '25 12:02 LucaMarconato

Hi @LucaMarconato - apologies for my slow reply - I have dug into the issue a bit more, and have attached the .obs columns as table_metadata.csv, the cell_labels element instances as cell_labels.csv, and the cell_boundaries index as cell_boundaries.csv

What you'll see is that the cell_id (from Xenium) is not preserved in the cell_labels, however it is present in the cell_boundaries - based on my review of the code you referenced above it should be converting the hashed Xenium cell_id to that alpha string? Maybe I'm misunderstanding?

cell_labels.csv cell_boundaries.csv table_metadata.csv

Feb 26 '25 19:02 benemead

Actually - I think I may have found the issue - in line 228, cell_labels_indices_mapping is defined, and below there is a conditional test to see if the mapping matches the cell_ids, but then the mapping (AFAIK) is not used again...

NVM - I see - the cell_labels is meant to just be an int - and presumably I need to use the mapping between cell_labels and cell_id in table.obs to map between the two?

Feb 26 '25 19:02 benemead

Exactly. cell_labels matches the integer values of the pixels in the labels element, whereas cell_id is the index of the cells that is used to compute the hex representation. Please let me know whether the problem is still open with this information, or whether it was due to the ambiguity now explained.
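To make the distinction concrete, here is a small standalone sketch. It assumes the cell ID format described in the 10x documentation (an integer prefix written as 8 hex digits, each digit shifted to the letters a-p, followed by "-" and a dataset suffix). The function names are hypothetical and are not the ones used in xenium.py, and the obs rows are made up for illustration.

```python
def cell_id_from_prefix(prefix: int, suffix: int = 1) -> str:
    """Encode an integer prefix as a Xenium-style cell_id string.

    Assumed format: 8 hex digits, each digit 0-9a-f shifted to a-p,
    then "-<suffix>".
    """
    hex_digits = format(prefix, "08x")
    shifted = "".join(chr(ord("a") + int(d, 16)) for d in hex_digits)
    return f"{shifted}-{suffix}"


def prefix_from_cell_id(cell_id: str) -> tuple[int, int]:
    """Invert cell_id_from_prefix: return (integer prefix, dataset suffix)."""
    shifted, suffix = cell_id.split("-")
    hex_digits = "".join(format(ord(c) - ord("a"), "x") for c in shifted)
    return int(hex_digits, 16), int(suffix)


# Made-up rows standing in for table.obs: each row carries both the string
# cell_id and the integer cell_labels value found in the labels raster.
obs_rows = [
    {"cell_id": cell_id_from_prefix(0), "cell_labels": 1},
    {"cell_id": cell_id_from_prefix(1), "cell_labels": 2},
    {"cell_id": cell_id_from_prefix(1437536272), "cell_labels": 3},
]

# Lookup from a pixel value in the labels element to the string cell_id,
# which is what the boundaries index and the table use.
label_to_cell_id = {row["cell_labels"]: row["cell_id"] for row in obs_rows}

print(label_to_cell_id[3])  # ffkpbaba-1
```

With a real dataset the same lookup can be built from the cell_labels and cell_id columns of table.obs instead of the made-up rows.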

Mar 16 '25 15:03 LucaMarconato

[ ] We also need to parse the morphology.ome.tif image and check that all is good with the Z-stack. Example dataset: https://www.10xgenomics.com/datasets/xenium-prime-ffpe-human-ovarian-cancer (3.0.0). Reported by @BioinfoTongLI

Mar 21 '25 14:03 LucaMarconato

Just some supplementary info: the morphology.ome.tif is a multi-Z TIFF file with dimension order Z, Y, X. It contains the DAPI channel only, no other channels.
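A quick way to confirm this on a given file could look like the sketch below. It assumes the third-party tifffile package is available and that the first series of the file is the full-resolution stack; the helper names are hypothetical.

```python
def describe_stack(axes: str, shape: tuple) -> dict:
    """Pair each axis letter with its length, e.g. 'ZYX' and (3, 10, 20)."""
    if len(axes) != len(shape):
        raise ValueError(f"axes {axes!r} and shape {shape!r} differ in length")
    return dict(zip(axes, shape))


def is_single_channel_z_stack(axes: str, shape: tuple) -> bool:
    """True for a Z-stack ending in Y, X with no channel axis larger than 1."""
    sizes = describe_stack(axes, shape)
    return axes.endswith("YX") and "Z" in sizes and sizes.get("C", 1) == 1


def inspect_morphology(path):
    """Return (axes, shape) of the first series without loading the pixels."""
    import tifffile  # third-party; imported lazily on purpose

    with tifffile.TiffFile(path) as tif:
        series = tif.series[0]  # assumption: first series = full resolution
        return series.axes, series.shape
```

For example, is_single_channel_z_stack(*inspect_morphology("morphology.ome.tif")) is expected to be True for a file laid out as described above.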

Mar 21 '25 14:03 BioinfoTongLI
