module 'spatialdata' has no attribute 'match_sdata_to_table'

bellenger-l opened this issue 9 months ago • 5 comments

Hello,

I wanted to filter my Xenium data, but I get an error with the function match_sdata_to_table.

Here is a reproducible example:

```python
import spatialdata as spd
from spatialdata.datasets import blobs

sdata = blobs()
sub_adata = sdata.tables["table"][:10]
sub_sdata = spd.match_sdata_to_table(
sdata=sdata, table_name="table", table=sub_adata, how="right"
)
```

It returns the following error:

```text
AttributeError: module 'spatialdata' has no attribute 'match_sdata_to_table'. Did you mean: 'match_element_to_table'?
```

Desktop (optional):

  • RedHat (8.7)
  • spatialdata 0.3.0

How can I fix this? Thanks for your time.

Best,
Lea

bellenger-l, Mar 25 '25

Hi Lea, I had this same issue. I learned that spatialdata 0.3.0 is the current release, and match_sdata_to_table is currently only in the dev version.

To fix this, I had to install a dev version that includes pull request https://github.com/scverse/spatialdata/pull/627. I chose one that also includes pull request https://github.com/scverse/spatialdata/pull/883, just to avoid any other potential issues.

How to install a specific commit (here, one that includes the pull requests above):

Go to the main page https://github.com/scverse/spatialdata and click "Commits":

[Screenshot: the "Commits" link on the repository main page]

then click the double-square icon to "Copy SHA" (yellow arrow in the screenshot):

[Screenshot: the "Copy SHA" button, marked with a yellow arrow]

and then build the pip install command for spatialdata, pinning that dev version by its SHA:

```bash
pip install git+https://github.com/scverse/spatialdata.git@6e259f0afbf67379ade21e62560910e89f752c68
```

You may also need git to run that command. If you are on an institutional HPC, you can check whether git is already installed with `module avail git` and then load it with `module load <git module listed by module avail git>`.
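Before running the filtering code, it can help to confirm that the installed build actually exposes the function. A minimal stdlib-only sketch; `has_api` and `installed_version` are hypothetical helper names, not part of spatialdata:

```python
import importlib
import importlib.metadata
from typing import Optional


def has_api(module_name: str, attribute: str) -> bool:
    """True if `module_name` can be imported and exposes `attribute`."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:  # also covers ModuleNotFoundError
        return False
    return hasattr(module, attribute)


def installed_version(distribution: str) -> Optional[str]:
    """Installed version string of a distribution, or None if it is not installed."""
    try:
        return importlib.metadata.version(distribution)
    except importlib.metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("spatialdata version:", installed_version("spatialdata"))
    print("match_sdata_to_table available:", has_api("spatialdata", "match_sdata_to_table"))
```

With the 0.3.0 release from this issue, the second line should print False, matching the AttributeError above; after installing a build that includes PR 627 it should print True.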

Respectfully, Pratik

Pancreas-Pratik, Mar 27 '25

Hello @Pancreas-Pratik ,

Thanks a lot for your help!! Your explanation was completely clear. The function works now, but I lost a lot of information (all images, points, labels and some shapes) in my subsetted spatialdata object. I think it's a separate problem, so I'll take a closer look.

Best regards, Lea

bellenger-l, Mar 31 '25

You are welcome @bellenger-l. I am not sure about the loss of information. I am just learning how to use spatialdata myself, so I could not pinpoint the cause to help you troubleshoot.

What I do is load a fresh spatialdata object via xenium() every time I am analyzing, and then run through all of the code I had written and saved in my Jupyter .ipynb notebook, from the start to where I had left off. So in your case, if you did it this way, the subsetting code would have to be re-run every time you restart your Jupyter notebook kernel. Maybe this is bad practice, since I have to read and write the entire Xenium /outs/ folder every time I work on this project, but it feels cleaner to me in terms of reproducibility (whoever runs my code should be able to run it successfully every time).

I have not completely figured out the .zarr saving option yet. I think that is how to save the changes made to a spatialdata object to an intermediate location?

Pancreas-Pratik, Mar 31 '25

There are two separate things, I'm afraid:

  • the Zarr store, which lets you keep your spatialdata object on disk so that individual elements can be saved throughout your analysis;
  • the loss of information due to the filtering.

For instance, I use the Zarr store because I am testing spatialdata and different spatial transcriptomics packages; some steps are very time-consuming and I don't want to recompute everything from the beginning. It's working fine in my opinion, except that when we want to save the table (the AnnData within the spatialdata object), we need to save the entire Zarr store again.

Regarding the match_sdata_to_table function, this is what I see:

  • My spatialdata object before filtering (sdata):

```text
SpatialData object, with associated Zarr store: /home/blabla/object.zarr
├── Images
│     ├── 'he_image': DataArray[cyx]
│     ├── 'morphology_focus': DataTree[cyx]
│     └── 'morphology_mip': DataTree[cyx]
├── Labels
│     ├── 'cell_labels': DataTree[yx]
│     └── 'nucleus_labels': DataTree[yx]
├── Points
│     └── 'transcripts': DataFrame with shape: (<Delayed>, 10) (3D points)
├── Shapes
│     ├── 'cell_boundaries': GeoDataFrame shape: (413404, 1) (2D shapes)
│     ├── 'cell_circles': GeoDataFrame shape: (413404, 2) (2D shapes)
│     ├── 'nucleus_boundaries': GeoDataFrame shape: (413404, 1) (2D shapes)
│     └── 'tissue_outline': GeoDataFrame shape: (27, 1) (2D shapes)
└── Tables
      └── 'table': AnnData (413404, 426)
with coordinate systems:
    ▸ 'global', with elements:
        he_image (Images), morphology_focus (Images), morphology_mip (Images), cell_labels (Labels), nucleus_labels (Labels), transcripts (Points), cell_boundaries (Shapes), cell_circles (Shapes), nucleus_boundaries (Shapes), tissue_outline (Shapes)
```
  • My subsetted spatialdata object, obtained with the following command: `sub_sdata = spd.match_sdata_to_table(sdata=sdata, table_name="table", table=sub_adata, how="right")`

```text
SpatialData object
├── Shapes
│     └── 'cell_circles': GeoDataFrame shape: (411085, 2) (2D shapes)
└── Tables
      └── 'table': AnnData (411085, 426)
with coordinate systems:
    ▸ 'global', with elements:
        cell_circles (Shapes)
```

So even when I perform the filtering and assign the result to another variable, I correctly retrieve the cells of interest, but at the expense of the other spatialdata elements. What I don't know is why. My hypothesis is that it's somehow because the annotation table is not linked to the other spatialdata elements (except cell_circles?).

Did you see the same phenomenon when using the match_sdata_to_table function?

bellenger-l, Mar 31 '25

Oh. I can help with this.

I was having trouble with the same thing, which is, essentially, re-inputting the AnnData back into the sdata and subsetting cell_boundaries and cell_circles (I imagine the same can be done for nucleus_boundaries). @LucaMarconato actually helped me with this exact issue. His solution is here: https://github.com/scverse/spatialdata/issues/898#issuecomment-2726573957. Below is how I implemented it, with the objects renamed to match yours.

How to subset cell_circles and cell_boundaries and put the subsetted AnnData back into the spatialdata object (see ***Note below):

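```python
import spatialdata as spd

# assumes `sdata` (the full object) and `sub_adata` (the filtered table) already exist

# backup sdata first
sdata.write("sdata_backup.zarr")

# subset for cell_circles (the element the table currently annotates)
sub_sdata = spd.match_sdata_to_table(sdata=sdata, table_name="table", table=sub_adata, how="right")
sdata.shapes["cell_circles"] = sub_sdata.shapes["cell_circles"]

# repeat for cell_boundaries: make the table annotate cell_boundaries instead
sdata["table"].obs["region"] = "cell_boundaries"
sdata.set_table_annotates_spatialelement(table_name="table", region="cell_boundaries")

# caveat (unverified): the annotated region is read from the `table` argument, so sub_adata
# may also need to point to "cell_boundaries" here (e.g. re-create it from sdata["table"])
sub_sdata = spd.match_sdata_to_table(sdata=sdata, table_name="table", table=sub_adata, how="right")
sdata.shapes["cell_boundaries"] = sub_sdata.shapes["cell_boundaries"]

# and then re-input the anndata
sdata.tables["table"] = sub_adata

# sdata should now have cell_circles and cell_boundaries filtered by sub_adata,
# with sub_adata re-inputted as the table
sdata
```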

***Note: In his reply to my issue linked above, @LucaMarconato said that the current solution is only a temporary fix. From https://github.com/scverse/spatialdata/issues/898#issuecomment-2726573957:

> Two comments:
>
> we are currently improving the ergonomics around this type of operation with a new API called match_sdata_to_table(), merged here https://github.com/scverse/spatialdata/pull/627, and with a work-in-progress PR for a new API called filter_table_by_query(), discussed here https://github.com/scverse/spatialdata/pull/894. Also, squidpy will offer APIs similar to the scanpy ones. The implementation for the moment will be separate because we first need to enable the join APIs (used in the functions above) to return a view and not always a copy. This is being worked out in this spatialdata PR here https://github.com/scverse/spatialdata/pull/701. The squidpy PR is this one here: https://github.com/scverse/squidpy/pull/967

Pancreas-Pratik, Apr 02 '25

@bellenger-l I recently learned about the value of the Zarr store for backing up data and re-loading it quickly. Tangentially, backing up to the Zarr store solved an issue I just had, partly thanks to you mentioning it here. I edited my response above

from:

```python
# backup sdata first
sdata_backup=sdata
```

to:

```python
# backup sdata first
sdata.write("sdata_backup.zarr")
```
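The reason the first version was not a real backup is that, in Python, `sdata_backup=sdata` only binds a second name to the same object, so later in-place changes show up under both names, whereas writing to Zarr produces an independent copy on disk. A minimal illustration, with a plain dict standing in for the SpatialData object (an assumption made to keep the example self-contained):

```python
import copy

# a plain dict stands in for a SpatialData object in this illustration
data = {"table": ["cell_1", "cell_2", "cell_3"]}

alias = data                     # same object under a second name: not a backup
snapshot = copy.deepcopy(data)   # independent in-memory copy

data["table"].remove("cell_3")   # in-place change, e.g. replacing or filtering an element

print(alias["table"])     # ['cell_1', 'cell_2']  (the "backup" changed too)
print(snapshot["table"])  # ['cell_1', 'cell_2', 'cell_3']
print(alias is data)      # True
```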

Pancreas-Pratik, Apr 08 '25

> Hi Lea, I had this same issue. I learned that spatialdata 0.3.0 is the current release, and match_sdata_to_table is currently only in the dev version.

The new release is out today! 😊

LucaMarconato, Apr 21 '25

@bellenger-l, please can you open one separate issue for the other problems you are reporting?

I'll give a quick answer here, but let's follow up in separate GitHub issues.

> It's working fine in my opinion, except that when we want to save the table (the AnnData within the spatialdata object), we need to save the entire Zarr store again.

You can write single elements using `write_element()`; please let me know if this addresses your problem.

LucaMarconato, Apr 21 '25

> So even when I perform the filtering and assign the result to another variable, I correctly retrieve the cells of interest, but at the expense of the other spatialdata elements. What I don't know is why. My hypothesis is that it's somehow because the annotation table is not linked to the other spatialdata elements (except cell_circles?).

Yes, this is by design: elements that are not annotated by the table are not included in the filtered object. But you can easily plug them back in memory with something like `filtered_sdata["my_image"] = sdata["my_image"]` (as shown in the code from @Pancreas-Pratik, thanks for the answer). I hope this helps.

LucaMarconato, Apr 21 '25

Hello everyone,

Sorry for the late response, and thanks a lot for your help!!

> please can you open one separate issue for the other problems you are reporting?

Of course @LucaMarconato , I got carried away with the discussion.

I haven't had much time for the spatial analyses lately, but I am back on them now, so I am going to test the solution. I managed to change the table annotation, so I think my understanding is much better than before.

Closing for now, since my initial issue is resolved!

Thanks again for your time and help, @Pancreas-Pratik and @LucaMarconato.

Best,
Lea

bellenger-l, Jun 10 '25