spatialdata-plot
spatialdata-plot copied to clipboard
Problem with indices vs positional indices in shapes
I have just noticed that the squidpy notebook can't make the last plot; it should be a quick fix. The latest version of the notebook (gonna be merge in main soon) is this one: https://github.com/scverse/spatialdata-notebooks/blob/fix/remove_napari/notebooks/examples/squidpy_integration.ipynb
The dataset used in the notebook is the xenium rep 1 data, it should be unchanged from the one we have downloaded and the S3 version should be up-to-date (I will re-enable the upload functions today/tomorrow, so for sure it will be up-to-date soon).
The problem seem to be due to the use of positional indices where ".loc" indices should have been used. Please see the screenshot: the cell_id and gdf.index columns match, but it seems that at some point either a positional index for the table, either the table.obs.index has been used.
Faulty line:
sdata.pl.render_shapes(element="cell_circles", color="leiden").pl.show()
Traceback
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
Cell In[7], line 1
----> 1 sdata.pl.render_shapes(element="cell_circles", color="leiden").pl.show()
File ~/embl/projects/basel/spatialdata-plot/src/spatialdata_plot/pl/basic.py:673, in PlotAccessor.show(self, coordinate_systems, legend_fontsize, legend_fontweight, legend_loc, legend_fontoutline, na_in_legend, colorbar, wspace, hspace, ncols, frameon, figsize, dpi, fig, title, share_extent, pad_extent, ax, return_ax, save)
662 _render_images(
663 sdata=sdata,
664 render_params=params,
(...)
670 # extent=extent[cs],
671 )
672 elif cmd == "render_shapes" and cs_contents.query(f"cs == '{cs}'")["has_shapes"][0]:
--> 673 _render_shapes(
674 sdata=sdata,
675 render_params=params,
676 coordinate_system=cs,
677 ax=ax,
678 fig_params=fig_params,
679 scalebar_params=scalebar_params,
680 legend_params=legend_params,
681 )
683 elif cmd == "render_points" and cs_contents.query(f"cs == '{cs}'")["has_points"][0]:
684 _render_points(
685 sdata=sdata,
686 render_params=params,
(...)
691 legend_params=legend_params,
692 )
File ~/embl/projects/basel/spatialdata-plot/src/spatialdata_plot/pl/render.py:121, in _render_shapes(sdata, render_params, coordinate_system, ax, fig_params, scalebar_params, legend_params)
118 color_vector = color_vector[mask]
119 shapes = gpd.GeoDataFrame(shapes, geometry="geometry")
--> 121 _cax = _get_collection_shape(
122 shapes=shapes,
123 s=render_params.scale,
124 c=color_vector,
125 render_params=render_params,
126 rasterized=sc_settings._vector_friendly,
127 cmap=render_params.cmap_params.cmap,
128 norm=norm,
129 fill_alpha=render_params.fill_alpha,
130 outline_alpha=render_params.outline_alpha
131 # **kwargs,
132 )
134 # Sets the limits of the colorbar to the values instead of [0, 1]
135 if not norm and not values_are_categorical:
File ~/embl/projects/basel/spatialdata-plot/src/spatialdata_plot/pl/utils.py:248, in _get_collection_shape(shapes, c, s, norm, render_params, fill_alpha, outline_alpha, **kwargs)
246 scaled_coords = [(centroid + (np.array(coord) - centroid) * s).tolist() for coord in geom.exterior.coords]
247 row["geometry"] = mplp.Polygon(scaled_coords, closed=True)
--> 248 assign_fill_and_outline_to_row(shapes, fill_c, outline_c, row, idx)
249 rows.append(row)
251 elif geom.geom_type == "MultiPolygon":
252 # mp = _make_patch_from_multipolygon(geom)
File ~/embl/projects/basel/spatialdata-plot/src/spatialdata_plot/pl/utils.py:235, in _get_collection_shape.<locals>.assign_fill_and_outline_to_row(shapes, fill_c, outline_c, row, idx)
233 row["outline_c"] = outline_c
234 else:
--> 235 row["fill_c"] = fill_c[idx]
236 row["outline_c"] = outline_c[idx]
IndexError: index 167780 is out of bounds for axis 0 with size 167780
I'm running into the same issue
Hi @timtreis I just noticed that this bug is still open. Do you have any timeline for a fix? Thanks 😊
Here is how to reproduce the problem. When the table has less rows than the shapes, the problem appears.
import matplotlib.pyplot as plt
from spatialdata.datasets import blobs_annotating_element
import spatialdata_plot
sdata = blobs_annotating_element("blobs_circles")
sdata['table'] = sdata['table'][:3]
sdata.pl.render_shapes("blobs_circles", color="instance_id").pl.show()
plt.show()
The behavior in napari, that would be expected from the user also here, is that the plot should be possible independently of which rows are present in the table and the shapes (it could be that the table has some rows in common, some missing and some extra; and no guarantee is expected from the order of the rows).
In napari shapes that are not present in the table are plotted in gray, while rows that are present in the table but not in the shapes are simply ignored. The order is matched via the join_spatialelement_table() API, which made the implementation easy.
The bug is related to this https://github.com/scverse/spatialdata-plot/issues/178 which if I remember well Sonja had fixed (and tests were added), I am looking for her PR. I think that plot can be made with any order so maybe the only uncovered cases are when there is a different number of rows between shapes and table. Using join_spatialelement_table() would fi all of this.
Here are the PRs from Sonja https://github.com/scverse/spatialdata-plot/pull/163/files and https://github.com/scverse/spatialdata-plot/pull/177/files.
So it seems that:
- plotting shapes when the table has extra rows is fine
- the tests https://github.com/scverse/spatialdata-plot/blob/87801ac5cf7e6be067a4acf0a5c1ab1a7b2cb26a/tests/pl/test_render_shapes.py#L159 and https://github.com/scverse/spatialdata-plot/blob/87801ac5cf7e6be067a4acf0a5c1ab1a7b2cb26a/tests/pl/test_render_shapes.py#L179 should be already covering the case in which the indices mismatch.
join_spatialelement_tableis not used here, probably because it was not available at that time. Here is how it is used innapari-spatialdata: https://github.com/scverse/napari-spatialdata/blob/b50df555d9c0e1335c99afc27bc7b63bf0324934/src/napari_spatialdata/_view.py#L404. If the data to plot is a subset, instead ofjoin_spatialelement_table, the approach is slightly different (it calls an internal function instead function that computes the join even when the table is not part of aSpatialDataobject, as it happens for subset tables) https://github.com/scverse/napari-spatialdata/blob/b50df555d9c0e1335c99afc27bc7b63bf0324934/src/napari_spatialdata/_viewer.py#L672.
Practically, to address the issue I suggest to proceed as follows:
- [ ] mimic the use of the join APIs as done in
napari-spatialdata - [ ] revisit the tests above to cover also the case in which the table has less rows than the shapes. Probably the best is too simultaneously do all the following, which is the most general case:
- remove some rows from the table
- remove some different rows from the shapes
- shuffle both the table and shapes
Related bug, reported in https://github.com/scverse/spatialdata-plot/issues/357. Using groups leads to a filtering of the table and triggers this bug due to a table/shapes mismatch (that is expected).