spatialdata-plot icon indicating copy to clipboard operation
spatialdata-plot copied to clipboard

Problem with indices vs positional indices in shapes

Open LucaMarconato opened this issue 2 years ago • 6 comments

I have just noticed that the squidpy notebook can't make the last plot; it should be a quick fix. The latest version of the notebook (gonna be merge in main soon) is this one: https://github.com/scverse/spatialdata-notebooks/blob/fix/remove_napari/notebooks/examples/squidpy_integration.ipynb

The dataset used in the notebook is the xenium rep 1 data, it should be unchanged from the one we have downloaded and the S3 version should be up-to-date (I will re-enable the upload functions today/tomorrow, so for sure it will be up-to-date soon).

The problem seem to be due to the use of positional indices where ".loc" indices should have been used. Please see the screenshot: the cell_id and gdf.index columns match, but it seems that at some point either a positional index for the table, either the table.obs.index has been used.

image

Faulty line:

sdata.pl.render_shapes(element="cell_circles", color="leiden").pl.show()

Traceback

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[7], line 1
----> 1 sdata.pl.render_shapes(element="cell_circles", color="leiden").pl.show()

File ~/embl/projects/basel/spatialdata-plot/src/spatialdata_plot/pl/basic.py:673, in PlotAccessor.show(self, coordinate_systems, legend_fontsize, legend_fontweight, legend_loc, legend_fontoutline, na_in_legend, colorbar, wspace, hspace, ncols, frameon, figsize, dpi, fig, title, share_extent, pad_extent, ax, return_ax, save)
    662     _render_images(
    663         sdata=sdata,
    664         render_params=params,
   (...)
    670         # extent=extent[cs],
    671     )
    672 elif cmd == "render_shapes" and cs_contents.query(f"cs == '{cs}'")["has_shapes"][0]:
--> 673     _render_shapes(
    674         sdata=sdata,
    675         render_params=params,
    676         coordinate_system=cs,
    677         ax=ax,
    678         fig_params=fig_params,
    679         scalebar_params=scalebar_params,
    680         legend_params=legend_params,
    681     )
    683 elif cmd == "render_points" and cs_contents.query(f"cs == '{cs}'")["has_points"][0]:
    684     _render_points(
    685         sdata=sdata,
    686         render_params=params,
   (...)
    691         legend_params=legend_params,
    692     )

File ~/embl/projects/basel/spatialdata-plot/src/spatialdata_plot/pl/render.py:121, in _render_shapes(sdata, render_params, coordinate_system, ax, fig_params, scalebar_params, legend_params)
    118     color_vector = color_vector[mask]
    119 shapes = gpd.GeoDataFrame(shapes, geometry="geometry")
--> 121 _cax = _get_collection_shape(
    122     shapes=shapes,
    123     s=render_params.scale,
    124     c=color_vector,
    125     render_params=render_params,
    126     rasterized=sc_settings._vector_friendly,
    127     cmap=render_params.cmap_params.cmap,
    128     norm=norm,
    129     fill_alpha=render_params.fill_alpha,
    130     outline_alpha=render_params.outline_alpha
    131     # **kwargs,
    132 )
    134 # Sets the limits of the colorbar to the values instead of [0, 1]
    135 if not norm and not values_are_categorical:

File ~/embl/projects/basel/spatialdata-plot/src/spatialdata_plot/pl/utils.py:248, in _get_collection_shape(shapes, c, s, norm, render_params, fill_alpha, outline_alpha, **kwargs)
    246     scaled_coords = [(centroid + (np.array(coord) - centroid) * s).tolist() for coord in geom.exterior.coords]
    247     row["geometry"] = mplp.Polygon(scaled_coords, closed=True)
--> 248     assign_fill_and_outline_to_row(shapes, fill_c, outline_c, row, idx)
    249     rows.append(row)
    251 elif geom.geom_type == "MultiPolygon":
    252     # mp = _make_patch_from_multipolygon(geom)

File ~/embl/projects/basel/spatialdata-plot/src/spatialdata_plot/pl/utils.py:235, in _get_collection_shape.<locals>.assign_fill_and_outline_to_row(shapes, fill_c, outline_c, row, idx)
    233     row["outline_c"] = outline_c
    234 else:
--> 235     row["fill_c"] = fill_c[idx]
    236     row["outline_c"] = outline_c[idx]

IndexError: index 167780 is out of bounds for axis 0 with size 167780

LucaMarconato avatar Oct 09 '23 10:10 LucaMarconato

I'm running into the same issue

grst avatar Apr 09 '24 12:04 grst

Hi @timtreis I just noticed that this bug is still open. Do you have any timeline for a fix? Thanks 😊

LucaMarconato avatar Aug 06 '24 16:08 LucaMarconato

Here is how to reproduce the problem. When the table has less rows than the shapes, the problem appears.

import matplotlib.pyplot as plt
from spatialdata.datasets import blobs_annotating_element
import spatialdata_plot

sdata = blobs_annotating_element("blobs_circles")
sdata['table'] = sdata['table'][:3]

sdata.pl.render_shapes("blobs_circles", color="instance_id").pl.show()
plt.show()

The behavior in napari, that would be expected from the user also here, is that the plot should be possible independently of which rows are present in the table and the shapes (it could be that the table has some rows in common, some missing and some extra; and no guarantee is expected from the order of the rows).

In napari shapes that are not present in the table are plotted in gray, while rows that are present in the table but not in the shapes are simply ignored. The order is matched via the join_spatialelement_table() API, which made the implementation easy.

LucaMarconato avatar Aug 15 '24 16:08 LucaMarconato

The bug is related to this https://github.com/scverse/spatialdata-plot/issues/178 which if I remember well Sonja had fixed (and tests were added), I am looking for her PR. I think that plot can be made with any order so maybe the only uncovered cases are when there is a different number of rows between shapes and table. Using join_spatialelement_table() would fi all of this.

LucaMarconato avatar Aug 15 '24 16:08 LucaMarconato

Here are the PRs from Sonja https://github.com/scverse/spatialdata-plot/pull/163/files and https://github.com/scverse/spatialdata-plot/pull/177/files.

So it seems that:

  • plotting shapes when the table has extra rows is fine
  • the tests https://github.com/scverse/spatialdata-plot/blob/87801ac5cf7e6be067a4acf0a5c1ab1a7b2cb26a/tests/pl/test_render_shapes.py#L159 and https://github.com/scverse/spatialdata-plot/blob/87801ac5cf7e6be067a4acf0a5c1ab1a7b2cb26a/tests/pl/test_render_shapes.py#L179 should be already covering the case in which the indices mismatch.
  • join_spatialelement_table is not used here, probably because it was not available at that time. Here is how it is used in napari-spatialdata: https://github.com/scverse/napari-spatialdata/blob/b50df555d9c0e1335c99afc27bc7b63bf0324934/src/napari_spatialdata/_view.py#L404. If the data to plot is a subset, instead of join_spatialelement_table, the approach is slightly different (it calls an internal function instead function that computes the join even when the table is not part of a SpatialData object, as it happens for subset tables) https://github.com/scverse/napari-spatialdata/blob/b50df555d9c0e1335c99afc27bc7b63bf0324934/src/napari_spatialdata/_viewer.py#L672.

Practically, to address the issue I suggest to proceed as follows:

  • [ ] mimic the use of the join APIs as done in napari-spatialdata
  • [ ] revisit the tests above to cover also the case in which the table has less rows than the shapes. Probably the best is too simultaneously do all the following, which is the most general case:
    • remove some rows from the table
    • remove some different rows from the shapes
    • shuffle both the table and shapes

LucaMarconato avatar Aug 15 '24 16:08 LucaMarconato

Related bug, reported in https://github.com/scverse/spatialdata-plot/issues/357. Using groups leads to a filtering of the table and triggers this bug due to a table/shapes mismatch (that is expected).

LucaMarconato avatar Oct 03 '24 17:10 LucaMarconato