Disconnected/non-contiguous polygons for a cell

Open paularstrpo opened this issue 3 months ago • 5 comments

Thanks for developing this tool. I am running 3.0.7 with MERSCOPE data and noticed that a substantial portion of the polygons output in the 2D consensus file are discontiguous: a single cell's MultiPolygon geometry contains several disconnected shapes.

An example of one of these cells with disconnected voxels: Image

This occurs for almost half the cells in this particular sample that I ran:

Image

I also notice that cells with these disconnected voxels persist even if I add the --enforce-connectivity flag during the proseg run.

Is this expected behavior?

paularstrpo avatar Sep 16 '25 13:09 paularstrpo

You will get examples like this, but my hope is that they are relatively rare and that most disconnected cells are trivially so like this:

Image

The other thing that complicates things is that a cell that is in fact connected may be rendered as disconnected when polygons are flattened to 2d, because the voxels making up the "bridge" that connects the two pieces may be allocated to another cell when flattened.

That can happen even with --enforce-connectivity, since that enforces connectivity of the 3d cells only. Connectivity isn't enforced by default because it comes with a performance cost and my impression was that there wasn't much benefit. I may revisit that decision though. I'll also continue to investigate whether there are feasible changes to improve either polygon flattening or connectivity enforcement.
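To make the 3d-versus-2d point concrete, here is a toy sketch. It is not proseg's actual flattening code: it assumes, purely for illustration, that each 2d pixel goes to whichever cell owns the most voxels in that z column. The helper names `components` and `flatten_majority` are hypothetical.

```python
from collections import Counter, deque


def components(coords):
    """Count face-connected components in a set of integer coordinate tuples."""
    coords = set(coords)
    seen = set()
    count = 0
    for start in coords:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            cur = queue.popleft()
            # Visit the neighbours one step away along each axis.
            for axis in range(len(cur)):
                for step in (-1, 1):
                    nxt = cur[:axis] + (cur[axis] + step,) + cur[axis + 1:]
                    if nxt in coords and nxt not in seen:
                        seen.add(nxt)
                        queue.append(nxt)
    return count


def flatten_majority(voxels):
    """Assumed flattening rule: each (x, y) goes to the cell with most voxels in z."""
    votes = {}
    for (x, y, _z), cell in voxels.items():
        votes.setdefault((x, y), Counter())[cell] += 1
    return {xy: c.most_common(1)[0][0] for xy, c in votes.items()}


# Cell 1: two pillars (x=0 and x=2) joined by a bridge at x=1 in layer z=0 only.
# Cell 2: sits on top of that bridge in layers z=1 and z=2.
voxels = {}
for z in range(3):
    voxels[(0, 0, z)] = 1
    voxels[(2, 0, z)] = 1
voxels[(1, 0, 0)] = 1
voxels[(1, 0, 1)] = 2
voxels[(1, 0, 2)] = 2

cell1_3d = [xyz for xyz, cell in voxels.items() if cell == 1]
flat = flatten_majority(voxels)
cell1_2d = [xy for xy, cell in flat.items() if cell == 1]

print(components(cell1_3d), components(cell1_2d))
```

In this toy volume cell 1 is one connected piece in 3d, but the bridge column is outvoted by cell 2, so the flattened cell 1 splits into two pieces. Enforcing 3d connectivity would not change that outcome.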

dcjones avatar Sep 16 '25 17:09 dcjones

Hi @dcjones, I am using version 3.0.7/3.0.8 with Xenium data and have this same conundrum. It does make sense that a cell that is connected in 3D could end up disconnected in 2D because of surrounding cells, but it seems very common. I haven't counted how many cells have disconnected elements, but I have looked at the total number of polygons and it's roughly double the total cell count for all of my samples. From looking at 20-30ish random cells, it is, as you say, mostly trivial instances, but I found several cells that have small clusters similar to @paularstrpo's and a couple that either had quite a long distance to the disconnected 1x1 or had significant area in the disconnected parts:

Image Image

In order to take these forward I need to merge or trim so that each cell has got only a single polygon (Couldn't get MuSpAn to take the multipolygons, although maybe I am missing something) and I was wondering if you had thoughts on strategy there. I was originally thinking just a convex hull but it makes one of the above cells really different, and I don't want to drop all but the largest polygon either, since for the other example that seems like a problem. My current strategy is to just add a tiny bridge myself between all pieces of the cell...
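A minimal sketch of that bridging idea, assuming the cell geometries are loaded as shapely objects. The function name `bridge_parts` and the `width` value are hypothetical, and the bridge is placed between nearest points without checking whether it crosses a neighbouring cell.

```python
from shapely.geometry import MultiPolygon, Polygon, box
from shapely.ops import nearest_points, unary_union


def bridge_parts(geom, width=0.1):
    """Join the parts of a MultiPolygon into one Polygon with thin bridges.

    Each smaller part is linked to the growing main shape along the shortest
    segment between them. `width` is an illustrative bridge width in the same
    units as the coordinates.
    """
    if isinstance(geom, Polygon):
        return geom
    parts = sorted(geom.geoms, key=lambda p: p.area, reverse=True)
    merged = parts[0]
    for part in parts[1:]:
        a, b = nearest_points(merged, part)
        # Convex hull of two small discs is a capsule spanning a to b.
        # It also works when a == b, i.e. the parts touch at a single point.
        bridge = a.buffer(width / 2).union(b.buffer(width / 2)).convex_hull
        merged = unary_union([merged, bridge, part])
    return merged


# Two squares separated by a gap, and two squares touching only at a corner.
gapped = MultiPolygon([box(0, 0, 1, 1), box(2, 0, 3, 1)])
corner = MultiPolygon([box(0, 0, 1, 1), box(1, 1, 2, 2)])

print(bridge_parts(gapped).geom_type, bridge_parts(corner).geom_type)
```

Unlike a convex hull, this adds only the bridge area to the cell, and unlike keeping the largest part, it drops nothing. Long bridges can still cut through other cells, so it may be worth flagging cells whose parts are far apart rather than bridging them blindly.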

Noting that I ran this with --enforce-connectivity but as mentioned above it didn't seem to make much of a difference, and I suppose that means that it's mostly a 3D vs 2D problem and not a function of lacking connectivity in the 3D space where the algorithm runs.

Would appreciate any additional thoughts you have on this.

Thanks

adrlar avatar Oct 01 '25 20:10 adrlar

I'll definitely investigate this some more. I did fix a bug in the connectivity checking recently, but haven't released a new version yet. That should make --check-connectivity more useful.

One thing I'd suggest trying is to output union polygons with --output-union-cell-polygons, which are collapsed to 2d by taking a simple union of the layers. I think these will be less prone to this problem, with the downside that cells will overlap.

dcjones avatar Oct 01 '25 22:10 dcjones

Thanks for the suggestion of --output-union-cell-polygons, that does sort out all the cells that have very strange polygon clusters. There are still a subset, 5% ish, that have disconnected entities but they are much closer in location and are almost always a "touching" corner of two polygons. That's more of a technical limitation of the pixel size and how polygons are coded I suppose, so I'm very happy to artificially connect those with small bridges, as opposed to before when I didn't feel like I was doing the cell justice really.
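The touching-corner case can be illustrated on a pixel grid. This sketch assumes the polygons are traced from square pixels, which is an assumption about the output rather than something confirmed here. Two pixels that share only a corner count as separate pieces under 4-connectivity (shared edges) but as one piece under 8-connectivity (edges or corners). A valid single polygon needs a connected interior, so a corner contact yields two polygons.

```python
from collections import deque


def count_components(pixels, diagonal=False):
    """Count connected components of (x, y) pixels.

    diagonal=False uses 4-connectivity (shared edges only);
    diagonal=True uses 8-connectivity (shared edges or corners).
    """
    steps = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    if diagonal:
        steps += [(1, 1), (1, -1), (-1, 1), (-1, -1)]
    pixels = set(pixels)
    seen = set()
    count = 0
    for start in pixels:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            x, y = queue.popleft()
            for dx, dy in steps:
                nxt = (x + dx, y + dy)
                if nxt in pixels and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return count


# A 2x1 block plus one pixel that touches it only at a corner.
cell_pixels = [(0, 0), (1, 0), (2, 1)]

print(count_components(cell_pixels), count_components(cell_pixels, diagonal=True))
```

Counting components both ways gives a quick way to separate cells that are only corner-disconnected, which are safe to bridge, from cells with genuinely distant fragments.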

Thanks!

adrlar avatar Oct 03 '25 08:10 adrlar

I'm glad that improved things. I think it's clear that the "consensus polygons" are the source of a lot of the disconnection. I'll think about if there are improvements I can make to that, and maybe tweaking priors to avoid excessive z-axis overlap.

dcjones avatar Oct 03 '25 18:10 dcjones