Baysor outputs as inputs to Proseg

Open · professor-sagittarius opened this issue 3 months ago · 5 comments

I'm thinking about ways to overcome this limitation of Proseg:

> Proseg relies on prior (usually image-based) segmentation to determine the number and approximate location of cells. It doesn't introduce new cells, so if the prior segmentation missed many cells, Proseg is not able to correct for that error.

I have some CosMx samples where the morphology staining basically failed. The very weak staining meant that few cells were called in regions where cells are known to be present. Baysor fixes this and correctly identifies cells in the apparently empty areas, but at the cost of Baysor's usual drawbacks (over-partitioning, etc.). One possible solution is to pipe Baysor outputs to Proseg.

Have you considered something like `proseg --baysor /path/to/baysor/output`? Can you think of any reasons this would produce unexpected results?

professor-sagittarius · Sep 05 '25 17:09

In fact, I think you can do this now without a special option if you're willing to write a ton of arguments to describe the input format:

```sh
proseg \
    --gene-column gene \
    --transcript-id-column molecule_id \
    --x-column x \
    --y-column y \
    --z-column z \
    --cell-id-column cell_id \
    --cell-id-unassigned 0 \
    baysor-output.csv
```

I haven't really tested this, but I don't see why it wouldn't work. The obvious caveat is that I don't trust Baysor, or any transcript-based method, to accurately determine the number of cells, but if the stain failed, this seems to me like a reasonable plan to salvage some usable data.
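Because this approach depends on the Baysor CSV containing exactly the column names passed to each flag, a quick preflight check can catch a mismatch before Proseg runs. Below is a minimal sketch using only the Python standard library. The column names are copied from the command above and are assumptions about your Baysor version's output, so check them against the header of your actual file. `PROSEG_COLUMNS` and `missing_columns` are hypothetical names used for illustration.

```python
import csv
from typing import Iterable

# Flag -> column name, copied from the proseg command in this thread.
# These are assumptions: adjust them to match your Baysor output header.
PROSEG_COLUMNS = {
    "--gene-column": "gene",
    "--transcript-id-column": "molecule_id",
    "--x-column": "x",
    "--y-column": "y",
    "--z-column": "z",
    "--cell-id-column": "cell_id",
}


def missing_columns(lines: Iterable[str], expected: Iterable[str]) -> list[str]:
    """Return the expected column names that are absent from the CSV header.

    `lines` can be an open text file or any iterable of CSV lines;
    only the first row (the header) is read.
    """
    header = next(csv.reader(lines), [])  # empty input -> empty header
    present = {name.strip() for name in header}
    return [name for name in expected if name not in present]
```

To use it, pass a file handle opened on `baysor-output.csv` (with `newline=""`) together with `PROSEG_COLUMNS.values()`. An empty list means every flag points at an existing column. Any name that is returned needs a corrected flag value.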

dcjones · Sep 08 '25 15:09

Yes, this is what I ended up doing, with the addition of `--cell-assignment-column is_noise --cell-assignment-unassigned true`, just in case.

Agreed on the caveat, but in this instance, I trust Baysor more than I trust Cellpose (which has failed to identify upwards of 90% of cells in some FOVs).
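Whether the extra `is_noise` flags make any difference depends on how often the two unassigned criteria disagree on the same molecule. Below is a minimal standard-library sketch that counts molecules by the two criteria side by side. The column names and the `0`/`true` values come from the commands in this thread. `unassigned_crosstab` is a hypothetical helper. The sketch does not model how Proseg itself combines the two options.

```python
import csv
from collections import Counter
from typing import Iterable


def unassigned_crosstab(
    lines: Iterable[str],
    cell_column: str = "cell_id",   # assumed name of Baysor's cell id column
    cell_unassigned: str = "0",     # value passed to --cell-id-unassigned
    noise_column: str = "is_noise", # value passed to --cell-assignment-column
    noise_value: str = "true",      # value passed to --cell-assignment-unassigned
) -> Counter:
    """Count molecules by (cell id equals the unassigned value, flagged as noise).

    Keys are (bool, bool) tuples: (unassigned_by_cell_id, flagged_as_noise).
    """
    counts: Counter = Counter()
    for row in csv.DictReader(lines):
        by_cell = row[cell_column].strip() == cell_unassigned
        by_noise = row[noise_column].strip().lower() == noise_value
        counts[(by_cell, by_noise)] += 1
    return counts
```

If the two off-diagonal counts, `(True, False)` and `(False, True)`, are both zero, the two criteria select exactly the same molecules, so the choice of flags cannot matter for that file. Otherwise, those counts show how many molecules the criteria disagree on.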

By the way, nice move switching the default output to SpatialData Zarr.

professor-sagittarius · Sep 08 '25 17:09

Hello @professor-sagittarius,

Since you already use Sopa, you can also provide `prior_shapes_key="baysor_boundaries"` when creating Proseg's input. For instance:

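```python
import sopa

# `sdata` is the SpatialData object for your sample
sopa.segmentation.baysor(sdata)  # as in the tutorial

# patch_width=None: one patch with all transcripts; Baysor cells serve as the prior
sopa.make_transcript_patches(sdata, patch_width=None, prior_shapes_key="baysor_boundaries")
sopa.segmentation.proseg(sdata)
```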

NB: I came across this issue by chance, and since I know @professor-sagittarius is using Sopa, I took the liberty of answering. I hope that's not a problem for you, @dcjones.

quentinblampey · Sep 11 '25 12:09

Hi @quentinblampey, thanks for the advice. I have had some issues using Proseg with Sopa on CosMx data, but I haven't had a minute to do a decent write-up of the problem.

professor-sagittarius · Sep 11 '25 16:09

Okay, good to know. Don't hesitate to open a new issue for that if you want. I imagine it's related to the CosMx reader, since we haven't had issues with Proseg for other readers.

NB: once the CosMx reader is stable, we'll likely move it to spatialdata-io for future maintenance.

quentinblampey · Sep 11 '25 16:09