proseg: silent failure at memory limit

Hi @dcjones

I'm trying to run ProSeg (2.0.4) on some Xenium outputs but am running into two problems. One sample completed successfully, but a second one crashes after it reaches 100%. If I instead pre-filter the transcripts using a QV threshold of 30, it seems fine. Is it just a fluke that 32 GB of RAM was enough to process 49M transcripts but not 53M? Is there a verbose error for memory issues?

Sample 1 (SUCCESS): 49,004,872 transcripts, 114,592 cells, grid size 231.03, 1,176 chunks

Sample 2 (FAILED): 53,804,275 transcripts, 122,769 cells, grid size 226.25, 1,247 chunks
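For reference, the QV pre-filter mentioned above could be done with something like this minimal sketch. It assumes a gzipped, Xenium-style transcripts CSV with a qv column (newer Xenium bundles may ship only transcripts.parquet, which would need pandas or pyarrow instead). filter_by_qv and prefilter_transcripts are hypothetical helper names, not part of proseg.

```python
import csv
import gzip
from typing import Iterable, Iterator


def filter_by_qv(rows: Iterable[dict], min_qv: float = 30.0) -> Iterator[dict]:
    """Yield only transcripts whose quality value (qv column) is at least min_qv."""
    for row in rows:
        if float(row["qv"]) >= min_qv:
            yield row


def prefilter_transcripts(src_path: str, dst_path: str, min_qv: float = 30.0) -> int:
    """Copy a gzipped transcripts CSV, dropping rows below min_qv.

    Streams row by row so the whole file never has to sit in memory.
    Returns the number of transcripts kept.
    """
    kept = 0
    with gzip.open(src_path, "rt", newline="") as src, \
         gzip.open(dst_path, "wt", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in filter_by_qv(reader, min_qv):
            writer.writerow(row)
            kept += 1
    return kept
```

Raising the threshold from 20 to 30 simply shrinks the input, which is consistent with the smaller run fitting in memory.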

For the one that completed, I then converted the output to Baysor format and used Xenium Ranger to generate new outputs, but Xenium Explorer gives an error: Could not read \morphology.ome.tif. Error: Multi-source mismatch: [object Object],[object Object]

Any thoughts?

Jul 24 '25 16:07 jspence-gh

It's definitely possible that slightly more transcripts caused it to run out of memory and crash. Usually, if it crashes without printing any kind of error message, an out-of-memory error is the culprit.

As for the Xenium Ranger error, I'm really not sure. It references the morphology.ome.tif file, which proseg never touches. I'd make sure that file is where it should be, and you could try opening it in e.g. QuPath to make sure it's not corrupted.

Jul 24 '25 21:07 dcjones

Great, thanks. It seemed strange because after the first crash I was monitoring my memory usage, and it didn't seem to be maxing out, but there wasn't much headroom, so maybe there was just a spike I didn't catch right at the end. I'll try it on an HPC.
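A brief spike at the end is easy to miss by watching a monitor. One way to catch it, sketched here assuming a Linux/POSIX machine (the resource module is not available on Windows), is to launch proseg from a small wrapper that records the child's peak resident memory and whether it was killed by SIGKILL, which is the signal the Linux OOM killer sends. run_and_report and looks_oom_killed are hypothetical helpers, and the proseg command line you pass in is up to you.

```python
import resource
import signal
import subprocess
import sys


def run_and_report(cmd: list[str]) -> tuple[int, int]:
    """Run cmd to completion and return (returncode, peak RSS).

    The peak is ru_maxrss for RUSAGE_CHILDREN: the largest RSS of any
    terminated child of this process so far. It is in kilobytes on Linux
    and in bytes on macOS.
    """
    completed = subprocess.run(cmd)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    return completed.returncode, usage.ru_maxrss


def looks_oom_killed(returncode: int) -> bool:
    """A negative returncode means the child died from that signal.

    The OOM killer uses SIGKILL, so -9 is a strong (not conclusive) hint;
    anything else can also send SIGKILL, e.g. a scheduler enforcing limits.
    """
    return returncode == -signal.SIGKILL


if __name__ == "__main__":
    # Smoke test with a trivial child; swap in the real proseg command line.
    rc, peak = run_and_report([sys.executable, "-c", "pass"])
    print(f"returncode={rc} peak_rss={peak} oom_suspected={looks_oom_killed(rc)}")
```

On an HPC node the same information is usually available without code, e.g. from GNU time's -v output ("Maximum resident set size") or from the scheduler's accounting (such as Slurm's MaxRSS field in sacct).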

Not sure if it's a faux pas (sorry, new here) to ask this here, but I was also wondering if you had any suggestions for parameter tweaks. I'm working with full mouse brain FFPE sections from a glioblastoma model. It's a Xenium v1 experiment with only DAPI staining and a 5 µm expansion for cell segmentation, which, as you might expect, was terrible, especially for the adaptive immune cells. The default proseg output looks better, but there are still some mixed phenotypes. Any suggestions would be appreciated!

Jul 25 '25 01:07 jspence-gh

Feel free to ask whatever questions, or open new issues if you run into problems!

Brain is very difficult to segment well. I think proseg can do a better job of assigning transcripts, but probably gets the morphology wrong in many cases. I'm not certain what would improve results on your data, but some things you could try:

Inspect the prior nuclei segmentation and make sure it's doing a good job. Since proseg relies on it, if it's missing a lot of cells, that error will propagate. If it looks problematic, you can try Cellpose, which is a hassle but gives you more options to tweak.
Try allowing more transcript repositioning. This can give better assignments in cases (like neurons) where proseg lacks the resolution or transcript density to accurately infer the morphology. I would just try --diffusion-sigma-near 1.0 at first and see if that's any better. There's more you can tinker with here, but that's a good test.
--use-scaled-cells: gives proseg more freedom in modeling variation in transcript density across cells. This sometimes improves segmentation and sometimes makes it worse (which is why it's disabled by default). It might be worth a try here.
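To compare these suggestions side by side, a small driver could generate one command line per combination, as in this sketch. It only uses the flags named in this thread (--xenium, --diffusion-sigma-near, --use-scaled-cells) and assumes the transcripts file is passed as a positional argument; proseg_variants is a hypothetical helper, and proseg --help is the authority on the exact invocation.

```python
from itertools import product


def proseg_variants(transcripts_path: str) -> dict[str, list[str]]:
    """Build one proseg command line per combination of the two suggested options.

    Keys are short labels ("default", "scaled", "sigma1.0", "sigma1.0_scaled");
    every other setting is left at proseg's defaults.
    """
    variants: dict[str, list[str]] = {}
    for sigma_near, scaled in product([None, "1.0"], [False, True]):
        cmd = ["proseg", "--xenium"]
        label_parts = []
        if sigma_near is not None:
            cmd += ["--diffusion-sigma-near", sigma_near]
            label_parts.append(f"sigma{sigma_near}")
        if scaled:
            cmd.append("--use-scaled-cells")
            label_parts.append("scaled")
        cmd.append(transcripts_path)  # assumed positional input
        variants["_".join(label_parts) or "default"] = cmd
    return variants
```

Each command could then be passed to subprocess.run; you would probably want every variant writing to its own directory so the runs don't overwrite each other.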

Jul 28 '25 18:07 dcjones

Amazing, thanks so much. I will try --diffusion-sigma-near 1.0 and --use-scaled-cells and see if that improves things.

Looking at the methodology in the paper and the info here, I don't see any mention of preprocessing steps other than removing cells with fewer than 10 transcripts. I know --xenium defaults to a QV threshold of 20, but wouldn't it be advisable to also remove the blank codewords and control probes from the data before running proseg?

Jul 29 '25 04:07 jspence-gh

In my testing, including or excluding negative controls makes very little difference to segmentation quality. Including them uses some more memory and takes a bit longer to run, so I usually only include them if I plan to use them in some downstream analysis or QC. They are excluded by default when using --xenium or --cosmx, so you don't need to filter the transcripts file.
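If the controls are kept in a run for QC, one common use is a per-cell control rate. This sketch assumes you already have (cell_id, feature_name) pairs for assigned transcripts, however you extract them from the proseg output, and that control features follow Xenium-style name prefixes. The prefix list is an assumption to check against your own panel, and is_control and control_fraction_per_cell are hypothetical names.

```python
from collections import Counter
from collections.abc import Iterable

# Assumed Xenium-style control prefixes; verify against your panel's feature names.
CONTROL_PREFIXES = (
    "NegControlProbe_",
    "NegControlCodeword_",
    "UnassignedCodeword_",
    "BLANK_",
)


def is_control(feature_name: str) -> bool:
    """True if the feature name starts with any assumed control prefix."""
    return feature_name.startswith(CONTROL_PREFIXES)


def control_fraction_per_cell(assignments: Iterable[tuple[str, str]]) -> dict[str, float]:
    """Given (cell_id, feature_name) pairs, return each cell's fraction of control transcripts."""
    totals: Counter = Counter()
    controls: Counter = Counter()
    for cell_id, feature in assignments:
        totals[cell_id] += 1
        if is_control(feature):
            controls[cell_id] += 1
    # Counter returns 0 for cells with no control transcripts.
    return {cell: controls[cell] / n for cell, n in totals.items()}
```

Tagging controls this way, rather than deleting them from the transcripts file, avoids having to add them back later.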

Jul 29 '25 16:07 dcjones

Ah, I see. I didn't realize they weren't included. I had run a script to pre-filter them, but adding them back was a pain when I wanted them for downstream QC, though that's probably just my inexperience. Do you find that using the ProSeg outputs changes your QC approach? I was using MAD thresholds on area and counts to filter out large segmentation errors or pseudo-'doublets', but I'm guessing that with more accurate cell morphology and the greater heterogeneity of cell borders in the brain, statistical outliers may be even less relevant? There's a similar problem with the count data when you have a highly targeted panel of only 347 genes, though I see surprisingly little discussion of these kinds of problems.
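For concreteness, the MAD-based filter described above might look like this generic sketch: each metric is log-transformed, a cell is flagged if it lies more than n_mads median absolute deviations from the median, and a cell is dropped if either area or counts is flagged. The 5-MAD cutoff and the log1p transform are assumptions, not proseg recommendations. Whether such thresholds still make sense on proseg cells is exactly the open question here.

```python
import math
from statistics import median


def mad_outliers(values, n_mads: float = 5.0, log: bool = True) -> list[bool]:
    """Mark values more than n_mads median absolute deviations from the median.

    With log=True, values are log1p-transformed first, which tames the
    right skew typical of cell area and transcript counts.
    """
    xs = [math.log1p(v) for v in values] if log else list(values)
    med = median(xs)
    mad = median(abs(x - med) for x in xs)
    if mad == 0:
        # All values (or most of them) identical: nothing to call an outlier.
        return [False] * len(xs)
    return [abs(x - med) > n_mads * mad for x in xs]


def flag_cells(areas, counts, n_mads: float = 5.0) -> list[bool]:
    """Flag a cell if either its area or its transcript count is a MAD outlier."""
    area_flags = mad_outliers(areas, n_mads)
    count_flags = mad_outliers(counts, n_mads)
    return [a or c for a, c in zip(area_flags, count_flags)]
```

This is two-sided; if the concern is mainly oversized or merged cells, checking only the upper side would be a reasonable variation.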

Jul 29 '25 17:07 jspence-gh