Questions Regarding Proseg Outputs
Hello,
Thank you for creating Proseg - it is a remarkably sophisticated and impressive method!
I have a few questions regarding the outputs generated by Proseg, as well as some clarifications related to the Nature paper:
- Does Proseg create new cells or retain only those provided in the prior segmentations? According to the paper, Proseg can generate cells using only nuclear stains. However, in this GitHub issue (https://github.com/dcjones/proseg/issues/59), it was mentioned that Proseg does not have the capacity to introduce new cells. Could you please clarify?
- In the output file transcripts-metadata.csv.gz (generated by Proseg for a public 10x Xenium dataset; https://www.10xgenomics.com/datasets/ffpe-human-pancreas-with-xenium-multimodal-cell-segmentation-1-standard), the assignment column includes a value of 4294967295, which does not appear in cell-metadata.csv.gz.
For reference, the command I used to run Proseg was:
proseg Xenium_V1_human_Pancreas_FFPE_outs/transcripts.csv.gz --xenium
The background column in transcripts-metadata.csv.gz contains both 1 and 0 for transcripts with this assignment value. Should I treat these transcripts as background noise?
- The total number of transcripts in transcripts-metadata.csv.gz (7,166,842) is lower than the number reported by Xenium (8,073,840). Is there a subsampling strategy or filtering step being applied by Proseg that accounts for this difference?
- How would you recommend computing statistics such as the proportion of Proseg-assigned transcripts?
- There appears to be a discrepancy in assigned transcript counts between transcripts-metadata.csv.gz and cell-metadata.csv.gz, as described in this issue (https://github.com/dcjones/proseg/issues/16). Has this issue been addressed?
Thank you very much in advance for your time and assistance!
Best, Jaspreet
Hi Jaspreet,
Thanks for trying out proseg!
- Proseg optimizes the boundaries of a fixed number of cells provided by a prior segmentation, usually one done on a nuclear stain. So if your prior initialization has n cells, proseg will never output more than n, and will generally output slightly fewer (since often not every prior cell can be represented in the voxelization).
- That's a special value (the maximum value of an unsigned 32-bit integer) that's used internally to represent "unassigned". I do that for efficiency, but I realize it's confusing, so in the future I'll try to use NA.
- There isn't any subsampling, but there is a filtering step that removes transcripts that are far away from any initializing cell center. In the latest version, on Xenium, transcripts with quality values below 20 are also filtered out.
- The way I calculate the proportion assigned is to just sum the count matrix. The likely confusing part about the assignment column is that it just records which cell a transcript overlaps, but the transcript is not counted when the background or confusion columns are 1.
- It's not addressed yet, so they'll likely disagree somewhat. I am working on simplifying the output, so it will be more consistent and easier to understand in the next major release.
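To make the counting rule above concrete, here is a minimal sketch of estimating the proportion assigned directly from transcripts-metadata rows instead of the count matrix. It assumes the assignment, background, and confusion columns hold plain integers, with 4294967295 marking "unassigned"; the sample rows are made up for illustration, and the helper names (is_counted, proportion_assigned) are hypothetical, not part of proseg. Given the discrepancy mentioned above, the result may not match the count matrix exactly.

```python
import csv
import io

# Sentinel described in this thread: the largest unsigned 32-bit integer.
UNASSIGNED = 2**32 - 1  # 4294967295

# Made-up rows for illustration only; the real file has more columns.
SAMPLE_CSV = """assignment,background,confusion
0,0,0
0,1,0
1,0,0
1,0,1
4294967295,1,0
4294967295,0,0
"""


def is_counted(row):
    """True if the transcript overlaps a cell and is flagged as neither background nor confusion."""
    return (
        int(row["assignment"]) != UNASSIGNED
        and int(row["background"]) == 0
        and int(row["confusion"]) == 0
    )


def proportion_assigned(rows):
    """Fraction of the given transcript rows that would count toward a cell."""
    rows = list(rows)
    if not rows:
        return 0.0
    return sum(is_counted(r) for r in rows) / len(rows)


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(proportion_assigned(rows))
```

For a real run, the same function could be fed rows from csv.DictReader(gzip.open(path, "rt")). Note that the denominator here is the number of transcripts left after filtering, not the total reported by Xenium.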
Let me know if you have other questions!