
Handling of negative controls during proseg optimization

[Open] professor-sagittarius opened this issue 3 months ago • 3 comments

Currently, the default behavior of proseg is to ignore negative control features completely. Since negative controls carry useful information, a better option would be for proseg to exclude negative controls from the optimization but still include them in the output transcripts and tables.

Does --include-neg-ctrls do this, or does it treat negative controls like any other feature during optimization?
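
A minimal sketch of the requested behavior, assuming transcripts are records with a hypothetical `is_neg_ctrl` flag and that the finished segmentation can be rasterized into a 2D label grid (0 = background). This is not proseg's implementation; it only illustrates the split: controls never reach the model, and are counted per cell afterwards.

```python
from collections import Counter
from typing import NamedTuple


class Transcript(NamedTuple):
    gene: str
    x: float
    y: float
    is_neg_ctrl: bool  # hypothetical flag marking negative-control probes


def split_transcripts(transcripts):
    """Separate transcripts that drive the model from report-only controls."""
    model_input = [t for t in transcripts if not t.is_neg_ctrl]
    report_only = [t for t in transcripts if t.is_neg_ctrl]
    return model_input, report_only


def assign_to_cells(transcripts, label_grid, pixel_size):
    """Count transcripts per (cell, gene) using a rasterized label grid.

    label_grid[row][col] holds a cell id, with 0 meaning unassigned.
    Transcripts outside the grid or on background pixels are skipped.
    """
    counts = Counter()
    n_rows, n_cols = len(label_grid), len(label_grid[0])
    for t in transcripts:
        row, col = int(t.y // pixel_size), int(t.x // pixel_size)
        if 0 <= row < n_rows and 0 <= col < n_cols and label_grid[row][col] != 0:
            counts[(label_grid[row][col], t.gene)] += 1
    return counts


transcripts = [
    Transcript("CD3E", 0.5, 0.5, False),
    Transcript("NegPrb1", 0.7, 0.2, True),
    Transcript("NegPrb1", 2.5, 0.5, True),
    Transcript("SystemControl5", 1.5, 1.5, True),
]
# Only model_input would be passed to the segmentation step.
model_input, report_only = split_transcripts(transcripts)
# Pretend segmentation produced this 2x3 label grid with 1-micron pixels.
label_grid = [[1, 0, 2], [1, 1, 2]]
ctrl_counts = assign_to_cells(report_only, label_grid, pixel_size=1.0)
```

Rasterizing the final cell shapes is only one way to do the post-hoc lookup; a point-in-polygon test against the output cell boundaries would serve the same purpose.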

professor-sagittarius · Sep 22 '25 23:09

Alternatively, negative controls might be used as a penalty on the extension of cell boundaries. For instance, consider the cell below, which is sandwiched between two autofluorescent structures. Only SystemControls are shown in this image.

[Image: FOV with only SystemControls shown]

Now consider the same FOV with only true targets shown.

[Image: the same FOV with only true targets shown]

There is a line of transcripts on the autofluorescent structures, but these are likely false calls, given the higher concentration of SystemControls in the vicinity. Have you considered incorporating a local transcript false discovery rate into the model? For cells like this, I think the ideal behavior would be for cell boundary extension to stop once it reaches areas with high concentrations of SystemControls, so that spurious transcripts are not assigned to nearby cells. (Reducing autofluorescence would obviously be the first line of defense here, but the tissue above was already photobleached for 4 hours, so this is the residual.)
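
One way to make the local false discovery rate idea concrete is sketched below. It assumes, hypothetically, that each negative-control probe captures background at the same per-probe rate a target gene would, so a bin with c control calls is expected to contain roughly c * n_target_genes / n_ctrl_probes false target calls. The function names, the square binning, and the hard `max_fdr` cutoff are illustrative choices, not part of proseg.

```python
from collections import Counter


def bin_counts(points, bin_size):
    """Count (x, y) points per square bin of side bin_size."""
    return Counter((int(x // bin_size), int(y // bin_size)) for x, y in points)


def local_fdr(target_pts, ctrl_pts, n_target_genes, n_ctrl_probes, bin_size):
    """Estimate, per bin, the fraction of target calls expected to be false.

    Assumes each control probe sees the same background rate as a target gene,
    so expected false target calls = controls * n_target_genes / n_ctrl_probes.
    Bins with control calls but no target calls get an FDR of 1.0.
    """
    targets = bin_counts(target_pts, bin_size)
    ctrls = bin_counts(ctrl_pts, bin_size)
    fdr = {}
    for b in set(targets) | set(ctrls):
        expected_false = ctrls[b] * n_target_genes / n_ctrl_probes
        fdr[b] = min(1.0, expected_false / targets[b]) if targets[b] else 1.0
    return fdr


def can_expand_into(bin_key, fdr, max_fdr=0.2):
    """Hypothetical stopping rule: cells may only grow into low-FDR bins.

    Bins with no calls at all carry no evidence and are left open.
    """
    return fdr.get(bin_key, 0.0) <= max_fdr


# Toy example: a clean bin next to a bin with some control calls.
target_pts = [(0.5, 0.5)] * 10 + [(1.5, 0.5)] * 40
ctrl_pts = [(1.5, 0.2)]
fdr = local_fdr(target_pts, ctrl_pts, n_target_genes=100, n_ctrl_probes=10, bin_size=1.0)
```

A hard cutoff is the simplest rule to show. In a likelihood-based model it would probably be more natural to down-weight each transcript by its local FDR (for example, scoring it as 1 - FDR) than to forbid expansion outright.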

professor-sagittarius · Sep 26 '25 18:09

I definitely agree that there should be a way to report the negative controls without necessarily using them in segmentation. That should be pretty easy to implement, so I'll try to add it soon.

As for using them in the model, I think that could be promising, but I'll have to think about how it could fit in the model. It's clear that the false positive rate does tend to vary spatially. I think that's somewhat correlated with transcript density (optical crowding driving up the error rate), and I try to model that in proseg 3, but there could definitely be other causes.
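
A quick way to check the density hypothesis on a given dataset is to bin the FOV and correlate per-bin target counts with per-bin negative-control counts. The sketch below is a plain-Python illustration with hypothetical function names; it is not what proseg 3 does internally.

```python
import math
from collections import Counter


def per_bin_counts(points, bin_size):
    """Count (x, y) points per square bin of side bin_size."""
    return Counter((int(x // bin_size), int(y // bin_size)) for x, y in points)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def ctrl_density_correlation(target_pts, ctrl_pts, bin_size):
    """Correlate per-bin target density with per-bin negative-control counts.

    A spatially uniform background would give control counts unrelated to
    target density; a strong positive correlation is consistent with
    density-dependent errors such as optical crowding.
    """
    targets = per_bin_counts(target_pts, bin_size)
    ctrls = per_bin_counts(ctrl_pts, bin_size)
    bins = sorted(set(targets) | set(ctrls))
    return pearson([targets[b] for b in bins], [ctrls[b] for b in bins])


# Toy data: four 1x1 bins along x with increasing density.
target_pts = [(i + 0.5, 0.5) for i in range(4) for _ in range(10 * (i + 1))]
ctrl_pts = [(i + 0.5, 0.5) for i in range(4) for _ in range(i + 1)]
r = ctrl_density_correlation(target_pts, ctrl_pts, bin_size=1.0)
```

A positive correlation does not prove density-dependent errors on its own: control counts also fall to near zero off-tissue, so tissue-versus-background differences alone can produce one. Restricting the bins to tissue regions would make the check more specific.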

dcjones · Sep 26 '25 22:09

@dcjones Just so I can clarify: does using the include-negative-probes option in the function call add the negative probes to the gene matrix of the SpatialData object?
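
Whether the include-negative-controls option puts the controls into the same gene matrix is for the maintainers to confirm. If they do end up as extra columns, a small helper like the one below can split them back out, assuming control features can be recognized by name. The `NegPrb` and `SystemControl` prefixes are assumptions about the panel's naming convention, and the plain nested-list matrix stands in for whatever table object is actually used.

```python
# Hypothetical naming convention for control features; adjust to the panel.
CONTROL_PREFIXES = ("NegPrb", "SystemControl")


def split_control_columns(feature_names, counts, prefixes=CONTROL_PREFIXES):
    """Split a cells x features count matrix into target and control parts.

    counts is a list of rows (one per cell) aligned with feature_names.
    """
    is_ctrl = [name.startswith(prefixes) for name in feature_names]
    target_names = [n for n, c in zip(feature_names, is_ctrl) if not c]
    ctrl_names = [n for n, c in zip(feature_names, is_ctrl) if c]
    target_counts = [[v for v, c in zip(row, is_ctrl) if not c] for row in counts]
    ctrl_counts = [[v for v, c in zip(row, is_ctrl) if c] for row in counts]
    return target_names, target_counts, ctrl_names, ctrl_counts


names = ["CD3E", "NegPrb1", "MS4A1", "SystemControl7"]
counts = [[5, 1, 0, 2], [0, 0, 7, 1]]
target_names, target_counts, ctrl_names, ctrl_counts = split_control_columns(names, counts)
# Per-cell control totals can be kept as a QC value instead of gene columns.
ctrl_totals = [sum(row) for row in ctrl_counts]
```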

jai-mathur · Oct 12 '25 12:10