Clamping Intensity Readouts in Pooled Cell Painting Spot Calling
One major goal of the pooled cell painting collaboration is to maximize our ability to assign CRISPR guide perturbations to cells.
The first major step in this process is to read out a barcode based on intensity measurements in four channels (A, C, T, G) across n cycles (where n is typically between 9 and 12).
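For readers new to the problem, here is a minimal sketch of the simplest form of this readout: per cycle, call the base whose channel is brightest. This is only an illustration, not the pipeline's actual method; the function name `call_barcode`, the channel order, and the example values are all assumptions.

```python
BASES = "ACTG"  # assumed channel order, matching the (A, C, T, G) listing above


def call_barcode(intensities):
    """Call one base per cycle by picking the brightest channel.

    intensities: a list of n cycles, each holding 4 channel values
    ordered as in BASES. Hypothetical helper, for illustration only.
    """
    barcode = []
    for cycle in intensities:
        if len(cycle) != len(BASES):
            raise ValueError("each cycle needs one intensity per channel")
        # Index of the channel with the highest intensity in this cycle.
        brightest = max(range(len(BASES)), key=lambda i: cycle[i])
        barcode.append(BASES[brightest])
    return "".join(barcode)


# Made-up intensities for one spot over three cycles.
example_spot = [
    (0.9, 0.1, 0.0, 0.2),
    (0.0, 0.8, 0.1, 0.1),
    (0.1, 0.0, 0.1, 0.7),
]
print(call_barcode(example_spot))  # ACG
```

A real run would have 9 to 12 cycles per spot, and a Bayesian caller would model the full intensity vector rather than only its maximum.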
One approach to this process is based on a generative Bayesian model, which @mbabadi has expertise in implementing. In a recent conversation, @mbabadi noted:
The way the data is processed at the moment is non-ideal for Bayesian modeling: the color compensation is a linear max likelihood method that often produces negative or out-of-bounds intensities. These are clamped to the physical interval (here) and as a result, we end up getting a lot of 0's. While this might be a favorable side-effect for thresholding-based base calling, it is not, for Bayesian base calling.
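To make the effect concrete, here is a small sketch of how a linear compensation step can push dim channels below zero, and how clamping then turns them into exact zeros. The compensation matrix, the raw intensities, and the `[0, 1]` physical interval are all made-up assumptions for illustration; they are not taken from the actual pipeline.

```python
# Hypothetical linear compensation matrix (rows/columns ordered A, C, T, G).
# Negative off-diagonal terms subtract the estimated bleed-through
# from the neighboring channel.
COMPENSATION = [
    [1.2, -0.3, 0.0, 0.0],
    [-0.3, 1.2, 0.0, 0.0],
    [0.0, 0.0, 1.2, -0.3],
    [0.0, 0.0, -0.3, 1.2],
]

# Made-up raw intensities for one spot in one cycle.
raw = [0.05, 0.60, 0.02, 0.30]

# Linear compensation: a plain matrix-vector product.
compensated = [
    sum(weight * value for weight, value in zip(row, raw))
    for row in COMPENSATION
]

# Clamp to the assumed physical interval [0, 1].
clamped = [min(max(value, 0.0), 1.0) for value in compensated]

print("compensated:", compensated)  # the A and T channels come out negative
print("clamped:    ", clamped)      # those two channels become exactly 0.0
```

After clamping, the two negative channels are indistinguishable from each other and from a true zero, so information about how far below zero they fell is no longer available to a likelihood model.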
@erinweisbart has already implemented the fix (leaving the intensities unclamped) for an example site, and we are working on implementing the model. I am documenting this here because we might want to consider implementing this approach globally in this repo. It aligns with the lab's philosophy of "measure everything, ask questions later!"
I can make the clamping optional and enabled by default; we can then run it with and without clamping.
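A rough sketch of what such an option could look like, assuming a hypothetical `postprocess_intensities` helper and a `[0, 1]` physical interval (neither name nor bounds come from the actual code):

```python
def postprocess_intensities(values, clamp=True, lower=0.0, upper=1.0):
    """Optionally clamp compensated intensities to [lower, upper].

    clamp defaults to True so that existing behavior is preserved;
    passing clamp=False returns the values unchanged.
    Hypothetical helper, for illustration only.
    """
    if not clamp:
        return list(values)
    return [min(max(value, lower), upper) for value in values]


compensated = [-0.2, 0.5, 1.3]
print(postprocess_intensities(compensated))               # [0.0, 0.5, 1.0]
print(postprocess_intensities(compensated, clamp=False))  # [-0.2, 0.5, 1.3]
```

Keeping the default on means thresholding-based base calling is unaffected, while a Bayesian caller can request the raw compensated values.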
I'm closing this issue because we are no longer developing a barcode calling method that requires unclamping, and unclamping does not fit into our current workflow without extensive changes (see the conversation in https://github.com/broadinstitute/pooled-cell-painting-analysis/issues/92).