sctransform
"umi_corrected" vs original umi in function "sctransform::correct"
Hi,
Could you describe how "umi_corrected" is generated by the function sctransform::correct, if possible? ("sctransform" calls "vst", which in turn calls sctransform::correct.)
As far as I understand, you take the residuals and the estimated parameters to compute "umi_corrected", then round the result and set negative values to zero. But apart from the rounding, why do some values of "umi_corrected" differ from the original "umi"? And why is that necessary?
Thanks
The correct function replaces the observed values of the independent variables (by default, the log10 of the total UMI count in each cell, a proxy for sequencing depth) with their median and then reverses the NB regression model. That is, the Pearson residuals are turned back into UMI counts as if every cell had been sequenced to the same depth. As a result, the umi_corrected values are not correlated with sequencing depth.
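The reversal described above can be sketched in a few lines. This is a minimal NumPy sketch, not sctransform's actual R code. It assumes a single regressor (log10 total UMI per cell), a log link so that mu = exp(b0 + b1 * log_umi), and the NB variance mu + mu^2 / theta, under which the Pearson residual is r = (y - mu) / sqrt(mu + mu^2 / theta). The function name correct_counts and its argument layout are hypothetical.

```python
import numpy as np

def correct_counts(pearson_res, b0, b1, theta, log_umi):
    """Hypothetical sketch of reversing the NB regression at the median depth.

    pearson_res: (genes, cells) Pearson residuals
    b0, b1, theta: per-gene NB parameters, each of shape (genes,)
    log_umi: per-cell regressor (assumed log10 of total UMI), shape (cells,)
    """
    # Replace every cell's regressor with the median, so each gene gets a
    # single expected value that no longer depends on the cell's depth.
    ref = np.median(log_umi)
    mu = np.exp(b0 + b1 * ref)[:, None]          # (genes, 1), assumed log link
    sd = np.sqrt(mu + mu ** 2 / theta[:, None])  # NB standard deviation at median depth
    # Invert r = (y - mu) / sd  =>  y = mu + r * sd
    y = mu + pearson_res * sd
    # Round to integer counts and set negative values to zero
    return np.maximum(np.round(y), 0)
```

If every cell already sits at the median depth, this returns the original counts unchanged; otherwise, each count is shifted to what the same residual would imply at the median depth, which is why some values differ from the original UMI.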
Thanks a lot, that's a very clear answer.
Also, I'd like to ask for your advice on how to make the most of the vst results. Would you suggest:
- using centered Pearson residuals for clustering (PCA, t-SNE, UMAP); corrected UMI for differential expression with DESeq2, negbinom and poisson; and log1p(corrected UMI) for differential expression with the other included methods?
- for integration of single-cell data from different batches/modalities, using Pearson residuals only to select features for each batch/modality, and centered Pearson residuals for the subsequent anchor identification/scoring/weighting?
Thanks a lot
What would one expect the distribution of total UMI counts (i.e. sequencing depth) in the corrected count matrix to look like?
I would have expected it to be close to constant, centered on the median UMI value that is used, perhaps with some error due to rounding. However, it is quite spread out, and actually bimodal (although the median is what it should be). Where does that variation come from?
Thanks
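One way to look at where the per-cell spread in corrected totals can come from, assuming the correction reverses an NB model with a single log10-UMI regressor, a log link, and variance mu + mu^2 / theta: once depth is fixed at the median, each corrected count is mu_g + r_gc * sd_g. A cell's total is then sum_g mu_g, which is identical for every cell, plus sum_g r_gc * sd_g, which is not, plus rounding and clamping effects. The NumPy sketch below computes these pieces; the function name is hypothetical and this is not sctransform's code.

```python
import numpy as np

def corrected_total_decomposition(pearson_res, b0, b1, theta, log_umi):
    """Hypothetical helper: split corrected per-cell totals into parts.

    Assumes mu = exp(b0 + b1 * log_umi) and NB variance mu + mu^2 / theta.
    """
    ref = np.median(log_umi)
    mu = np.exp(b0 + b1 * ref)[:, None]          # (genes, 1), same for all cells
    sd = np.sqrt(mu + mu ** 2 / theta[:, None])  # NB standard deviation at median depth
    # Depth-independent part: the same number for every cell
    baseline = float(mu.sum())
    # Residual-driven part: differs from cell to cell
    residual_part = (pearson_res * sd).sum(axis=0)
    # Actual corrected totals after rounding and setting negatives to zero
    totals = np.maximum(np.round(mu + pearson_res * sd), 0).sum(axis=0)
    return baseline, residual_part, totals
```

For example, with residuals of +1 for half the cells and -1 for the rest, the totals split into two values around the baseline even though every cell is treated as sequenced to the same median depth; any per-cell structure in the residuals shows up in the totals the same way.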