ALRA
Which `selection.method` to use for `FindVariableFeatures` on ALRA-imputed data
Hi @linqiaozhi @JunZhao1990 @rcannood @inoue0426
I was following this issue where @ChristophH mentions that
Results should not be very different from using the original "count" data. Generally, using "data" slot should work with "vst" method as long as the loess fit can capture the mean-variance relationship.
Also, @linqiaozhi suggests
For example, "The VST selection method uses count data and does not use the ALRA imputed data; please use mean.var.plot instead, if you would like to find the variable genes based on the imputed data."
So I decided to see whether this mean-variance relationship is captured better by the `vst` or the `mean.var.plot` method of Seurat. Unlike `mca` (Malaria Cell Atlas), which I wish to use as a reference and did not impute, some cells in my samples (t1, n1) show some deviation from the linear relationship. Is this slight deviation expected?
I also observe that the standardized variance for the imputed data has a baseline of 1, unlike MCA, whose baseline is zero. Will this be a problem when I integrate these samples with MCA? I am trying to resolve the problem of the JackStraw plot showing all PCs as significant, which I discuss in another issue here, and I thought the nature of the imputed data or the feature selection method might be influencing this.
Hi @ChristophH. Do you have thoughts on this?
I'd stick to the raw counts (not imputed) and use the `vst` method. If there is no mean-variance relationship, the data is violating some basic assumptions the method is based on, so proceed with care.
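
For anyone who wants to sanity-check the mean-variance relationship on raw counts outside Seurat, here is a minimal sketch of a vst-style standardized variance. It is not Seurat's implementation. The function name `standardized_variance` is hypothetical, the counts are simulated from a negative binomial, a quadratic polynomial fit in log10 space stands in for the loess fit, and the clipping at the square root of the number of cells follows my understanding of the `vst` default, so treat that detail as an assumption too.

```python
import numpy as np


def standardized_variance(counts, degree=2):
    """vst-style standardized variance for a genes x cells count matrix.

    Illustrative stand-in only: a polynomial fit replaces the loess fit.
    """
    counts = np.asarray(counts, dtype=float)
    n_cells = counts.shape[1]
    mean = counts.mean(axis=1)
    var = counts.var(axis=1, ddof=1)

    # Fit log10(variance) against log10(mean), using genes that vary at all.
    ok = var > 0
    coef = np.polyfit(np.log10(mean[ok]), np.log10(var[ok]), deg=degree)
    expected_var = 10 ** np.polyval(coef, np.log10(mean[ok]))

    # Standardize each gene with its observed mean and expected variance,
    # clip large values (assumed default: sqrt of the number of cells),
    # then take the variance of the standardized values.
    z = (counts[ok] - mean[ok, None]) / np.sqrt(expected_var[:, None])
    z = np.minimum(z, np.sqrt(n_cells))
    std_var = np.zeros_like(var)
    std_var[ok] = (z ** 2).sum(axis=1) / (n_cells - 1)
    return mean, var, std_var


# Simulated raw counts (gamma-Poisson, i.e. negative binomial); not real data.
rng = np.random.default_rng(0)
n_genes, n_cells, theta = 500, 300, 2.0
mu = 10 ** rng.uniform(-1, 2, n_genes)
lam = rng.gamma(shape=theta, scale=mu / theta, size=(n_cells, n_genes))
counts = rng.poisson(lam).T  # genes x cells

mean, var, std_var = standardized_variance(counts)
print("median standardized variance:", float(np.median(std_var)))
```

When the fitted trend describes the data well, most genes end up with a standardized variance close to 1, and only genes that vary more than the trend predicts stand out above that. If the values do not cluster around a common baseline, the fit is not capturing the mean-variance relationship, which is the situation the reply above warns about.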