tidybulk
tidybulk copied to clipboard
Allow logFC shrinkage in DESeq
For gene ranking (for GSEA, for example) and visualizations (on volcano plots, for example), DESeq author Michael Love suggests shrinking of the effect sizes (log-fold changes) with the lfcShrink function applied to a DDS object created with DESeq()
Shrinkage was once the default for DESeq(); however since version 1.16 it's been a separate function in anticipation of the addition of other (better?) effect-size estimators to the DESeq workflow. In addition, isolation of the function allows to accommodate other types of analyses which are more sensitive to shrinking LFCs than bulk RNA-Seq.
Some references below with Love's recommendations: https://support.bioconductor.org/p/77461/ https://support.bioconductor.org/p/95695/
Given that, maybe a generic effect_size_estimator parameter could be helpful, with lfcShrink being one of the options. Could add more in the future.
Also, looks like lfcShrink can take contrast as parameter (but not for apeglm type), which could be passed down from the deseq2() function.
@mikelove any feedback/idea on this?
I can help with this.
I would argue for practical reasons as well as semantic reasons, it may best be implemented as separate function from test_differential_abundance, as typically the effect size is estimated apart from the null hypothesis testing. If you would consider this, I'd suggest something like "compute/estimate" with "posterior LFC" or "effect size".
Motivation: Neither the basic (MLE) LFC nor t-statistic (equivalently p-value) are very good at ranking genes by effect size, which we described in Zhu et al 2019 (with the exception of after having filtered to an FDR bounded set). But for ranking the entire gene list, apart from resorting to null hypothesis testing, you need a good effect size estimate for plotting or some downstream applications.
We have two main implementations within lfcShrink: ashr (which can be used directly on a results table, e.g. pivot_transcripts then applying lfcShrink), and apeglm (which requires the counts again, as it uses the likelihood in performing effect size shrinkage). We could also run edgeR's predFC, which has a free parameter prior.count that one would want users to be able to access, via ellipses should be fine.
Practically, this would mean passing the tidybulk or SE object to a new function which could run various methods to output a posterior effect size and (optionally) posterior SD. Would recommend naming this something other than log2FoldChange to avoid collision.
One more detail is that in our implementation, ashr can operate on any previously calculated LFC which would be present in the columns of the object after test_differential_abundance, while apeglm requires specifying which coef should be shrunk (by number or name for example). The names of the coefficients can be obtained from the fitted DESeqDataSet with resultsNames. Most often users want to shrink the last coef, but this could be exposed with ellipses. So to mention again, ashr could be run even if the counts were dropped.
Another good effect size ranking method that could be included is:
http://bioconductor.org/packages/release/bioc/html/topconfects.html https://github.com/pfh/topconfects