tidybulk
tidybulk copied to clipboard
The first element of assays in SE might be a matrix when using DESeq2 method
The related issue
> library(tidybulk)
========================================
tidybulk version 1.12.0
If you use TIDYBULK in published research, please cite:
Mangiola et al. tidybulk: an R tidy framework for modular
transcriptomic data analysis. Genome Biology 2021.
This message can be suppressed by:
suppressPackageStartupMessages(library(tidybulk))
========================================
> library(SummarizedExperiment) %>% suppressPackageStartupMessages()
> assays(se_mini)[[1]] <- assays(se_mini)[[1]] %>% data.frame()
> se_mini %>% identify_abundant() %>% test_differential_abundance(.formula=~condition, method='DESeq2')
No group or design set. Assuming all samples belong to one group.
renaming the first element in assays to 'counts'
Error in DESeq2::DESeqDataSet(., design = .formula) :
counts matrix should be numeric, currently it has mode: list
>
when convert it to matrix, it works
> assays(se_mini)[[1]] <- assays(se_mini)[[1]] %>% as.matrix()
> se_mini %>% identify_abundant() %>% test_differential_abundance(.formula=~condition, method='DESeq2')
No group or design set. Assuming all samples belong to one group.
renaming the first element in assays to 'counts'
converting counts to integer mode
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
-- note: fitType='parametric', but the dispersion trend was not well captured by the
function: y = a/x + b, and a local regression fit was automatically substituted.
specify fitType='local' or 'mean' to avoid this message next time.
final dispersion estimates
fitting model and testing
tidybulk says: to access the raw results (fitted GLM) do `attr(..., "internals")$DESeq2`
class: SummarizedExperiment
dim: 527 5
metadata(0):
assays(1): count
rownames(527): ABCB4 ABCB9 ... ZNF324 ZNF442
rowData names(8): entrez .abundant ... pvalue padj
colnames(5): SRR1740034 SRR1740035 SRR1740043 SRR1740058 SRR1740067
colData names(5): Cell.type time condition days dead
>
@mikelove in case you have any opinion
Great!
Thanks for the PR @xiangpin !
Maybe I'm missing the obvious but why should we add a assay that is a dataframe where DESeq2 needs counts. I fail to see the use-case could you be more explicit?
I would say that having a data.frame instead of matrix in the SummarizedExperiment is not typical.
Yes, a data.frame in the SummarizedExperiment is not typical. But for many users, they might also import a data frame to assays when using the SummarizedExperiment function to build a SummarizedExperiment object, since the help information of SummarizedExperiment show that assays: A ‘list’ or ‘SimpleList’ of matrix-like elements, or a matrix-like object (e.g. an ordinary matrix, a data frame, a DataFrame object from the ‘S4Vectors’ package, a sparseMatrix derivative from the ‘Matrix’ package, a DelayedMatrix object from the ‘DelayedArray’ package, etc...). and data frame might be more similar to other objects for many users.
As the original SE is not changed (but only just the DESeq2 object, passed to the internals for reproducibility, is), I think this PR is great.
@xiangpin The only thing I ask before pushing is a unit test that would fail before PR (current master) and produce the expected result with the PR.
Thanks for this PR!