glmGamPoi
glmGamPoi copied to clipboard
pseudobulk_by apparently dropping columns
Hi, I am successfully using your package with scRNA-Seq data, but finding a possible bug when trying out the pseudobulk_by
option. Below I show it working fine without pseudobulk_by
and then failing when I add it in:
s2<-as.SingleCellExperiment(s)
design<- model.matrix(~ 0 + experimentGroup,s2@colData)
glmGamPoi.fit <- glm_gp(s2
,design = design
,size_factors = "deconvolution"
,on_disk = FALSE
)
This worked fine. My grouping factor and design matrix looks like this:
unique(s2@colData$experimentGroup)
[1] LL_H_000 LL_N_000 LL_N_030 LL_N_060 LL_N_180 LL_N_300 LL_N_600
Levels: LL_H_000 LL_N_000 LL_N_030 LL_N_060 LL_N_180 LL_N_300 LL_N_600
head(design)
LL_H_000 LL_N_000 LL_N_030 LL_N_060 LL_N_180 LL_N_300 LL_N_600
homeo_AAACCTGAGAAGCCCA 1 0 0 0 0 0 0
homeo_AAACCTGAGGAGCGTT 1 0 0 0 0 0 0
homeo_AAACCTGCAACTGCGC 1 0 0 0 0 0 0
homeo_AAACCTGCAAGTAGTA 1 0 0 0 0 0 0
homeo_AAACCTGCATATGAGA 1 0 0 0 0 0 0
homeo_AAACCTGGTGCAACTT 1 0 0 0 0 0 0
Testing for DGE with the following contrast works fine:
contrast=c('LL_N_000-LL_H_000', 'LL_N_030-LL_H_000', 'LL_N_060-LL_H_000', 'LL_N_180-LL_H_000', 'LL_N_300-LL_H_000','LL_N_600-LL_H_000')
contrast<-limma::makeContrasts(levels=l,contrasts=contrast)
glmGamPoi.test <- test_de(glmGamPoi.fit
, contrast = contrast
##,pseudobulk_by='experimentGroup'
)
... the problem comes when I try to test using pseudobulk_by
, as
glmGamPoi.bulk.test <- test_de(glmGamPoi.fit
, contrast = contrast
,pseudobulk_by='experimentGroup'
) %>% setDT
Error in handle_design_parameter(design, data, col_data, reference_level) :
The model_matrix has more columns (7) than the there are samples in the data matrix (7 columns).
Too few replicates / too many coefficients to fit model.
The head of the design matrix:
LL_H_000 LL_N_000 LL_N_030 LL_N_060 LL_N_180 LL_N_300 LL_N_600
LL_H_000 1 0 0 0 0 0 0
LL_N_000 0 1 0 0 0 0 0
LL_N_030 0 0 1 0 0 0 0
The head of the data:
LL_H_000 LL_N_000 LL_N_030 LL_N_060 LL_N_180
rpl24 21728 14048 13748 19649 14109
cep97 76 18 21 9 55
eed 270 86 96 88 257
However by my calculations the aggregated data columns should d be there and are somehow erroneously being dropped, , viz::
library(DelayedArray)
grs<-colsum(s2@assays@data$counts,s2@colData$experimentGroup)
head(grs)
LL_H_000 LL_N_000 LL_N_030 LL_N_060 LL_N_180 LL_N_300 LL_N_600
rpl24 21728 14048 13748 19649 14109 11293 11562
cep97 76 18 21 9 55 32 35
eed 270 86 96 88 257 176 101
hikeshi 459 98 120 137 458 380 144
tmem39a 296 134 68 82 132 63 96
ildr1a 746 471 620 460 435 261 342
I could potentially share the s2 object via ftp if it would help you sleuth this.
Or perhaps you recognize some error on my part?
I think the problem is that you use the experimentGroup
to form the pseudobulk, but also want to use it as the design
argument.
The manual way of forming the pseudo-bulk would look like:
splitter <- split(seq_len(ncol(s2)), s2$experimentGroup)
pseudobulk_mat <- do.call(cbind, lapply(splitter, function(idx){
matrixStats::rowSums2(assay(s2), cols = idx)
}))
pseudo_design_mat <- do.call(rbind, lapply(splitter, function(idx){
matrixStats::colMeans2(glmGamPoi.fit$model_matrix, rows = idx)
}))
If you look at pseudobulk_mat
, you will see that it has 7 columns. At the same time, you want to fit a model which also has 7 different parameters, which fails with the error message: Too few replicates / too many coefficients to fit model.
.
Best, Constantin