glmGamPoi
glmGamPoi copied to clipboard
Too many DEGs?
Hi,
I am a fresh user of glmGamPoi and thanks for providing this awesome tool! I tried to identify DEGs in scRNA and successfully run the following code:
load('/Volumes/kfsd/GSE200903_new_combined_cluster_FibroSamples_2022-01-14.RData')
combine_cluster <- as.SingleCellExperiment(combined_cluster)
set.seed(1)
colData(combine_cluster)
# Take highly expressed genes and proper cells:
combine_cluster_subset <- combine_cluster[rowSums(counts(combine_cluster)) > 100,
which( ! is.na(combine_cluster$CellType) & combine_cluster$CellType
%in% c("fibroblasts"))]
# add a stim2 based on stim
colData(combine_cluster_subset)$stim2 <- colsplit(colData(combine_cluster_subset)$stim,
split = "_", names = c('cell', 'trt','time'))$trt
# Convert counts to dense matrix
counts(combine_cluster_subset) <- as.matrix(counts(combine_cluster_subset))
# Remove empty levels because glm_gp() will complain otherwise
colData(combine_cluster_subset)$stim2 <- as.factor(colData(combine_cluster_subset)$stim2)
colData(combine_cluster_subset)$stim2 <- relevel(colData(combine_cluster_subset)$stim2, ref = "normal")
fit <- glm_gp(combine_cluster_subset, design = ~ stim2 ,
reference_level = "normal")
summary(fit)
colnames(fit$Beta)
# The contrast argument specifies what we want to compare
# We test the expression difference of stimulated and control T-cells
#
# There is no sample label in the colData, so we create it on the fly
# from `stim` and `ind` columns in colData(fit$data).
de_res <- test_de(fit, contrast = `stim2PDAC`,
pseudobulk_by = paste0(stim2, "-", rownames(colData(combine_cluster_subset))))
# The large `lfc` values come from groups were nearly all counts are 0
# Setting them to Inf makes the plots look nicer
de_res$lfc <- ifelse(abs(de_res$lfc) > 20, sign(de_res$lfc) * Inf, de_res$lfc)
# Most different genes
de_res<-de_res[order(de_res$pval), ]
However, I identified 11752 DEGs out of 14537 genes and I found many genes has df1=1.
head(de_res)
name pval adj_pval f_statistic df1 df2 lfc
35 Rpl7 0 0 3244.666 1 20005.31 -0.5195890
44 Pi15 0 0 2180.603 1 20005.31 -1.6835958
124 Il1r1 0 0 3883.918 1 20005.31 2.2249821
132 Fhl2 0 0 5068.944 1 20005.31 4.8176423
158 Gls 0 0 1635.089 1 20005.31 0.9511179
164 1700019D03Rik 0 0 3380.809 1 20005.31 -2.7193859
I wondered how could I address this problem? Thanks in advance!
Best, Kun
Hi KunFang,
thank you for reaching out and for the detailed description of your problem. From a brief look over your code, it seems fine, but that is difficult to say for certain without access to the input data.
If you are concerned that the fitting is wrong, you could fit the model with DESeq2
or edgeR
and compare if the results are similar.
Best, Constantin