Design matrix is not full rank (on example that works in edgeR)
Soneson et al, 2018, NMeth recommend to add the fraction of detected genes as a covariate to the linear model for DE testing:
~ group + n_genes
(instead of using a ZINB model (#131))
When I try add n_genes to my diffxpy model, it throws a ValueError:
> test = de.test.wald(
data=counts.values.T,
formula_loc="~ 1 + group + n_genes",
factor_loc_totest=["group"],
coef_to_test="group",
sample_description=obs,
batch_size=100,
training_strategy="DEFAULT",
dtype="float64",
gene_names=counts.index
)
ValueError: constrained design matrix is not full rank: 80 81
The same model runs fine with edgeR:
d <- DGEList(counts)
d <- suppressWarnings(edgeR::calcNormFactors(d))
design <- model.matrix(~ 1 + group + n_genes, data=obs)
d <- estimateDisp(d, design)
fit <- glmFit(d,design)
lrt <- glmLRT(fit, coef = 2)
pval <- lrt$table$PValue
Is that a bug? Or am I missing something? Thanks for your help!
Full repex
- A reproducible example (Rmd notebook + sample data) is available at https://github.com/grst/benchmark-single-cell-de-analysis/tree/master/diffxpy_test. The example uses diffxpy v0.7.1 from PyPI.
- An executed version of the notebook is here.
- To run the notebook in my conda-environment, use
nextflow rundiffxpy_test.nf.
Hi @grst, I assume n_genes is numeric? We enforce that numeric covariates are independently verified to be desired numeric and not by accident converted from one-hot categorical. This is done by adding the argument as_numeric=['n_genes'], documented here: https://diffxpy.readthedocs.io/en/stable/api/diffxpy.api.test.wald.html#diffxpy.api.test.wald. Thanks for the detailed issue!
Thanks, that worked (with version 0.6.13)!
Completed at: 12-Nov-2019 13:50:06
Duration : 5m 6s
CPU hours : 0.1
Succeeded : 1
With version 0.7.1 it runs for several hours and then fails with a LinAlgError: Singular matrix:
Detailed traceback:
File "<string>", line 11, in <module>
File "/home/sturm/projects/2019/zinbwaveZinger/work/conda/diffxpy_test-bf5ddff8c39a3368b22c8ee2c4730f1c/lib/python3.7/site-packages/diffxpy/testing/tests.py", line 648, in wald
**kwargs,
File "/home/sturm/projects/2019/zinbwaveZinger/work/conda/diffxpy_test-bf5ddff8c39a3368b22c8ee2c4730f1c/lib/python3.7/site-packages/diffxpy/testing/tests.py", line 169, in _fit
estim.train_sequence(training_strategy=training_strategy)
File "/home/sturm/projects/2019/zinbwaveZinger/work/conda/diffxpy_test-bf5ddff8c39a3368b22c8ee2c4730f1c/lib/python3.7/site-packages/batchglm/train/numpy/base_glm/estimator.py", line 52, in train_sequence
self.train(**training_strategy)
File "/home/sturm/projects/2019/zinbwaveZinger/work/conda/diffxpy_test-bf5ddff8c39a3368b22c8ee2c4730f1c/lib/python3.7/site-packages/batchglm/train/numpy/base_glm/estimator.py", line 79, in train
self.model.a_var = sel