diffxpy
diffxpy copied to clipboard
Design matrix is not full rank (on example that works in edgeR)
Soneson et al, 2018, NMeth recommend to add the fraction of detected genes as a covariate to the linear model for DE testing:
~ group + n_genes
(instead of using a ZINB model (#131))
When I try add n_genes
to my diffxpy
model, it throws a ValueError
:
> test = de.test.wald(
data=counts.values.T,
formula_loc="~ 1 + group + n_genes",
factor_loc_totest=["group"],
coef_to_test="group",
sample_description=obs,
batch_size=100,
training_strategy="DEFAULT",
dtype="float64",
gene_names=counts.index
)
ValueError: constrained design matrix is not full rank: 80 81
The same model runs fine with edgeR
:
d <- DGEList(counts)
d <- suppressWarnings(edgeR::calcNormFactors(d))
design <- model.matrix(~ 1 + group + n_genes, data=obs)
d <- estimateDisp(d, design)
fit <- glmFit(d,design)
lrt <- glmLRT(fit, coef = 2)
pval <- lrt$table$PValue
Is that a bug? Or am I missing something? Thanks for your help!
Full repex
- A reproducible example (Rmd notebook + sample data) is available at https://github.com/grst/benchmark-single-cell-de-analysis/tree/master/diffxpy_test. The example uses diffxpy v0.7.1 from PyPI.
- An executed version of the notebook is here.
- To run the notebook in my conda-environment, use
nextflow run
diffxpy_test.nf
.
Hi @grst, I assume n_genes
is numeric? We enforce that numeric covariates are independently verified to be desired numeric and not by accident converted from one-hot categorical. This is done by adding the argument as_numeric=['n_genes']
, documented here: https://diffxpy.readthedocs.io/en/stable/api/diffxpy.api.test.wald.html#diffxpy.api.test.wald. Thanks for the detailed issue!
Thanks, that worked (with version 0.6.13)!
Completed at: 12-Nov-2019 13:50:06
Duration : 5m 6s
CPU hours : 0.1
Succeeded : 1
With version 0.7.1 it runs for several hours and then fails with a LinAlgError: Singular matrix
:
Detailed traceback:
File "<string>", line 11, in <module>
File "/home/sturm/projects/2019/zinbwaveZinger/work/conda/diffxpy_test-bf5ddff8c39a3368b22c8ee2c4730f1c/lib/python3.7/site-packages/diffxpy/testing/tests.py", line 648, in wald
**kwargs,
File "/home/sturm/projects/2019/zinbwaveZinger/work/conda/diffxpy_test-bf5ddff8c39a3368b22c8ee2c4730f1c/lib/python3.7/site-packages/diffxpy/testing/tests.py", line 169, in _fit
estim.train_sequence(training_strategy=training_strategy)
File "/home/sturm/projects/2019/zinbwaveZinger/work/conda/diffxpy_test-bf5ddff8c39a3368b22c8ee2c4730f1c/lib/python3.7/site-packages/batchglm/train/numpy/base_glm/estimator.py", line 52, in train_sequence
self.train(**training_strategy)
File "/home/sturm/projects/2019/zinbwaveZinger/work/conda/diffxpy_test-bf5ddff8c39a3368b22c8ee2c4730f1c/lib/python3.7/site-packages/batchglm/train/numpy/base_glm/estimator.py", line 79, in train
self.model.a_var = sel