PyDESeq2
PyDESeq2 copied to clipboard
[BUG] Overflow in `.deseq2()` and and `lfc_shrink()`
To Reproduce
dds = DeseqDataSet(
counts=counts_df,
metadata=metadata,
design_factors=['condition', "RNA_Qubit_actual", 'treatment'],
continuous_factors=["RNA_Qubit_actual"],
refit_cooks=True,
n_cpus=12,
)
dds.deseq2()
Output:
... done in 0.09 seconds.
/home/jupyter/env/mambaforge/envs/rome/lib/python3.11/site-packages/pydeseq2/utils.py:553: RuntimeWarning: overflow encountered in exp
mu_ = np.maximum(size_factors * np.exp(X @ beta), min_mu)
/home/jupyter/env/mambaforge/envs/rome/lib/python3.11/site-packages/pydeseq2/utils.py:404: RuntimeWarning: invalid value encountered in subtract
-logbinom
/home/jupyter/env/mambaforge/envs/rome/lib/python3.11/site-packages/pydeseq2/utils.py:557: RuntimeWarning: overflow encountered in exp
mu_ = np.maximum(size_factors * np.exp(X @ beta), min_mu)
/home/jupyter/env/mambaforge/envs/rome/lib/python3.11/site-packages/pydeseq2/utils.py:560: RuntimeWarning: invalid value encountered in divide
+ ((1 / disp + counts) * mu_ / (1 / disp + mu_)) @ X
...
Fitting dispersions...
... done in 5.68 seconds.
Fitting dispersion trend curve...
... done in 10.39 seconds.
Fitting MAP dispersions...
... done in 6.41 seconds.
Fitting LFCs...
/home/jupyter/env/mambaforge/envs/rome/lib/python3.11/site-packages/pydeseq2/utils.py:406: RuntimeWarning: invalid value encountered in multiply
- counts * np.log(mu)
... done in 6.06 seconds.
Refitting 0 outliers.
This part finishes, but gives runtime warnings. Will this make the output incorrect? Why would this happen?
Also, I'm getting something similar with stat_res.lfc_shrink()
:
cooks_filter=True
stat_res = DeseqStats(
dds,
contrast=["RNA-Qubit-actual", "", ""],
alpha=0.05,
cooks_filter=True,
independent_filter=True,
)
stat_res.run_wald_test()
# Filter based on Cook's outliers
if cooks_filter:
stat_res._cooks_filtering()
# Does either independent filtering, where genes are filtered out based on low counts
if stat_res.independent_filter:
stat_res._independent_filtering()
# or uses the Benjamini-Hochberg procedure to adjust p-values
else:
stat_res._p_value_adjustment()
# Shrink log fold changes toward 0
stat_res.lfc_shrink()
Output:
Fitting MAP LFCs...
/home/jupyter/env/mambaforge/envs/rome/lib/python3.11/site-packages/pydeseq2/utils.py:1185: RuntimeWarning: overflow encountered in exp
counts - (counts + size) / (1 + size * np.exp(-xbeta - offset))
/home/jupyter/env/mambaforge/envs/rome/lib/python3.11/site-packages/pydeseq2/utils.py:1185: RuntimeWarning: overflow encountered in exp
counts - (counts + size) / (1 + size * np.exp(-xbeta - offset))
/home/jupyter/env/mambaforge/envs/rome/lib/python3.11/site-packages/pydeseq2/utils.py:1185: RuntimeWarning: overflow encountered in exp
...
... done in 27.00 seconds.
Using version 0.4.0
Expected behavior Not getting overflows. Maybe due to precision? Maybe need long floats? My suspicion is that this is happening when certain combinations of design factors yield 0 counts, and it is impossible to determine a dispersion for that gene. What is the best approach here?
Desktop (please complete the following information):
- OS: Ubuntu
- Version 20.04.5 LTS (Focal Fossa)