PyDESeq2 icon indicating copy to clipboard operation
PyDESeq2 copied to clipboard

[BUG] Overflow in `.deseq2()` and and `lfc_shrink()`

Open BeyondTheProof opened this issue 1 year ago • 3 comments

To Reproduce

dds = DeseqDataSet(
    counts=counts_df,
    metadata=metadata,
    design_factors=['condition', "RNA_Qubit_actual", 'treatment'],
    continuous_factors=["RNA_Qubit_actual"],
    refit_cooks=True,
    n_cpus=12,
)

dds.deseq2()

Output:

... done in 0.09 seconds.

/home/jupyter/env/mambaforge/envs/rome/lib/python3.11/site-packages/pydeseq2/utils.py:553: RuntimeWarning: overflow encountered in exp
  mu_ = np.maximum(size_factors * np.exp(X @ beta), min_mu)
/home/jupyter/env/mambaforge/envs/rome/lib/python3.11/site-packages/pydeseq2/utils.py:404: RuntimeWarning: invalid value encountered in subtract
  -logbinom
/home/jupyter/env/mambaforge/envs/rome/lib/python3.11/site-packages/pydeseq2/utils.py:557: RuntimeWarning: overflow encountered in exp
  mu_ = np.maximum(size_factors * np.exp(X @ beta), min_mu)
/home/jupyter/env/mambaforge/envs/rome/lib/python3.11/site-packages/pydeseq2/utils.py:560: RuntimeWarning: invalid value encountered in divide
  + ((1 / disp + counts) * mu_ / (1 / disp + mu_)) @ X
...
Fitting dispersions...
... done in 5.68 seconds.

Fitting dispersion trend curve...
... done in 10.39 seconds.

Fitting MAP dispersions...
... done in 6.41 seconds.

Fitting LFCs...
/home/jupyter/env/mambaforge/envs/rome/lib/python3.11/site-packages/pydeseq2/utils.py:406: RuntimeWarning: invalid value encountered in multiply
  - counts * np.log(mu)
... done in 6.06 seconds.

Refitting 0 outliers.

This part finishes, but gives runtime warnings. Will this make the output incorrect? Why would this happen? Also, I'm getting something similar with stat_res.lfc_shrink():

cooks_filter=True

stat_res = DeseqStats(
    dds,
    contrast=["RNA-Qubit-actual", "", ""],
    alpha=0.05,
    cooks_filter=True,
    independent_filter=True,
)
stat_res.run_wald_test()

# Filter based on Cook's outliers
if cooks_filter:
    stat_res._cooks_filtering()

# Does either independent filtering, where genes are filtered out based on low counts
if stat_res.independent_filter:
    stat_res._independent_filtering()
# or uses the Benjamini-Hochberg procedure to adjust p-values
else:
    stat_res._p_value_adjustment()

# Shrink log fold changes toward 0
stat_res.lfc_shrink()

Output:

Fitting MAP LFCs...
/home/jupyter/env/mambaforge/envs/rome/lib/python3.11/site-packages/pydeseq2/utils.py:1185: RuntimeWarning: overflow encountered in exp
  counts - (counts + size) / (1 + size * np.exp(-xbeta - offset))
/home/jupyter/env/mambaforge/envs/rome/lib/python3.11/site-packages/pydeseq2/utils.py:1185: RuntimeWarning: overflow encountered in exp
  counts - (counts + size) / (1 + size * np.exp(-xbeta - offset))
/home/jupyter/env/mambaforge/envs/rome/lib/python3.11/site-packages/pydeseq2/utils.py:1185: RuntimeWarning: overflow encountered in exp
...
... done in 27.00 seconds.

Using version 0.4.0

Expected behavior Not getting overflows. Maybe due to precision? Maybe need long floats? My suspicion is that this is happening when certain combinations of design factors yield 0 counts, and it is impossible to determine a dispersion for that gene. What is the best approach here?

Desktop (please complete the following information):

  • OS: Ubuntu
  • Version 20.04.5 LTS (Focal Fossa)

BeyondTheProof avatar Aug 24 '23 22:08 BeyondTheProof