PyDESeq2 icon indicating copy to clipboard operation
PyDESeq2 copied to clipboard

[BUG] local variable 'beta_init' referenced before assignment

Open redst4r opened this issue 1 year ago • 2 comments

Describe the bug

Under certain circumstances, dds.deseq2() crashes with a UnboundLocalError: local variable 'beta_init' referenced before assignment (originating in pyydeseq2/utils.py)

From what I can tell, two conditions have to be met for this bug to trigger:

  • the design matrix is not full rank (beta_init never gets initialized):
  # if full rank, estimate initial betas for IRLS below
  if np.linalg.matrix_rank(X) == num_vars:
      Q, R = np.linalg.qr(X)
      y = np.log(counts / size_factors + 0.1)
      beta_init = solve(R, Q.T @ y)
      beta = beta_init
  else:  # Initialise intercept with log base mean
      beta = np.zeros(num_vars)
      beta[0] = np.log(counts / size_factors).mean()
      # !!! beta_init never gets set !!!
  • IRLS starts diverging, and we switch to LBFGS:
  #line 560 in utils.py
  res = minimize(
      f,
      beta_init,   # <------ never got initialized
      jac=df,
      method=optimizer,
      bounds=(
          [(min_beta, max_beta)] * num_vars
          if optimizer == "L-BFGS-B"
          else None
      ),
  )

I don't know enough about IRLS (and its probably a bad thing that my design matrix is rank-deficient), but it might be as simple as modifying it to this:

  # if full rank, estimate initial betas for IRLS below
  if np.linalg.matrix_rank(X) == num_vars:
      Q, R = np.linalg.qr(X)
      y = np.log(counts / size_factors + 0.1)
      beta_init = solve(R, Q.T @ y)
      beta = beta_init
  else:  # Initialise intercept with log base mean
      beta = np.zeros(num_vars)
      beta[0] = np.log(counts / size_factors).mean()

      beta_init = beta   # <----------- added line

That ensures that if we switch to BFGS, we have an initial beta to start at. The issue with the non-full-rank: I know it doesn't make sense to run DESeq2 with that, but an error should make that more obvious (rather some error thrown due to an uninitialized variable).

Desktop (please complete the following information):

  • OS: ArchLinux
  • Version pydeseq 0.4.9

redst4r avatar May 16 '24 01:05 redst4r

Hi @redst4r, good catch, thanks for pointing out this bug!

The issue with the non-full-rank: I know it doesn't make sense to run DESeq2 with that, but an error should make that more obvious (rather some error thrown due to an uninitialized variable).

Agreed, but I'm curious about the context in which you obtained this error. DeseqDataSet should normally throw a warning at initialization if the design matrix is rank-deficient.

BorisMuzellec avatar May 16 '24 08:05 BorisMuzellec

'm not quite sure how I got there. I'll try to put together a minimal example.

redst4r avatar May 17 '24 18:05 redst4r