[BUG] local variable 'beta_init' referenced before assignment
Describe the bug
Under certain circumstances, dds.deseq2() crashes with a UnboundLocalError: local variable 'beta_init' referenced before assignment (originating in pyydeseq2/utils.py)
From what I can tell, two conditions have to be met for this bug to trigger:
- the design matrix is not full rank (
beta_initnever gets initialized):
# if full rank, estimate initial betas for IRLS below
if np.linalg.matrix_rank(X) == num_vars:
Q, R = np.linalg.qr(X)
y = np.log(counts / size_factors + 0.1)
beta_init = solve(R, Q.T @ y)
beta = beta_init
else: # Initialise intercept with log base mean
beta = np.zeros(num_vars)
beta[0] = np.log(counts / size_factors).mean()
# !!! beta_init never gets set !!!
- IRLS starts diverging, and we switch to LBFGS:
#line 560 in utils.py
res = minimize(
f,
beta_init, # <------ never got initialized
jac=df,
method=optimizer,
bounds=(
[(min_beta, max_beta)] * num_vars
if optimizer == "L-BFGS-B"
else None
),
)
I don't know enough about IRLS (and its probably a bad thing that my design matrix is rank-deficient), but it might be as simple as modifying it to this:
# if full rank, estimate initial betas for IRLS below
if np.linalg.matrix_rank(X) == num_vars:
Q, R = np.linalg.qr(X)
y = np.log(counts / size_factors + 0.1)
beta_init = solve(R, Q.T @ y)
beta = beta_init
else: # Initialise intercept with log base mean
beta = np.zeros(num_vars)
beta[0] = np.log(counts / size_factors).mean()
beta_init = beta # <----------- added line
That ensures that if we switch to BFGS, we have an initial beta to start at. The issue with the non-full-rank: I know it doesn't make sense to run DESeq2 with that, but an error should make that more obvious (rather some error thrown due to an uninitialized variable).
Desktop (please complete the following information):
- OS: ArchLinux
- Version pydeseq 0.4.9
Hi @redst4r, good catch, thanks for pointing out this bug!
The issue with the non-full-rank: I know it doesn't make sense to run DESeq2 with that, but an error should make that more obvious (rather some error thrown due to an uninitialized variable).
Agreed, but I'm curious about the context in which you obtained this error. DeseqDataSet should normally throw a warning at initialization if the design matrix is rank-deficient.
'm not quite sure how I got there. I'll try to put together a minimal example.