
Trouble finding sparse optima

Open sjkoelle opened this issue 1 year ago • 4 comments

Hi,

Describe the bug I took a look at the tutorial on finding sparse solutions (https://github.com/cvxpy/cvxpy/blob/master/examples/notebooks/WWW/sparse_solution.ipynb) and have a question about the "l1-heuristic". The heuristic says that we can obtain a sparse solution by zeroing out coefficients that are close to 0. In my experiments with multitask basis pursuit, applying the heuristic (i.e. refitting with the sparse variable set) actually leads to a better solution than not applying it: lower loss and better constraint satisfaction. That is, it's not just a heuristic, it actually gives a better solution! So I am wondering whether there is a theoretical justification for why the off-the-shelf solver cannot find sparse solutions, such that I can feel good about throwing away the low-magnitude coefficients.

To Reproduce

import numpy as np
from einops import rearrange
import cvxpy as cp
import matplotlib.pyplot as plt
import seaborn as sns

print('generating multitask sample data')
np.random.seed(42)
n = 1
d = 4
p = 30
sample_grads = 0.1 * np.random.multivariate_normal(np.zeros(d), np.identity(d), p)
X = rearrange(sample_grads, 'p d -> d p')
y = np.identity(d)

print('computing optimizer of full problem')
beta = cp.Variable((p,d))
objective = cp.Minimize(cp.sum(cp.norm(beta, axis = 1)))
constraints = [X @ beta == y]
problem = cp.Problem(objective, constraints)
result = problem.solve()
beta_optimized = beta.value
print('reconstruction constraint and loss', np.linalg.norm(y - X @ beta_optimized ), np.sum(np.linalg.norm(beta_optimized, axis = 1)))

print('many coefficients are close to 0')
plt.hist(beta_optimized)
plt.xscale('symlog')
plt.title('Coefficients')

low_indices = np.abs(beta_optimized) < 1e-6  # threshold chosen following https://github.com/cvxpy/cvxpy/blob/master/examples/notebooks/WWW/sparse_solution.ipynb

plt.figure()
print('sparsity is shared across tasks')
sns.heatmap(low_indices)

print('enforcing sparsity by zeroing coefficients below the threshold')
beta_sparse = beta_optimized.copy()
beta_sparse[low_indices] = 0

print('sparse reconstruction constraint and loss', np.linalg.norm(y - X @ beta_sparse ), np.sum(np.linalg.norm(beta_sparse, axis = 1)))

print('refitting with only retained coefficients')
X_restricted = X[:,~low_indices[:,0]]
nonzerorows = int((~low_indices[:, 0]).sum())  # number of retained (non-zero) rows
beta = cp.Variable((nonzerorows, d))


objective = cp.Minimize(cp.sum(cp.norm(beta, axis = 1)))
constraints = [X_restricted @ beta == y]
problem = cp.Problem(objective, constraints)
result = problem.solve()
beta_restricted = beta.value

print('sparse refit reconstruction constraint and penalty', np.linalg.norm(y - X_restricted @ beta_restricted ), np.sum(np.linalg.norm(beta_restricted, axis = 1)))
print('it looks like the refit solution has better constraint satisfaction and lower loss than the original solution!')


Output

generating multitask sample data
computing optimizer of full problem
reconstruction constraint and loss 5.845838050963103e-12 17.868875241224597
many coefficients are close to 0
sparsity is shared across tasks
enforcing sparsity by zeroing coefficients below the threshold
sparse reconstruction constraint and loss 1.0768415105048245e-07 17.868874641499794
refitting with only retained coefficients
sparse refit reconstruction constraint and penalty 1.3190391034620074e-12 17.868875206447616
it looks like the refit solution has better constraint satisfaction and lower loss than the original solution!

[figure: histogram of coefficients on a symlog scale]

[figure: heatmap of the shared sparsity pattern across tasks]

Version

  • CVXPY Version: '1.1.24'

Thanks!

sjkoelle avatar Jun 26 '24 21:06 sjkoelle

I suppose this is just an issue of numerical precision, and zeroing is okay?
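One ad hoc sanity check before zeroing: look for a clear multiplicative gap in the sorted coefficient magnitudes and put the threshold inside it, so "numerical noise" entries separate cleanly from real ones. A sketch with made-up numbers (not the issue's data):

```python
import numpy as np

# Hypothetical coefficients: three "real" entries and three noise-level entries.
coeffs = np.array([0.9, -1.3, 3e-9, -7e-10, 0.4, 2e-8])

mags = np.sort(np.abs(coeffs))
ratios = mags[1:] / np.maximum(mags[:-1], 1e-300)  # consecutive magnitude ratios
cut = np.argmax(ratios)                            # largest multiplicative jump
threshold = np.sqrt(mags[cut] * mags[cut + 1])     # geometric midpoint of the gap

cleaned = np.where(np.abs(coeffs) < threshold, 0.0, coeffs)
# zeroes exactly the three noise-level entries
```

If the histogram shows no such gap, that itself is a sign the "small" coefficients may not be pure solver noise.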

sjkoelle avatar Jun 26 '24 22:06 sjkoelle

Switching to cp.SCS and lowering eps seems to drive the "almost zero" values down to around 1e-16.

sjkoelle avatar Jun 26 '24 22:06 sjkoelle

Thanks for sharing this! Feel free to comment more if other questions come up.

SteveDiamond avatar Jun 26 '24 22:06 SteveDiamond

I wanted to reopen the issue to illustrate a challenge I am currently having. I am trying to understand whether, in the following example, the number of selected coefficients should be 2. However, the solutions I am getting out of SCS are not that sparse. They improve the loss by an epsilon-small amount (e.g. 1e-16) but cause a comparable violation of the constraint. It sounded like the scale or epsilon parameters might fix this, but they haven't sufficed in my hands. Epsilon in particular seems to make no difference below about 1e-11.

Here's an example of my experiment

import numpy as np
import cvxpy as cp

def group_lasso_norm(beta):
    """Computes the group basis pursuit loss of the matrix beta"""
    output = np.linalg.norm(beta, axis=1).sum()
    return output


def group_basis_pursuit(
    matrix,
    eps=1e-12,
    threshold=1e-6,
    max_iters = 2500,
):
    D,P = matrix.shape
    beta = cp.Variable((P,D))  # could initialize with lasso?
    beta.value = np.asarray([[1.,0], [0,1.],[0,0],[0,0]])
    objective = cp.Minimize(cp.sum(cp.norm(beta, axis=1)))
    constraints = [matrix @ beta == np.identity(D)]
    problem = cp.Problem(objective, constraints)
    scs_opts = {"eps": eps, "max_iters": max_iters}
    output = problem.solve(solver=cp.SCS, **scs_opts)
#     output = problem.solve(solver= cp.CVXOPT)
    if not np.isfinite(output):  # `output is np.inf` is an unreliable identity check
        raise ValueError("No solution found")
    beta_optimized = beta.value
    beta_sparse = beta_optimized.copy()
    beta_sparse[np.abs(beta_sparse) < threshold] = 0
    return beta_sparse


np.random.seed(0)
D = 2
P = 4
data = np.random.multivariate_normal(np.zeros(P), np.identity(P), D)
data = data / np.linalg.norm(data, axis = 0)
data[:,:D] = np.identity(D)
beta = group_basis_pursuit(data, eps = 1e-16, max_iters = 100000)
selected_indices = np.where(np.linalg.norm(beta, axis = 1))[0]
print(len(selected_indices), "number of selected coefficients")
print(2 - group_lasso_norm(beta), "loss improvement over naive solution")
print(np.linalg.norm(np.identity(D) - data @ beta), "constraint violation")

Any help is appreciated. Maybe I should try a solver that isn't SCS? Thanks.

sjkoelle avatar Aug 08 '24 23:08 sjkoelle

It appears this is a form of non-uniqueness in basis pursuit. More about it here: https://arxiv.org/abs/2411.18502.
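A minimal toy illustration of that non-uniqueness (not the issue's instance): for minimizing ||x||_1 subject to x1 + x2 = 1, the triangle inequality gives ||x||_1 >= |x1 + x2| = 1, and both a 1-sparse point and a dense point attain that bound, so a solver is free to return either one, or any convex combination:

```python
import numpy as np

# min ||x||_1  s.t.  x1 + x2 = 1; the optimal value is exactly 1.
x_sparse = np.array([1.0, 0.0])  # 1-sparse optimum
x_dense = np.array([0.5, 0.5])   # dense optimum with the same objective value
for x in (x_sparse, x_dense):
    assert abs(x.sum() - 1.0) < 1e-12          # feasible
    assert abs(np.abs(x).sum() - 1.0) < 1e-12  # attains the optimal value 1
print("both points (and every convex combination) are optimal")
```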

sjkoelle avatar Nov 28 '24 04:11 sjkoelle