celer
ENH: slow solver on large scale problems with majority of features screened
Reproduction on the finance dataset:
import time
import libsvmdata
import numpy as np
from numpy.linalg import norm
from celer import Lasso
X, y = libsvmdata.fetch_libsvm("finance", min_nnz=3)
alpha_max = norm(X.T @ y, ord=np.inf) / len(y)
t0 = time.time()
clf = Lasso(alpha=alpha_max/20, fit_intercept=False, verbose=True).fit(X, y)
dur = time.time() - t0
print(f"{dur:.2f} seconds")
The first feature is highly correlated with y, so the support is small. Most features get screened, so later iterations should be much faster than the first ones, but they are not.
In [19]: t0 = time.time(); clf = Lasso(alpha=alpha_max/20, fit_intercept=False, verbose=True).fit(X, y); dur = time.time() - t0
#########################
##### Computing alpha 1/1
#########################
Iter 0: primal 6.3741726822, gap 5.75e+00, 10 feats in subpb (9089 left)
Iter 1: primal 0.8647719451, gap 7.29e-02, 4 feats in subpb (162 left)
Iter 2: primal 0.8227823469, gap 1.96e-02, 6 feats in subpb (53 left)
Iter 3: primal 0.8144988993, gap 5.66e-03, 4 feats in subpb (14 left)
Iter 4: primal 0.8132372683, gap 1.63e-03, 4 feats in subpb (8 left)
Iter 5: primal 0.8130029142, gap 4.61e-04, 4 feats in subpb (6 left)
Iter 6: primal 0.8129717566, gap 1.35e-04, 3 feats in subpb (3 left)
Iter 7: primal 0.8129684005, gap 3.84e-05
Early exit, gap: 3.84e-05 < 1.00e-04
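For readers unfamiliar with why the "left" count should shrink as the gap decreases, here is a minimal sketch of a Gap Safe screening test for the Lasso objective (1/(2n))||y - Xw||^2 + alpha*||w||_1. It is not celer's implementation: `lasso_cd` and `gap_safe_screen` are hypothetical helpers written for illustration, the data is synthetic, and the dual point is simply a rescaled residual.

```python
import numpy as np


def lasso_cd(X, y, alpha, n_epochs):
    """Cyclic coordinate descent on (1/(2n))||y - Xw||^2 + alpha*||w||_1."""
    n, p = X.shape
    w = np.zeros(p)
    r = y.astype(float).copy()          # residual y - Xw, kept up to date
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_epochs):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            old = w[j]
            z = old + X[:, j] @ r / col_sq[j]
            # Soft-thresholding step for coordinate j.
            w[j] = np.sign(z) * max(abs(z) - n * alpha / col_sq[j], 0.0)
            if w[j] != old:
                r -= (w[j] - old) * X[:, j]
    return w


def gap_safe_screen(X, y, w, alpha):
    """Return (mask of features proven inactive at the optimum, duality gap)."""
    n = len(y)
    r = y - X @ w
    # Rescale the residual so that ||X^T theta||_inf <= 1 (dual feasible).
    theta = r / max(n * alpha, np.max(np.abs(X.T @ r)))
    primal = r @ r / (2 * n) + alpha * np.abs(w).sum()
    dual = alpha * (y @ theta) - 0.5 * n * alpha ** 2 * (theta @ theta)
    gap = max(primal - dual, 0.0)
    # The dual is (n * alpha^2)-strongly concave, which gives this radius.
    radius = np.sqrt(2 * gap / n) / alpha
    col_norms = np.linalg.norm(X, axis=0)
    screened = np.abs(X.T @ theta) + radius * col_norms < 1.0
    return screened, gap


# Synthetic data (an assumption for illustration, not the finance dataset).
rng = np.random.default_rng(0)
n, p = 100, 300
X = rng.standard_normal((n, p))
w_true = np.zeros(p)
w_true[:5] = [3.0, -2.0, 1.5, 1.0, -1.0]
y = X @ w_true + 0.1 * rng.standard_normal(n)
alpha = np.max(np.abs(X.T @ y)) / n / 20   # alpha_max / 20, as in the script

for n_epochs in (1, 3, 10):
    w = lasso_cd(X, y, alpha, n_epochs)
    screened, gap = gap_safe_screen(X, y, w, alpha)
    print(f"{n_epochs:2d} epochs: gap {gap:.2e}, {p - screened.sum()} feats left")
```

Once a feature is screened it can be dropped from every later pass over the data, so the cost per iteration would be expected to scale with the number of features left rather than with the full dimension.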
@qb3 related to our work