
ENH - check ``datafit + penalty`` compatibility with solver

Open PABannier opened this issue 2 years ago • 9 comments

A quick proof-of-concept of a function that checks whether the combination (solver, datafit, penalty) is supported. Currently we have some edge cases where one can pass the ProxNewton solver with an L0_5 penalty without any error being raised.

Pros of this design: the validation rules are centralized, and validating a 3-tuple is a one-liner in _glm_fit. Cons: we have to update the rules as we enhance the capabilities of the solvers.

All in all, I think it is very valuable to have more verbose errors when fitting estimators (e.g. Ali Rahimi initially passed the combination Quadratic, L2_3, ProxNewton, which cannot be optimized at the time of writing).

Closes #101 Closes #90 Closes #109

PABannier avatar Dec 10 '22 18:12 PABannier

With this PR, the errors are more verbose:

In [1]: from skglm.estimators import GeneralizedLinearEstimator
           from skglm.penalties import L0_5
           from skglm.datafits import Quadratic, Logistic
           from skglm.solvers import ProxNewton, AndersonCD
           import numpy as np

In [2]: X = np.random.normal(0, 1, (30, 50))
           y = np.random.normal(0, 1, (30,))

In [3]: clf = GeneralizedLinearEstimator(Quadratic(), L0_5(1.), ProxNewton())

In [4]: clf.fit(X, y)
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
Input In [4], in <cell line: 1>()
----> 1 clf.fit(X, y)

File ~/Documents/skglm/skglm/estimators.py:241, in GeneralizedLinearEstimator.fit(self, X, y)
    238 self.datafit = self.datafit if self.datafit else Quadratic()
    239 self.solver = self.solver if self.solver else AndersonCD()
--> 241 return _glm_fit(X, y, self, self.datafit, self.penalty, self.solver)

File ~/Documents/skglm/skglm/estimators.py:29, in _glm_fit(X, y, model, datafit, penalty, solver)
     27 is_classif = isinstance(datafit, (Logistic, QuadraticSVC))
     28 fit_intercept = solver.fit_intercept
---> 29 validate_solver(solver, datafit, penalty)
     31 if is_classif:
     32     check_classification_targets(y)

File ~/Documents/skglm/skglm/utils/dispatcher.py:21, in validate_solver(solver, datafit, penalty)
      6 """Ensure the solver is suited for the `datafit` + `penalty` problem.
      7
      8 Parameters
   (...)
     17     Penalty.
     18 """
     19 if (isinstance(solver, ProxNewton)
     20     and not set(("raw_grad", "raw_hessian")) <= set(dir(datafit))):
---> 21     raise Exception(
     22         f"ProxNewton cannot optimize {datafit.__class__.__name__}, since `raw_grad`"
     23         " and `raw_hessian` are not implemented.")
     24 if ("ws_strategy" in dir(solver) and solver.ws_strategy == "subdiff"
     25     and isinstance(penalty, (L0_5, L2_3))):
     26     raise Exception(
     27         "ws_strategy=`subdiff` is not available for Lp penalties (p < 1). "
     28         "Set ws_strategy to `fixpoint`.")

Exception: ProxNewton cannot optimize Quadratic, since `raw_grad` and `raw_hessian` are not implemented.

PABannier avatar Dec 10 '22 18:12 PABannier

Looks nice @PABannier, this will definitely improve UX!

From an API point of view, shouldn't this check be delegated to each solver? This way we don't have one big function, but Solver.validate(datafit, penalty), in the spirit of what @Badr-MOUFAD implemented here: https://github.com/scikit-learn-contrib/skglm/blob/main/skglm/experimental/pdcd_ws.py#L201

Such methods could also take care of the initialization (e.g. stepsize computation), which is done on a per-solver basis. WDYT?
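
For instance, a minimal sketch of such a per-solver validation, assuming a validate method (the name, signature, and placement are illustrative, not the current API); the required attributes are those from the traceback above:

class ProxNewton:
    def validate(self, datafit, penalty):
        # ProxNewton needs second-order information from the datafit
        for attr in ("raw_grad", "raw_hessian"):
            if not hasattr(datafit, attr):
                raise AttributeError(
                    f"ProxNewton cannot optimize {datafit.__class__.__name__}: "
                    f"`{attr}` is not implemented.")
        # solver-specific initialization (e.g. stepsize computation) could
        # also live here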

mathurinm avatar Dec 12 '22 09:12 mathurinm

@mathurinm Yes I think it's cleaner, currently refining the POC.

PABannier avatar Jan 07 '23 12:01 PABannier

This would be a nice addition if we can ship it in the 0.3 release @Badr-MOUFAD, given that we added a few datafits, penalties and solvers!

mathurinm avatar Jun 20 '23 14:06 mathurinm

@Badr-MOUFAD the issue popped up in #188, do you have time to take this over? A simple check, at the beginning of each solver, that the datafit and penalty are supported (e.g. AndersonCD does not support the Gamma datafit).
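
For illustration, such a check at the top of AndersonCD's solve could look like this (a sketch: get_lipschitz and gradient_scalar are the datafit attributes AndersonCD relies on, everything else is assumed):

class AndersonCD:
    def solve(self, X, y, datafit, penalty):
        # hypothetical compatibility check, run before optimizing
        for attr in ("get_lipschitz", "gradient_scalar"):
            if not hasattr(datafit, attr):
                raise AttributeError(
                    f"{datafit.__class__.__name__} is not compatible with "
                    f"AndersonCD. Missing `{attr}`.")
        ...  # Anderson-accelerated coordinate descent loop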

mathurinm avatar Oct 06 '23 13:10 mathurinm

Sure, I will resume this PR.

Badr-MOUFAD avatar Oct 18 '23 08:10 Badr-MOUFAD

Requires #191 to be implemented first, to allow for better checks.

mathurinm avatar Oct 24 '23 08:10 mathurinm

Trying to revive this to release v0.4. Some comments for discussion @QB3 @Badr-MOUFAD

  • I feel strongly against having both solver.solve() and solver(): "There should be one-- and preferably only one --obvious way to do it". The way I see it, all solve methods should be renamed _solve, and BaseSolver should implement a solve that runs the checks and then calls _solve (see the sketch after this list). What do you think? This is non-breaking from the API point of view; it just requires people implementing their own solver to adapt (I have no such example in mind; we can add a test that no solver overrides solve, by checking that A.solve == BaseSolver.solve). BaseSolver.solve can take a run_checks argument which is True by default, and which can be disabled if one wants to be faster (any idea on the impact of the checks, @Badr-MOUFAD?)
  • I feel like checking for sparse support is heavy at the moment and could be done in another PR, but that may just be me :)
  • I have mixed feelings about the name check_obj_solver_attr; see my comment: I'd go for check_attr and not pass solver, which is used only for its name.
  • custom_compatibility_check could be custom_checks (?)
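
A minimal sketch of the first bullet, with simplified signatures (custom_checks is the name floated in the last bullet; the rest is assumed):

class BaseSolver:
    def solve(self, X, y, datafit, penalty, run_checks=True):
        # run the compatibility checks, then defer to the solver's own logic
        if run_checks:
            self.custom_checks(X, y, datafit, penalty)
        return self._solve(X, y, datafit, penalty)

    def _solve(self, X, y, datafit, penalty):
        raise NotImplementedError

    def custom_checks(self, X, y, datafit, penalty):
        # overridden by each solver with its datafit/penalty checks
        pass


class MySolver(BaseSolver):
    def _solve(self, X, y, datafit, penalty):
        ...  # actual optimization loop


# the "no solver overrides solve" test then reduces to:
assert MySolver.solve is BaseSolver.solve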

mathurinm avatar Apr 11 '24 09:04 mathurinm

  • I’m +1 with having a _solve method and adding a run_checks argument to the solve method. I feel it gives us more freedom to standardize the behavior and makes the solve method less verbose. I don’t think the checks have a big overhead, though I didn’t check that in practice.
  • The support of sparse data is covered in the check_obj_solver_attr function, so I don’t think it hurts us to cover it in this PR.
  • I have no strong opinion about the names; I’m +1 with your proposed names @mathurinm

Badr-MOUFAD avatar May 24 '24 11:05 Badr-MOUFAD

I think the error messages help a lot. Suggestion: add the datafit/penalty role and the solver, going

from

AttributeError: Poisson is not compatible with AndersonCD. It must implement `get_lipschitz` and `gradient_scalar`
Missing `get_lipschitz`.

to

AttributeError: Datafit `Poisson` is not compatible with solver `AndersonCD`. It must implement `get_lipschitz` and `gradient_scalar`
Missing `get_lipschitz`.

EDIT: I implemented it for the solver; to distinguish whether obj is a datafit or a penalty, we can have two arguments in check_obj_solver_attr, equal to None by default, and infer from there:

def check_obj_solver_attr(solver, required_attr, df=None, pen=None, support_sparse=False):
    if df is None:
        name = "penalty"
    elif pen is None:
        name = "datafit"
    else:
        raise ValueError("Can't pass both a penalty and a datafit")
    # ... then check that `required_attr` are implemented and use `name` in
    # the error message
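
Hypothetical usage inside a solver would then be (get_lipschitz and gradient_scalar are from the AndersonCD message above; prox_1d for the penalty is illustrative):

# check the datafit, then the penalty, against this solver's requirements
check_obj_solver_attr(solver, ("get_lipschitz", "gradient_scalar"), df=datafit)
check_obj_solver_attr(solver, ("prox_1d",), pen=penalty)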

mathurinm avatar May 30 '24 14:05 mathurinm

  • I’m +1 with having a _solve method and adding a run_checks argument to the solve method. I feel it gives us more freedom to standardize the behavior and makes the solve method less verbose. I don’t think the checks have a big overhead, though I didn’t check that in practice.

Let's perform checks all the time for now; this will simplify our lives. WDYT?

  • The support of sparse data is covered in the check_obj_solver_attr function, so I don’t think it hurts us to cover it in this PR.

OK

  • I have no strong opinion about the names; I’m +1 with your proposed names @mathurinm

@QB3 any opinion?

mathurinm avatar May 30 '24 16:05 mathurinm