
API: naming and functions

Open mathurinm opened this issue 5 years ago • 4 comments

As we saw in #30 and with @ksehic, a newcomer's life could be made easier.

I would like to discuss a few API changes:

  • Naming: from my (personal) user perspective, it is surprising that sparse_ho.models.Lasso has a celer.Lasso as an attribute. Since models.Lasso sets the alpha, I would name this class LassoGradSearch, LassoGS, LassoAuto, or something else (in the same spirit as sklearn.linear_model.LassoCV).
  • Exposure of the estimator: in the newly named LassoGradSearch, estimator should be made private. It is only used for its solver, and I don't think the user should be aware of it.
  • The user should not have to pass estimator when creating LassoGradSearch. It should be created in the init (if possible with sklearn conventions, otherwise on the first call to fit). Conceptually, this makes instantiation of the class easier.
  • LassoGradSearch should expose fit and predict methods (so the user never needs to access its estimator) and pass sklearn.utils.estimator_checks.check_estimator.
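To make the points above concrete, here is a rough, self-contained sketch of the shape I have in mind. Everything here is hypothetical: LassoGradSearch and _estimator are placeholder names from this discussion, and the inner solver is a dummy stand-in, not celer.Lasso.

```python
class _InnerLasso:
    """Dummy stand-in for the private solver estimator (e.g. celer.Lasso)."""

    def __init__(self, alpha):
        self.alpha = alpha

    def fit(self, X, y):
        # placeholder: a real solver would compute sparse coefficients here
        self.coef_ = [0.0] * len(X[0])
        return self

    def predict(self, X):
        return [sum(c * x for c, x in zip(self.coef_, row)) for row in X]


class LassoGradSearch:
    """Hypothetical sklearn-style wrapper: no estimator passed by the user."""

    def __init__(self, alpha0=1.0):
        self.alpha0 = alpha0

    def fit(self, X, y):
        # sklearn convention: fitted attributes are created at fit time,
        # so the estimator can stay private and be built lazily here
        self._estimator = _InnerLasso(alpha=self.alpha0)
        # ... the hyperparameter gradient search would update alpha here ...
        self._estimator.fit(X, y)
        self.alpha_ = self._estimator.alpha
        return self

    def predict(self, X):
        # the user never touches the inner estimator directly
        return self._estimator.predict(X)


X, y = [[1.0, 0.0], [0.0, 1.0]], [1.0, 2.0]
model = LassoGradSearch(alpha0=0.5).fit(X, y)
pred = model.predict(X)
```

The point is only the interface: construction takes hyperparameters, the solver is created on the first fit call, and fit/predict are the whole public surface.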

@agramfort @josephsalmon what do you think?

mathurinm avatar Nov 09 '20 08:11 mathurinm

+100

agramfort avatar Nov 09 '20 20:11 agramfort

Given the past work, I think it should be doable to have something along the lines of:

criterion = HeldOutMSE(idx_train, idx_val)  # better yet, train_frac=0.75 if possible with CV
algo = ImplicitForward()
monitor_grad = Monitor()  # this could go inside model too, I think ? 
optimizer = LineSearch(n_outer=10)
model = LassoAuto(estimator=estimator, criterion=criterion, alpha=alpha0)
# in my ideal world we should not pass estimator (sklearn's LassoCV does not need a sklearn Lasso), but I
# don't know what is possible within sklearn conventions. Instantiate at the first fit call, maybe, like sklearn does for coef_?
model.fit(X, y, algo=algo, optimizer=optimizer, search='grad', callback=None)  # grad_search happens here; do we
# even need to specify it? i.e., is there an alternative? for grid search, use sklearn
model.predict(X_test)

My two issues with the current design are that we have to instantiate six objects just to fit (estimator, model, monitor, algo, criterion, optimizer), and that we must use estimator to predict at the end, while it should remain hidden from the user (it is not obvious that it is modified by grad_search).

Does it sound doable the way I propose, @QB3 @Klopfe? This may help with refitting in #86.

mathurinm avatar Jan 26 '21 14:01 mathurinm

While we are discussing the API, instantiating HeldOutMSE with idx_train and idx_val seems counterintuitive. What do you think HeldOutMSE should take as arguments? @agramfort @josephsalmon @Klopfe @mathurinm
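For illustration, one option already hinted at above is a train fraction rather than precomputed indices, with the split computed internally. This is purely a sketch of that idea (split_indices is a hypothetical helper, and train_frac is an assumed parameter name, not the actual HeldOutMSE API):

```python
import random


def split_indices(n_samples, train_frac=0.75, seed=0):
    """Hypothetical helper: derive held-out train/validation indices
    from a fraction, so the user does not precompute them."""
    rng = random.Random(seed)
    idx = list(range(n_samples))
    rng.shuffle(idx)
    n_train = int(train_frac * n_samples)
    return idx[:n_train], idx[n_train:]


idx_train, idx_val = split_indices(100, train_frac=0.75)
# current API:   criterion = HeldOutMSE(idx_train, idx_val)
# possible API:  criterion = HeldOutMSE(train_frac=0.75)
```

A sklearn cv splitter object would be another natural argument, but that changes the semantics more.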

QB3 avatar Jan 26 '21 15:01 QB3

what do you suggest?

agramfort avatar Jan 26 '21 15:01 agramfort