sparse-ho
API: naming and functions
As we saw in #30 and with @ksehic, a newcomer's life could be made easier.
I would like to discuss a few API changes:
- Naming: from my (personal) user perspective it's surprising to have `sparse_ho.models.Lasso`, which has as an attribute a `celer.Lasso`. Since `models.Lasso` sets the alpha, I would name this class `LassoGradSearch`, `LassoGS`, `LassoAuto`, or something else (in a similar way as `sklearn.linear_model.LassoCV`)
- exposure of the estimator: in the newly named `LassoGradSearch`, `estimator` should be made private. It's only called for its solver, and I don't think the user should be aware of it.
- the user should not have to pass `estimator` when creating `LassoGradSearch`. It should be created in the init (if possible with sklearn conventions, otherwise on the first call to `fit`). Conceptually, this makes the instantiation of the class easier.
- `LassoGradSearch` should expose a `fit` and a `predict` (so the user does not need to access its `estimator`), and pass `sklearn.utils.estimator_checks.check_estimator` (a rough sketch follows this list)
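To make this concrete, here is a minimal sketch of what such a class could look like. Everything in it is an assumption for discussion (the class name, the `n_outer` parameter, creating a `celer.Lasso` inside `fit`), not the current sparse-ho API:

```python
from celer import Lasso  # assumed inner solver, kept internal
from sklearn.base import BaseEstimator, RegressorMixin


class LassoGradSearch(BaseEstimator, RegressorMixin):
    """Sketch: Lasso whose alpha is set by gradient-based search."""

    def __init__(self, n_outer=10):
        # sklearn convention: __init__ only stores hyperparameters
        self.n_outer = n_outer

    def fit(self, X, y):
        # the inner estimator is created here, not passed by the user;
        # the trailing underscore marks it as a fitted attribute
        self.estimator_ = Lasso(warm_start=True)
        # ... grad_search would run here and set self.estimator_.alpha
        #     before the final refit on (X, y) ...
        self.estimator_.fit(X, y)
        return self

    def predict(self, X):
        # the user never touches estimator_ directly
        return self.estimator_.predict(X)
```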
@agramfort @josephsalmon what do you think?
+100
Given the past work, I think it should be doable to have something along the lines of:
```python
criterion = HeldOutMSE(idx_train, idx_val)  # better yet, train_frac=0.75 if possible with CV
algo = ImplicitForward()
monitor_grad = Monitor()  # this could go inside the model too, I think?
optimizer = LineSearch(n_outer=10)
model = LassoAuto(estimator=estimator, criterion=criterion, alpha=alpha0)
# in my ideal world we should not pass estimator (sklearn LassoCV does not
# need a sklearn Lasso), but with sklearn conventions I don't know what's
# possible. Instantiate at the first fit call, maybe, like sklearn does for coef_?
model.fit(X, y, algo=algo, optimizer=optimizer, search='grad', callback=None)
# grad_search happens here; do we even need to specify search='grad',
# i.e. is there an alternative? For grid search, use sklearn.
model.predict(X_test)
```
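On the last point of the first comment, once the estimator is created inside `fit`, sklearn compatibility could be tested directly. A sketch, assuming the final class has sklearn-compatible defaults (not a claim that the code above already passes):

```python
from sklearn.utils.estimator_checks import check_estimator

# runs sklearn's battery of API checks (get_params/set_params,
# fit/predict contracts, input validation, ...) on the proposed class
check_estimator(LassoAuto())
```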
My two issues with the current design are that we have to instantiate six objects to fit (estimator, model, monitor, algo, criterion, optimizer), and that we must use `estimator` to predict at the end, while it should remain hidden from the user (it's not obvious that it is modified by `grad_search`).
Does it sound doable the way I propose, @QB3 @Klopfe? This may help for refitting in #86.
While we are discussing the API, it seems that instantiating `HeldOutMSE` with `idx_train` and `idx_val` is counterintuitive.
What do you think `HeldOutMSE` should take as arguments, @agramfort @josephsalmon @Klopfe @mathurinm?
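One possibility, in line with the `train_frac=0.75` idea from the sketch above: let `HeldOutMSE` derive the indices itself from a validation fraction. The names and signature below are assumptions, for discussion only:

```python
import numpy as np
from sklearn.model_selection import train_test_split


class HeldOutMSE:
    """Sketch: held-out MSE criterion that computes its own split."""

    def __init__(self, val_frac=0.25, random_state=None):
        self.val_frac = val_frac
        self.random_state = random_state

    def split(self, n_samples):
        # replaces user-supplied idx_train / idx_val with an internal split
        idx_train, idx_val = train_test_split(
            np.arange(n_samples),
            test_size=self.val_frac,
            random_state=self.random_state,
        )
        return idx_train, idx_val
```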
what do you suggest?