regularize in a hacky way

Open beckermr opened this issue 3 years ago • 3 comments

We can use N-fold cross-validation to do hacky regularization. sklearn has a lot of nice tools for this. More or less, you can use a GridSearchCV object to do this. How it works is that you split the data into N sections. You loop over the sections, leaving one out each time: fit a model on the remaining N-1 sections, then predict for the held-out one. At the very end, you combine all of the out-of-sample predictions.
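A minimal sketch of the out-of-sample prediction loop, using sklearn's `KFold` with synthetic data and a `Ridge` estimator as a stand-in for the actual PSF model (the real Piff fit is not a sklearn estimator, so this is illustrative only):

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import Ridge

# Toy regression data standing in for the real fitting problem.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=100)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
oos_pred = np.empty_like(y)
for train_idx, test_idx in kf.split(X):
    # Fit on N-1 folds...
    model = Ridge(alpha=1.0)
    model.fit(X[train_idx], y[train_idx])
    # ...and predict only for the held-out fold.
    oos_pred[test_idx] = model.predict(X[test_idx])

# Combine all out-of-sample predictions into one goodness-of-fit number.
chi2 = float(np.sum((y - oos_pred) ** 2))
```

Every data point gets exactly one out-of-sample prediction, so `chi2` here is an honest estimate of predictive error rather than training error.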

With this technique, we can loop through a range of regularization amplitudes, run CV for each of them, and pick the one that minimizes the out-of-sample chi2 or whatever metric we prefer.
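The scan over regularization amplitudes is exactly what `GridSearchCV` automates. A sketch with the same synthetic stand-in data and a `Ridge` estimator (the parameter name `alpha` and the grid values are just illustrative assumptions):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import Ridge

# Synthetic data standing in for the real fitting problem.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
y = X @ rng.normal(size=10) + 0.5 * rng.normal(size=200)

# Scan a logarithmic grid of regularization amplitudes; GridSearchCV
# runs 5-fold CV for each alpha and keeps the best-scoring one.
alphas = np.logspace(-3, 3, 13)
search = GridSearchCV(
    Ridge(),
    param_grid={"alpha": alphas},
    cv=5,
    scoring="neg_mean_squared_error",
)
search.fit(X, y)
best_alpha = search.best_params_["alpha"]
```

Note that scoring is "negative MSE" because sklearn's convention is that higher scores are better; minimizing chi2 corresponds to maximizing this score.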

It fits a lot more models, but would do the trick.

beckermr avatar Nov 05 '21 21:11 beckermr

That sounds like it would multiply the running time by a large factor, which seems probably untenable. Am I missing something?

rmjarvis avatar Nov 05 '21 22:11 rmjarvis

How long is the running time now?

Yes, in general it would.

Most other options I know of have similar costs or more.

beckermr avatar Nov 05 '21 22:11 beckermr

You only need to do this for a representative subset fwiw. Then you can likely fix the regularization for the rest of the survey.

beckermr avatar Nov 05 '21 22:11 beckermr