regularize in a hacky way
We can use N-fold cross-validation to do hacky regularization. sklearn has a lot of nice tools for this; more or less, you can use a GridSearchCV object to do it. How it works is that you split the data into N sections, then loop over them, leaving one out each time: you fit a model on the remaining N-1 sections and predict for the held-out one. At the very end, you combine all of the out-of-sample predictions.
With this technique, we can loop through a range of regularization amplitudes, run CV for each of them, and pick the one with the minimum chi2 (or whatever metric we prefer).
It fits a lot more models, but would do the trick.
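A minimal sketch of the idea, using a Ridge regressor as a stand-in estimator (the real PSF model isn't an sklearn estimator, so the data, the alpha grid, and the scoring choice here are all placeholder assumptions, not Piff's actual interface):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Toy data standing in for the real fit inputs.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=200)

# Grid of regularization amplitudes to scan.
param_grid = {"alpha": np.logspace(-4, 2, 13)}

# cv=5 splits the data into 5 folds; each fold is held out once while
# the model is fit on the other 4, and the out-of-sample score for each
# candidate alpha is aggregated across folds.
search = GridSearchCV(Ridge(), param_grid, cv=5,
                      scoring="neg_mean_squared_error")
search.fit(X, y)

print("best alpha:", search.best_params_["alpha"])
```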
That sounds like it would multiply the running time by a large factor, which seems untenable. Am I missing something?
How long is the running time now?
Yes, in general it would. Most other options I know of have similar or higher costs.
You only need to do this for a representative subset, fwiw. Then you can likely fix the regularization for the rest of the survey.