
Automate selection of number of features

Open sandervh14 opened this issue 1 year ago • 0 comments

Task Title

Task: Automate selection of number of features

Task Description

Part of #134, but can be delivered independently and would already bring great value by allowing models to be re-fitted automatically.

Cf. message to Nick of 21/3:

Hey Nick,

Benoît is right, you can just script one run of your model creation for future re-fits.

The only big hurdle is indeed the selection of parameters. If you fix the number of features to select at a reasonable value (for example, equal to the currently selected number of features), the client can re-fit the model on a recurring basis.

This solution is a bit sub-optimal though, in that new data or even new features may result in the model under- or over-fitting. The manual process we follow as data scientists to select the optimal number of parameters is easy, though: it is just an elbow curve problem. Algorithms already exist to select the optimal number automatically, mainly for elbow curves in clustering (or you could implement a simple check yourself on how the slope of the elbow curve evolves with each newly added feature). This could be solved on the project of the client you mention, in roughly 2 days of work.
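
A minimal sketch of the "simple check on the slope" idea, not an existing cobra function. It assumes you already have one validation score per forward-selection step (for example AUC after adding each feature). The name `select_n_features` and the `min_gain` threshold are hypothetical, and stopping at the first small gain is a simplification of a full elbow-detection algorithm.

```python
from typing import Sequence


def select_n_features(scores: Sequence[float], min_gain: float = 0.01) -> int:
    """Return the number of features at the elbow of a performance curve.

    scores[i] is the validation performance (e.g. AUC) of the model that
    uses the first i + 1 features of a forward selection.
    min_gain is a hypothetical threshold: the smallest improvement that
    still justifies adding one more feature.
    """
    if not scores:
        raise ValueError("scores must contain at least one value")
    for i in range(1, len(scores)):
        # Slope of the curve when going from i to i + 1 features.
        gain = scores[i] - scores[i - 1]
        if gain < min_gain:
            # The curve has flattened: keep only the first i features.
            return i
    # The curve never flattened: keep every feature that was evaluated.
    return len(scores)


# Made-up validation scores, one per forward-selection step.
auc_per_step = [0.62, 0.70, 0.74, 0.765, 0.770, 0.772, 0.771]
n_features = select_n_features(auc_per_step, min_gain=0.01)
print(n_features)
```

With these made-up scores the fifth feature adds only 0.005 AUC, so the check keeps four features. In a scheduled re-fit, the threshold would likely need tuning per project, and a check on the validation set rather than the training set guards against over-fitting.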

sandervh14, Mar 21 '23 16:03