modeltime
Feature request - Refit model using X most important variables
I just saw this video from the CatBoost team about using variable importance to eliminate noisy variables (those that are not important) and thereby reduce the error.
It got me thinking: could this be implemented in the modeltime package? When you refit a model, you could pass the function a number, say 20, and it would refit the model using only the top 20 variables according to the variable importance calculations.
Does this make sense?
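The proposed refit step can be sketched roughly as follows. This is a language-agnostic illustration in Python, not modeltime's API (modeltime is an R package). top_n_features and refit_with_top_n are hypothetical helpers, the importance scores are made up, and fit stands in for whatever model-fitting function the refit would call.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

Row = Dict[str, float]


def top_n_features(importance: Dict[str, float], n: int) -> List[str]:
    """Return the names of the n features with the highest importance scores."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    # Sort by descending score; ties are broken alphabetically so the result is deterministic.
    ranked = sorted(importance.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:n]]


def refit_with_top_n(
    fit: Callable[[List[Row], Sequence[float]], Any],
    rows: List[Row],
    target: Sequence[float],
    importance: Dict[str, float],
    n: int,
) -> Tuple[List[str], Any]:
    """Hypothetical refit step: drop all but the top-n features, then fit again."""
    keep = top_n_features(importance, n)
    reduced = [{name: row[name] for name in keep} for row in rows]
    return keep, fit(reduced, target)


# Made-up importance scores, e.g. as they might come from a fitted XGBoost model.
importance = {"lag_1": 0.42, "lag_7": 0.31, "month": 0.15, "noise_a": 0.02, "noise_b": 0.01}
rows = [
    {"lag_1": 1.0, "lag_7": 2.0, "month": 1.0, "noise_a": 0.3, "noise_b": 0.9},
    {"lag_1": 1.5, "lag_7": 2.5, "month": 2.0, "noise_a": 0.1, "noise_b": 0.4},
]
target = [10.0, 12.0]

# Stand-in "model": it only reports how many columns it was trained on.
kept, n_cols = refit_with_top_n(lambda X, y: len(X[0]), rows, target, importance, n=3)
print(kept, n_cols)  # ['lag_1', 'lag_7', 'month'] 3
```

Passing the fitting function in as a parameter keeps the selection logic independent of any particular model engine.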
We'd need to develop a feature importance capability first, which I do see value in just for model explainability. Especially since XGBoost can provide this, and it's already a dependency of Modeltime.
So let me think about this.
+1. It would be nice for the smooth extension too.
++1
On Jan 21, 2022, at 5:20 PM, John Rambo wrote: +1
Hi @vidarsumo,
You can also try https://github.com/stevenpawley/colino for feature selection. I have been using it in my recipes with quite good results. In fact, the step_select_ functions can be used to compute different scores for your variables and, for example, to pick the best-scoring variable within each group. What I usually do is apply several lags/differences to each variable and keep the one with the best score for each of the "original" variables. (For example, I apply lags to the unemployment rate, check which lag scores best, and keep only that one.) Then I keep, say, the two or three best variables and try different combinations of them, guided by their scores, in a workflow set.
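The per-variable selection described above can be sketched like this. It is a Python illustration of the grouping logic only, not colino's R interface: the helper names and the scores are hypothetical, and it assumes a higher score is better (for criteria where lower is better, such as p-values, the comparisons would need to be flipped).

```python
from typing import Dict, List, Tuple

# (original variable, transformation) -> selection score; higher is assumed to be better.
Scores = Dict[Tuple[str, str], float]


def best_transform_per_variable(scores: Scores) -> Dict[str, Tuple[str, float]]:
    """For each original variable, keep only its best-scoring transformation."""
    best: Dict[str, Tuple[str, float]] = {}
    for (variable, transform), score in scores.items():
        if variable not in best or score > best[variable][1]:
            best[variable] = (transform, score)
    return best


def keep_top_variables(scores: Scores, k: int) -> List[Tuple[str, str]]:
    """Rank variables by the score of their best transformation and keep the top k."""
    best = best_transform_per_variable(scores)
    ranked = sorted(best.items(), key=lambda kv: (-kv[1][1], kv[0]))
    return [(variable, transform) for variable, (transform, _) in ranked[:k]]


# Made-up scores for illustration only.
scores: Scores = {
    ("unemployment", "lag_1"): 0.20,
    ("unemployment", "lag_3"): 0.35,
    ("unemployment", "lag_6"): 0.10,
    ("cpi", "lag_1"): 0.25,
    ("cpi", "diff_1"): 0.30,
    ("interest_rate", "lag_12"): 0.05,
}

print(keep_top_variables(scores, k=2))
# [('unemployment', 'lag_3'), ('cpi', 'diff_1')]
```

The kept variables could then be combined, for example with itertools.combinations, to build the candidate recipes that get compared in a workflow set.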
I think this kind of functionality is better placed in separate packages, because they can have a well-defined scope, and their results can then be combined in the preprocessing step, for example through recipes.
To me, it seems more important to develop a capability like the one described in issue https://github.com/business-science/modeltime/issues/108
Regards,