Evaluate multiple modeling approaches for #TidyTuesday spam email | Julia Silge

Open utterances-bot opened this issue 1 year ago • 7 comments

A data science blog

https://juliasilge.com/blog/spam-email/

utterances-bot avatar Nov 24 '23 19:11 utterances-bot

Hi Julia! First, thank you so much for your work on this package and this blog. I can't emphasize enough how much your work has helped me grow my confidence in this area and, most importantly, made this fun!

Some of my colleagues use Python and I can honestly say I am running circles around them because of the work you and the tidymodels team have done here. I'm a huge fan.

Quick question for you: we have workflow_map() for fitting models to resamples, after which we can view the results. Is there a similar way to use workflow_map() on the testing data set?

This may run counter to the usual machine learning workflow, where we tune against resamples of the training set, then extract the best model and do a last_fit(). Still, for one reason or another, we want to see how the many models perform against the testing set.

Is there any way to do this with workflow_set() and workflow_map()?

alejandrohagan avatar Nov 24 '23 19:11 alejandrohagan

@alejandrohagan Thank you so much for the kind words! ❤️

There isn't currently an automatic way to use a workflow_set() with the testing set, mainly because we see a workflow_set() as something you use during model development, while the testing set is only used for confirming expected performance after you have chosen a final model. If you have a fitted workflow_set(), then you can use extract_workflow_set_result() to get out a specific fitted workflow and then do whatever you want with it, like predict() on the testing set.

juliasilge avatar Nov 26 '23 22:11 juliasilge
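
For readers who do want testing-set metrics for every member of a workflow set, here is a minimal sketch of one manual route. It is not code from the post. The data (`modeldata::two_class_dat`), the two decision tree specifications, and all `toy_*` names are stand-ins chosen so the example runs on its own; with the post's objects you would substitute `spam_res` and your own `initial_split()` object. The idea is to loop over `wflow_id`, pick the best candidate from the resampling results, finalize the workflow, and call last_fit().

```r
library(tidymodels)

set.seed(123)
# Stand-in data and split (assumption: the post uses the spam email data)
toy_split <- initial_split(modeldata::two_class_dat, strata = Class)
toy_folds <- vfold_cv(training(toy_split), v = 5, strata = Class)

# A small workflow set with two tunable models, tuned on the resamples
toy_res <-
  workflow_set(
    preproc = list(formula = Class ~ .),
    models = list(
      tree_cp = decision_tree(cost_complexity = tune(), mode = "classification"),
      tree_depth = decision_tree(tree_depth = tune(), mode = "classification")
    )
  ) |>
  workflow_map(
    "tune_grid",
    resamples = toy_folds,
    grid = 5,
    metrics = metric_set(accuracy),
    seed = 123
  )

# For each workflow: best resampled candidate -> finalized workflow ->
# fit on the training set and evaluate once on the testing set
test_metrics <-
  toy_res$wflow_id |>
  set_names() |>
  map(function(id) {
    best <- toy_res |>
      extract_workflow_set_result(id) |>
      select_best(metric = "accuracy")

    toy_res |>
      extract_workflow(id) |>
      finalize_workflow(best) |>
      last_fit(toy_split, metrics = metric_set(accuracy)) |>
      collect_metrics()
  }) |>
  list_rbind(names_to = "wflow_id")

test_metrics
```

Keep in mind the caveat from the reply above: comparing many models on the testing set means it no longer gives an unbiased estimate for whichever model you end up choosing.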

Hi Julia,

Thank you for your amazing work on the blog. Your efforts made my learning enjoyable!

I'm curious about the vip() function. When fitting multiple models and wanting variable importance for all of them, should we extract the VI values from fit_resamples() or last_fit()? Additionally, if we fit with workflow_map(), how can we extract the workflow or parsnip model from the process? Thank you for your help!

NizePetcharat avatar Jul 07 '24 17:07 NizePetcharat

@NizePetcharat If you want to use variable importance as part of your process of comparing and choosing a model, then I would do that with your resamples, yes. You might check out this Stack Overflow answer where I outline how to approach this.

juliasilge avatar Jul 08 '24 02:07 juliasilge
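
One way to collect variable importance from resamples is sketched below. It is not necessarily the approach in the linked answer. It assumes the vip package is installed, and the data, the decision tree model, and the `toy_*`, `get_vi`, and `vi_summary` names are hypothetical stand-ins. The `extract` argument of control_resamples() runs a function on each fitted workflow, so the importance scores are saved per fold.

```r
library(tidymodels)
library(vip)

set.seed(123)
# Stand-in data (assumption: the post uses the spam email data)
toy_folds <- vfold_cv(modeldata::two_class_dat, v = 5, strata = Class)

toy_wf <- workflow(Class ~ ., decision_tree(mode = "classification"))

# Runs on the workflow fitted to each resample: pull the underlying
# rpart fit and compute model-based importance scores
get_vi <- function(x) {
  x |>
    extract_fit_engine() |>
    vi()
}

toy_rs <- fit_resamples(
  toy_wf,
  resamples = toy_folds,
  control = control_resamples(extract = get_vi)
)

# .extracts is nested twice: once per resample, once per extracted tibble
vi_summary <-
  toy_rs |>
  select(id, .extracts) |>
  unnest(.extracts) |>
  unnest(.extracts) |>
  group_by(Variable) |>
  summarise(mean_importance = mean(Importance), n_folds = n())

vi_summary
```

Since workflow_map() forwards extra arguments to the underlying tuning or resampling function, the same `control` object can likely be passed there, and each element of the `result` column can then be unnested in the same way.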

Hi,

If it turns out that formula_rf_tune is the best, how can we extract mtry etc. for the "Train and evaluate final model" step?

NarainritKaruna avatar Jul 08 '24 21:07 NarainritKaruna

@NarainritKaruna Take a look at how you can use extract_workflow_set_result(): https://workflowsets.tidymodels.org/reference/extract_workflow_set_result.html

juliasilge avatar Jul 08 '24 21:07 juliasilge

Thanks Julia,

I usually use select_best(), which gives me mtry and min_n. However, when I use extract_workflow_set_result(spam_res, "formula_rf_tune"), there are no parameters (mtry and min_n).

Edited: I finally got it by pulling from "results". Thanks!

NarainritKaruna avatar Jul 08 '24 23:07 NarainritKaruna
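
For anyone landing here with the same question, a small sketch of where the parameters live. The object returned by extract_workflow_set_result() is the tuning result itself, so the parameter values sit inside its `.metrics` column and show up once you call select_best() or collect_metrics() on it. The data, the decision tree model, and the `toy_*` names are stand-ins so the example runs on its own; with the post's objects the call would use `spam_res` and `"formula_rf_tune"`.

```r
library(tidymodels)

set.seed(123)
# Stand-in data (assumption: the post uses the spam email data)
toy_folds <- vfold_cv(modeldata::two_class_dat, v = 5, strata = Class)

toy_res <-
  workflow_set(
    preproc = list(formula = Class ~ .),
    models = list(
      tree_tune = decision_tree(
        cost_complexity = tune(),
        min_n = tune(),
        mode = "classification"
      )
    )
  ) |>
  workflow_map(
    "tune_grid",
    resamples = toy_folds,
    grid = 5,
    metrics = metric_set(accuracy),
    seed = 123
  )

# The extracted result prints as splits and nested metrics, with no
# parameter columns at the top level
tree_res <- extract_workflow_set_result(toy_res, "formula_tree_tune")

# select_best() pulls the winning parameter values out of that result
best_params <- select_best(tree_res, metric = "accuracy")

# Same thing, going through the `result` list-column directly
same_params <-
  toy_res |>
  filter(wflow_id == "formula_tree_tune") |>
  pull(result) |>
  pluck(1) |>
  select_best(metric = "accuracy")

best_params
```

The `best_params` tibble can then be handed to finalize_workflow() before the final fit.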