Multiclass predictive modeling for #TidyTuesday NBER papers | Julia Silge

Open utterances-bot opened this issue 3 years ago • 46 comments

Tune and evaluate a multiclass model with lasso regularization for economics working papers.

https://juliasilge.com/blog/nber-papers/

utterances-bot avatar Sep 30 '21 08:09 utterances-bot

Thank you once again Julia for an excellent screencast!

I recently stumbled upon the DALEX package for model-agnostic exploration and explanation, and I was wondering whether the tidymodels team has any plans to incorporate some of the DALEX functionality into the tidymodels meta-package framework. I like DALEX, but one thing I feel is missing is the possibility of using tidyverse syntax in plotting, such as ggplot2. This would be really cool, as it is often very important for data scientists to communicate the models they create effectively, preferably in a visually appealing format.

All the best, Kamau

kamaulindhardt avatar Sep 30 '21 08:09 kamaulindhardt

https://github.com/ModelOriented/DALEX

kamaulindhardt avatar Sep 30 '21 08:09 kamaulindhardt

@kamaulindhardt Yep, we have a chapter in TMwR on how to use DALEX for explainability with tidymodels.

juliasilge avatar Sep 30 '21 13:09 juliasilge

Great blog as always, Julia. I applied this code to my data and everything went well until the last part, when I tried to run final_fitted <- extract_workflow(final_rs). The following error showed up: "Error in UseMethod("predict") : no applicable method for 'predict' applied to an object of class "c('last_fit', 'resample_results', 'tune_results', 'tbl_df', 'tbl', 'data.frame')"

Do you know how to solve it? Thanks

Woprates avatar Sep 30 '21 19:09 Woprates

Actually, the error was: Error in UseMethod("extract_workflow") : no applicable method for 'extract_workflow' applied to an object of class "c('last_fit', 'resample_results', 'tune_results', 'tbl_df', 'tbl', 'data.frame')"

Woprates avatar Sep 30 '21 19:09 Woprates

@Woprates Looks like you need to update some packages, probably tune and maybe workflows? I was using just CRAN versions here, I believe.

juliasilge avatar Sep 30 '21 19:09 juliasilge

Yeah, I have no idea what to do, because I did exactly the same thing and used the same packages that you did. My information about R is:

platform x86_64-w64-mingw32
arch x86_64
os mingw32
system x86_64, mingw32
status
major 4
minor 0.3
year 2020
month 10
day 10
svn rev 79318
language R
version.string R version 4.0.3 (2020-10-10) nickname Bunny-Wunnies Freak Out

Woprates avatar Sep 30 '21 19:09 Woprates

@Woprates I'd make sure that you have version 0.1.6 of tune, the latest version from back in July or so. You can check that in a couple of different ways, such as sessioninfo::session_info().
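As a minimal sketch of another way to run this check, the snippet below compares the installed version of tune against 0.1.6 using base R's `packageVersion()`. The variable names `needed` and `installed` are illustrative only, and the messages are just one way to report the result.

```r
# Version of tune mentioned in the comment above
needed <- package_version("0.1.6")

if (requireNamespace("tune", quietly = TRUE)) {
  # packageVersion() returns a comparable version object
  installed <- utils::packageVersion("tune")
  if (installed < needed) {
    message("tune ", as.character(installed), " is older than ",
            as.character(needed), "; consider running update.packages()")
  } else {
    message("tune ", as.character(installed), " is recent enough")
  }
} else {
  message("tune is not installed")
}
```

Version objects compare component by component, so 0.1.3 sorts before 0.1.6 as expected.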

juliasilge avatar Sep 30 '21 20:09 juliasilge

Thanks Julia for your attention....actually the version is 0.1.3

textrecipes * 0.4.1 2021-07-11 [1] CRAN (R 4.0.5)
themis * 0.1.4 2021-06-12 [1] CRAN (R 4.0.5)
tidylo * 0.1.0 2020-05-25 [1] CRAN (R 4.0.5)
tidymodels * 0.1.3 2021-04-19 [1] CRAN (R 4.0.5)
tidyr * 1.1.3 2021-03-03 [1] CRAN (R 4.0.5)
tidyselect 1.1.1 2021-04-30 [1] CRAN (R 4.0.5)
tidytext * 0.3.1 2021-04-10 [1] CRAN (R 4.0.5)
tidyverse * 1.3.1 2021-04-15 [1] CRAN (R 4.0.5)
tokenizers 0.2.1 2018-03-29 [1] CRAN (R 4.0.3)
tune * 0.1.3 2021-02-28 [1] CRAN (R 4.0.4)
tzdb 0.1.2 2021-07-20 [1] CRAN (R 4.0.5)
unbalanced 2.0 2015-06-26 [1] CRAN (R 4.0.5)
workflows * 0.2.3 2021-07-16 [1] CRAN (R 4.0.5)
workflowsets * 0.1.0 2021-07-22 [1] CRAN (R 4.0.5)

Woprates avatar Sep 30 '21 20:09 Woprates

@Woprates Yep, looks like you need to update.packages()

juliasilge avatar Sep 30 '21 21:09 juliasilge

Thanks a lot Julia...now it's working....you rock...:)

Woprates avatar Sep 30 '21 21:09 Woprates

When I run these two lines, autoplot(nber_rs) and show_best(nber_rs), my RStudio crashes (I am using 16 GB of RAM on 64-bit Windows).

SIRIYAK avatar Oct 01 '21 07:10 SIRIYAK

@SIRIYAK That sounds frustrating! Can you create a reprex (a minimal reproducible example) for this and post your problem on the repo for tune? The goal of a reprex is to make it easier for us to recreate your problem so that we can understand it and/or fix it.

If you've never heard of a reprex before, you may want to start with the tidyverse.org help page. You may already have reprex installed (it comes with the tidyverse package), but if not you can install it with:

install.packages("reprex")

juliasilge avatar Oct 01 '21 16:10 juliasilge

Also, further down, final_fitted <- extract_workflow(final_rs) gives: Error in extract_workflow(final_rs) : could not find function "extract_workflow"

SIRIYAK avatar Oct 01 '21 16:10 SIRIYAK

thanks a lot

SIRIYAK avatar Oct 01 '21 16:10 SIRIYAK

@SIRIYAK If you notice the comments above, I think you need to do the same thing -- update to more recent package versions via update.packages().

juliasilge avatar Oct 01 '21 16:10 juliasilge

I am using an encrypted SSD, so updating things is a nightmare. I tried to update a few packages but still get the same error, and I get the same error in R cloud too. Maybe I have to try on a different machine. Anyway, thanks 👍

SIRIYAK avatar Oct 01 '21 16:10 SIRIYAK

Julia, your blog and YouTube tutorials are incredible. I have been learning so much since I found your blog and channel. I was trying a more traditional approach using multinom and really struggling to get all the metrics. The yardstick package was a godsend; it worked like a charm. Thanks so much!

IvanDesuo avatar Oct 02 '21 01:10 IvanDesuo

Hello Julia. First of all, I wanted to congratulate you and, above all, thank you for the impressive work you do. I have followed several of your tutorials, which have boosted my productivity. I would like to ask you a question about model training. I have a very large database, so it is very difficult to train my model since I only have 8 GB of RAM. Do you have a parallel computing method or some other method to solve data volume problems? Thank you!

SN4AI avatar Oct 05 '21 12:10 SN4AI

@SN4AI Unfortunately parallel processing won't solve any problems with running out of memory. Running in parallel typically requires more memory than running sequentially (but it is of course faster). So if you are very, very low on RAM, I recommend only running sequentially and probably using just a subsample of your data for training your model. If your data is in a database, see if you can do any summarizing in the database itself before bringing just the minimum of data into memory in R locally. Also, if you are using a database, look into whether something like tidypredict will work for you.
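The subsampling suggestion can be sketched in base R as follows. This is only an illustration: `big_data` is a hypothetical stand-in for a large training set, and the 10% fraction is an arbitrary assumption, not a recommendation from the thread.

```r
set.seed(123)

# Hypothetical stand-in for a large training set
big_data <- data.frame(id = seq_len(100000), x = rnorm(100000))

# Draw a 10% random subsample of row indices, without replacement
keep <- sample(nrow(big_data), size = floor(0.1 * nrow(big_data)))
small_data <- big_data[keep, , drop = FALSE]

# Compare the in-memory footprint of the two data frames
format(object.size(big_data), units = "MB")
format(object.size(small_data), units = "MB")
```

For a multiclass outcome, it may be worth sampling within each class instead, so that rare classes are not lost from the subsample.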

juliasilge avatar Oct 05 '21 13:10 juliasilge

Hello Julia, thank you very much for answering my question. Your answer is more than clear. Otherwise, do you think that increasing my RAM from 8 GB to 16 GB will have an impact on my ability to train models? Finally, thank you for suggesting tidypredict to me; I'm reading up on it. Thank you!!

SN4AI avatar Oct 06 '21 15:10 SN4AI

@SN4AI I have 16 GB of RAM on my main computer and I seldom have problems running out of memory when training models, but this really depends on the particulars of your datasets. The other option you can consider is moving to the cloud (like AWS or similar) for training models.

juliasilge avatar Oct 06 '21 17:10 juliasilge

Hi Julia, thanks a lot for this interesting blog!

Would it be possible to use the weighted log odds instead of the tfidf as feature?

fvr1210 avatar Oct 07 '21 15:10 fvr1210

@fvr1210 You definitely could! We haven't implemented that in recipes or textrecipes yet but you could either create a custom recipe step or submit an issue on one of those repos.

juliasilge avatar Oct 07 '21 16:10 juliasilge

Hey Julia, sorry to bother you, but I ran into an issue with my model, which predicts a class from form comments. When I run

nber_rs <- tune_grid(nber_wf, nber_folds, grid = nber_grid)

I get:

Warning message: All models failed. See the .notes column.

The .notes errors are:

[[8]]

A tibble: 1 x 1

.notes

1 preprocessor 1/1, model 1/1: Error in lognet(xd, is.sparse, ix, jx, ~

[[9]]

A tibble: 1 x 1

.notes

1 preprocessor 1/1, model 1/1: Error in lognet(xd, is.sparse, ix, jx, ~

Do you have any idea how to solve this?

Woprates avatar Oct 12 '21 20:10 Woprates

@Woprates Hmmm, that error looks like glmnet did not get all numeric values. Do you still have some factor/string variables? You'll need to convert those into indicator variables, probably using step_dummy(). Or maybe the text hasn't been tokenized and prepared?

juliasilge avatar Oct 12 '21 23:10 juliasilge

Actually I have factor variables (the outcome + 2 predictors) and one predictor is a text (character).

I am trying to predict the class of the outcome based on those 2 factor variables and 1 character variable (the text).

Woprates avatar Oct 13 '21 01:10 Woprates

@Woprates A glmnet model needs all predictors to be in a numeric format so double check that you are converting your factor to a dummy variable and that your text is all tokenized and converted.
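A minimal sketch of such a recipe, assuming a data frame shaped like the one described above (a factor outcome, two factor predictors, one text column). The data frame `comments` and its column names are hypothetical, and the step order is just one reasonable choice.

```r
library(recipes)
library(textrecipes)

# Hypothetical data: factor outcome, two factor predictors, one text column
comments <- data.frame(
  class   = factor(c("a", "b", "a", "b")),
  region  = factor(c("north", "south", "south", "north")),
  channel = factor(c("web", "phone", "web", "web")),
  text    = c("great service", "slow reply and rude",
              "great product", "slow delivery"),
  stringsAsFactors = FALSE
)

rec <- recipe(class ~ ., data = comments) %>%
  step_tokenize(text) %>%                  # split the text into word tokens
  step_tfidf(text) %>%                     # tokens -> numeric tf-idf columns
  step_dummy(all_nominal_predictors())     # factors -> indicator columns

# Apply the recipe to the training data and check the predictor types
baked <- bake(prep(rec), new_data = NULL)
predictors <- baked[setdiff(names(baked), "class")]
all_numeric <- all(vapply(predictors, is.numeric, logical(1)))
all_numeric
```

Inspecting the baked data this way before tuning is a quick check that nothing non-numeric is being passed to glmnet.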

juliasilge avatar Oct 13 '21 13:10 juliasilge

Done

Ji-square avatar Oct 15 '21 01:10 Ji-square

Great job as always. I use these videos to provide a solid background for my modelling, especially using the tidymodels meta-package. Is there a video on inferential statistics, where one may want to see the effect of a variable on a given outcome?

mosesotieno avatar Oct 15 '21 19:10 mosesotieno