Text predictors for #TidyTuesday chocolate ratings | Julia Silge
A data science blog
Julia,
Excellent work as always! I love learning from your tutorials!
Quick question: can you give me some pointers on how to include step_lemma on most_memorable_characteristics?
I'm getting the following error:
"Error in bake(): ! most_memorable_characteristics doesn't have a lemma attribute. Make sure the tokenization step includes lemmatization."
chocolate_recipe <- recipe(rating ~ most_memorable_characteristics + country_of_bean_origin,
                           data = chocolate_train) %>%
  step_tokenize(most_memorable_characteristics) %>%
  step_lemma(most_memorable_characteristics) %>%
  step_tokenfilter(most_memorable_characteristics, max_tokens = 100) %>%
  step_tfidf(most_memorable_characteristics) %>%
  step_tokenize(country_of_bean_origin) %>%
  step_tokenfilter(country_of_bean_origin, max_tokens = 20) %>%
  step_tfidf(country_of_bean_origin)
Thank you for your work and for your time!
Best, Renato Albolea
@albolea You'll need to use a tokenization engine that supports lemmas, such as engine = "spacyr". Check out the examples here to see how that will work.
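A minimal sketch of the change Julia describes, assuming spacyr is installed and a spaCy language model is available: the engine argument of step_tokenize() hands tokenization to spacyr, whose tokens carry the lemma attribute that step_lemma() requires.

```r
library(textrecipes)

# A sketch, not a full analysis: tokenize with the "spacyr" engine so
# each token carries lemma information, then step_lemma() can replace
# tokens with their lemmas before filtering and weighting.
chocolate_recipe <- recipe(rating ~ most_memorable_characteristics,
                           data = chocolate_train) %>%
  step_tokenize(most_memorable_characteristics, engine = "spacyr") %>%
  step_lemma(most_memorable_characteristics) %>%
  step_tokenfilter(most_memorable_characteristics, max_tokens = 100) %>%
  step_tfidf(most_memorable_characteristics)
```

The only difference from the default recipe is the engine = "spacyr" argument; the rest of the pipeline stays the same.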
Hi Julia, thanks for this.
Out of curiosity, would an SVM model work on repeated data? For example, a reflection diary by an athlete with keywords to describe successes of the day, paired with a rating value of how well they would rate that day's (training) activities.
Greatly appreciate your time.
@hareshsuppiah I believe most folks would use a multilevel (i.e. mixed effects or hierarchical) model with that kind of data, like what multilevelmod supports.
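A hedged sketch of the kind of spec Julia is pointing to, using multilevelmod's "lmer" engine with a random intercept per athlete. The variable names day_rating, success_score, athlete, and diary_data are hypothetical stand-ins for the diary data described above.

```r
library(multilevelmod)  # registers mixed-effects engines for parsnip

# A sketch with hypothetical variable names: a linear mixed-effects
# model where each athlete gets their own random intercept, so repeated
# daily observations from the same athlete are modeled together.
mixed_spec <- linear_reg() %>%
  set_engine("lmer")

mixed_fit <- mixed_spec %>%
  fit(day_rating ~ success_score + (1 | athlete), data = diary_data)
```

The (1 | athlete) term is what makes the model multilevel: it accounts for the fact that the repeated ratings are grouped within athletes rather than independent.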
Thank you, @juliasilge !
Great tutorials, @juliasilge! As I was following along with the code, I got an error while evaluating models: "All models failed. See the '.notes' column."
When I checked with collect_notes(), the note says: "Error in UseMethod("prep"): no applicable method for 'prep' applied to an object of class "c('step_tokenize', 'step')""
@zabeelbasheer It sounds like either you have very old versions of recipes and/or textrecipes, or that perhaps textrecipes isn't loaded or similar? If you keep having problems, I recommend that you create a reprex (a minimal reproducible example) for this. The goal of a reprex is to make it easier for us to recreate your problem so that we can understand it and/or fix it. If you've never heard of a reprex before, you may want to start with the tidyverse.org help page. Once you have a reprex, I recommend posting on RStudio Community, which is a great forum for getting help with these kinds of modeling questions. Thanks! 🙌
Thank you, @juliasilge! I am excited that I am learning yet another tidyverse function - reprex. I will check with the RStudio community later.
Thank you!
Hi Julia, Thank you for these tutorials as well as for your book with Emil! The book is an excellent explanation! This said, I have one question. Is there an 'easy' way to get the outputs from keras-based models in the book into a package like IML to calculate global feature importance? I am 'stuck' so any guidance would be appreciated!
@neuflaneur For models built with keras that don't have direct model-based global feature importance, I would suggest using something like DALEX for model-agnostic explainability. You can read more in this chapter of Tidy Modeling with R.
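A minimal sketch of the model-agnostic approach Julia mentions, using DALEX. The objects fitted_keras_model, predictors_df, and outcome_vec are placeholders for a fitted model, its predictor data frame, and the numeric outcome.

```r
library(DALEX)

# A sketch with placeholder objects: wrap any fitted model in a DALEX
# explainer, then compute permutation-based global feature importance.
explainer <- explain(
  model = fitted_keras_model,  # hypothetical fitted keras model
  data  = predictors_df,       # data frame of predictors only
  y     = outcome_vec          # numeric outcome vector
)

# model_parts() permutes each feature and measures the drop in
# performance, giving model-agnostic global feature importance.
importance <- model_parts(explainer)
plot(importance)
```

For models built with tidymodels workflows, the DALEXtra package also offers explain_tidymodels(), which handles the preprocessing inside the workflow for you.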
Hi Julia,
Thank you!
Dean Neu
Hiya Julia, there's this NLP Kaggle hackathon that finished recently. I've submitted a basic tidymodels (and tidytext) notebook, but I'm sure there are heaps of techniques I'm not taking advantage of. If you had any spare time, I reckon it could make an excellent screencast/blogpost. https://www.kaggle.com/code/juliantagell/competition-attempt-1