Text predictors for #TidyTuesday chocolate ratings | Julia Silge
A data science blog
Julia,
Excellent work as always! I love learning from your tutorials!
Quick question: can you give me some pointers on how to include step_lemma on most_memorable_characteristics?
I'm getting the following error:
"Error in bake(): ! most_memorable_characteristics doesn't have a lemma attribute. Make sure the tokenization step includes lemmatization."
chocolate_recipe <- recipe(rating ~ most_memorable_characteristics + country_of_bean_origin,
                           data = chocolate_train) %>%
  step_tokenize(most_memorable_characteristics) %>%
  step_lemma(most_memorable_characteristics) %>%
  step_tokenfilter(most_memorable_characteristics, max_tokens = 100) %>%
  step_tfidf(most_memorable_characteristics) %>%
  step_tokenize(country_of_bean_origin) %>%
  step_tokenfilter(country_of_bean_origin, max_tokens = 20) %>%
  step_tfidf(country_of_bean_origin)
Thank you for your work and for your time!
Best, Renato Albolea
@albolea You'll need to use a tokenization engine that supports lemmas, such as engine = "spacyr". Check out the examples here to see how that will work.
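A minimal sketch of the change Julia describes, assuming spacyr is installed and a spaCy language model is available: the engine argument of step_tokenize() hands tokenization to spacyr, whose tokens carry the lemma attribute that step_lemma() requires.

```r
library(textrecipes)

# A sketch, not a full analysis: tokenize with the "spacyr" engine so
# each token carries lemma information, then step_lemma() can replace
# tokens with their lemmas before filtering and weighting.
chocolate_recipe <- recipe(rating ~ most_memorable_characteristics,
                           data = chocolate_train) %>%
  step_tokenize(most_memorable_characteristics, engine = "spacyr") %>%
  step_lemma(most_memorable_characteristics) %>%
  step_tokenfilter(most_memorable_characteristics, max_tokens = 100) %>%
  step_tfidf(most_memorable_characteristics)
```

The only difference from the default recipe is the engine = "spacyr" argument; the rest of the pipeline stays the same.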
Hi Julia, thanks for this.
Out of curiosity, would an SVM model work on repeated data? For example, a reflection diary by an athlete with keywords to describe successes of the day, paired with a rating value of how well they would rate that day's (training) activities.
Greatly appreciate your time.
@hareshsuppiah I believe most folks would use a multilevel (i.e. mixed effects or hierarchical) model with that kind of data, like what multilevelmod supports.
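A hedged sketch of the kind of spec Julia is pointing to, using multilevelmod's "lmer" engine with a random intercept per athlete. The variable names day_rating, success_score, athlete, and diary_data are hypothetical stand-ins for the diary data described above.

```r
library(multilevelmod)  # registers mixed-effects engines for parsnip

# A sketch with hypothetical variable names: a linear mixed-effects
# model where each athlete gets their own random intercept, so repeated
# daily observations from the same athlete are modeled together.
mixed_spec <- linear_reg() %>%
  set_engine("lmer")

mixed_fit <- mixed_spec %>%
  fit(day_rating ~ success_score + (1 | athlete), data = diary_data)
```

The (1 | athlete) term is what makes the model multilevel: it accounts for the fact that the repeated ratings are grouped within athletes rather than independent.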
Thank you, @juliasilge !
Great tutorials, @juliasilge! As I was following along with the code, I got an error while evaluating models: "All models failed. See the '.notes' column."
When I checked with collect_notes(), the note says: "Error in UseMethod("prep"): no applicable method for 'prep' applied to an object of class "c('step_tokenize', 'step')""
@zabeelbasheer It sounds like either you have very old versions of recipes and/or textrecipes, or that perhaps textrecipes isn't loaded or similar? If you keep having problems, I recommend that you create a reprex (a minimal reproducible example) for this. The goal of a reprex is to make it easier for us to recreate your problem so that we can understand it and/or fix it. If you've never heard of a reprex before, you may want to start with the tidyverse.org help page. Once you have a reprex, I recommend posting on RStudio Community, which is a great forum for getting help with these kinds of modeling questions. Thanks! 🙌
Thank you, @juliasilge! I am excited that I am learning yet another tidyverse function - reprex. I will check with the RStudio community later.
Thank you!
Hi Julia, Thank you for these tutorials as well as for your book with Emil! The book is an excellent explanation! This said, I have one question. Is there an 'easy' way to get the outputs from keras-based models in the book into a package like IML to calculate global feature importance? I am 'stuck' so any guidance would be appreciated!
@neuflaneur For models built with keras that don't have direct model-based global feature importance, I would suggest using something like DALEX for model-agnostic explainability. You can read more in this chapter of Tidy Modeling with R.
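A minimal sketch of the model-agnostic approach Julia mentions, using DALEX. The objects fitted_keras_model, predictors_df, and outcome_vec are placeholders for a fitted model, its predictor data frame, and the numeric outcome.

```r
library(DALEX)

# A sketch with placeholder objects: wrap any fitted model in a DALEX
# explainer, then compute permutation-based global feature importance.
explainer <- explain(
  model = fitted_keras_model,  # hypothetical fitted keras model
  data  = predictors_df,       # data frame of predictors only
  y     = outcome_vec          # numeric outcome vector
)

# model_parts() permutes each feature and measures the drop in
# performance, giving model-agnostic global feature importance.
importance <- model_parts(explainer)
plot(importance)
```

For models built with tidymodels workflows, the DALEXtra package also offers explain_tidymodels(), which handles the preprocessing inside the workflow for you.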
Hi Julia,
Thank you!
Dean Neu
Hiya Julia, there's this NLP Kaggle hackathon that finished recently. I've submitted a basic tidymodels (and tidytext) notebook, but I'm sure there are heaps of techniques I'm not taking advantage of. If you had any spare time, I reckon it could make an excellent screencast/blogpost. https://www.kaggle.com/code/juliantagell/competition-attempt-1