Create a custom metric with tidymodels and NYC Airbnb prices | Julia Silge

utterances-bot opened this issue · 17 comments

Predict prices for Airbnb listings in NYC with a data set from a recent episode of SLICED, with a focus on two specific aspects of this model analysis: creating a custom metric to evaluate the model and combining both tabular and unstructured text data in one model.

https://juliasilge.com/blog/nyc-airbnb/

utterances-bot, Jul 06 '21

Nice post, Julia! I also followed the SLICED episode, and this is one of the parts I appreciated most. Do you plan to make a video on tune_bayes()?

mfcava, Jul 06 '21

@mfcava I'll look into that!

juliasilge, Jul 06 '21

Where can I get the data set for this? Thank you!

ghost, Jul 13 '21

The link is in the post itself: Airbnb prices in New York City

juliasilge, Jul 13 '21

Hi,

I'm having trouble, and I think it may come from the tidymodels packages. I installed tidymodels from CRAN (install.packages("tidymodels")).

When I try to follow your code, I get an error at this step:

set.seed(123)
bag_fit <- fit(bag_wf, data = nyc_train)
bag_fit

The error is:

Error in UseMethod("filter") : no applicable method for 'filter' applied to an object of class "NULL"

Could you help me fix this? Many thanks!

nguyenlovesrpy, Aug 30 '21

@nguyenlovesrpy Hmmmm, I'm not entirely sure based only on that info; do you have the most recent version of baguette from CRAN?

juliasilge, Aug 30 '21
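A quick way to answer that question is to check what is actually installed before refitting. This is a small base-R sketch; the helper name installed_version() is hypothetical, and the package list is just an assumption about which tidymodels pieces are involved in fitting bag_wf.

```r
# Hypothetical helper: the installed version of a package as a string,
# or NA if the package is not installed at all
installed_version <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
}

# Packages assumed to be involved in fitting the bagged tree workflow
pkgs <- c("tidymodels", "baguette", "parsnip", "workflows")
vapply(pkgs, installed_version, character(1))

# If baguette turns out to be out of date, reinstall the current CRAN release:
# install.packages("baguette")
```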

Hi Julia, thanks for these fantastic screencasts. I have a question regarding custom metrics: is it possible to build a metric using variables other than 'truth' and 'estimate'? I have searched tutorials and blogs but cannot find anything to guide me. Many thanks, Chris.

nealec, Nov 29 '21

@nealec I'm assuming you checked out this article already. The variables don't have to be named truth and estimate in your data (here in my blog post they are called price and .pred). The yardstick infrastructure for creating a new metric does depend on using the function metric_vec_template() and friends, but you can pass in different names for arguments if you need to. Notice this usage:

metric_vec_template(
  metric_impl = mse_impl,
  truth = truth,
  estimate = estimate,
  na_rm = na_rm,
  cls = "numeric",
  ...
)

juliasilge, Nov 29 '21
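To illustrate the point about column names, here is a sketch with a toy data frame whose columns are called price and .pred, as in the post; the numbers are made up. truth and estimate are argument names, so the built-in rmse() takes whatever bare column names you pass, and a custom metric built on yardstick's infrastructure should behave the same way.

```r
library(yardstick)

# Toy predictions: column names mirror the post, values are made up
preds <- data.frame(
  price = c(100, 150, 200, 250),
  .pred = c(110, 140, 210, 240)
)

# Data frame interface: columns are passed as bare names to truth/estimate
rmse(preds, truth = price, estimate = .pred)

# Vector interface: plain numeric vectors, no column names involved
rmse_vec(preds$price, preds$.pred)
```

Every prediction here is off by exactly 10, so both calls report an RMSE of 10.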

Thank you for responding so swiftly, Julia. I have indeed read that article; it was very useful in getting the main part of the code written. What I am after is a metric that requires a truth, an estimate, a variable x, and a variable y to calculate.

If I may be so bold, would you be able to take a look at this link, where I have posed the question in more detail: https://community.rstudio.com/t/tidymodels-custom-metric-for-multi-class-classification-yardstick-machine-learning/122648?u=nealec

Thank you in advance, Chris.

nealec, Nov 29 '21

I don't understand why you used: test_rs <- augment(bag_fit, nyc_test)

Is it OK to use this instead? test_rs <- augment(bag_rs, nyc_test)

jrosell, Feb 07 '22

@jrosell It is not OK, actually. 😬 The bag_rs object is not a fitted model workflow but instead is a whole tibble of resampling results. It has metrics for fitting the workflow to each of the resamples.

juliasilge, Feb 07 '22
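A minimal sketch of that distinction, using mtcars and a plain linear regression as stand-ins for the Airbnb data and the bagged tree (both are assumptions for illustration only): fit() returns a single fitted workflow that augment() can apply to new data, while fit_resamples() returns a tibble with one row per fold, whose out-of-fold predictions come from collect_predictions().

```r
library(tidymodels)

set.seed(123)
car_split <- initial_split(mtcars)
car_train <- training(car_split)
car_test <- testing(car_split)

# Stand-in workflow: a formula preprocessor plus a linear regression spec
lm_wf <- workflow(mpg ~ wt + hp, linear_reg())

# Resampling results: one row per fold, holding metrics (and saved predictions)
car_folds <- vfold_cv(car_train, v = 5)
lm_rs <- fit_resamples(
  lm_wf,
  resamples = car_folds,
  control = control_resamples(save_pred = TRUE)
)

# A single fitted workflow, trained on the whole training set
lm_fit <- fit(lm_wf, data = car_train)

# augment() needs the fitted workflow; it appends a .pred column to new data
test_rs <- augment(lm_fit, car_test)

# Out-of-fold predictions from resampling are collected from the results instead
collect_predictions(lm_rs)
```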

Thanks @juliasilge. For calculating a custom metric manually on resampling results, I've just seen this article https://rsample.tidymodels.org/articles/Applications/Recipes_and_rsample.html but I wonder whether collect_metrics() also works on resampling results with this new custom metric.

jrosell, Feb 08 '22

@jrosell Yes, it definitely can! You will need to use a metric_set() with your custom metric in it for your resampling, like in this post.

juliasilge, Feb 09 '22
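As a sketch of that pattern, again with mtcars and a linear regression standing in for the post's data and model: the metric set goes to fit_resamples() through its metrics argument, and collect_metrics() then summarizes every metric in the set across folds. Here rmse, mae, and rsq are built-in metrics; a custom metric created with yardstick's new_numeric_metric() would be listed in metric_set() the same way.

```r
library(tidymodels)

set.seed(123)
car_folds <- vfold_cv(mtcars, v = 5)

# Any yardstick metric, built-in or custom, can go in the set
car_metrics <- metric_set(rmse, mae, rsq)

lm_rs <- fit_resamples(
  workflow(mpg ~ wt + hp, linear_reg()),
  resamples = car_folds,
  metrics = car_metrics
)

# One row per metric, averaged over the five folds
collect_metrics(lm_rs)
```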

@juliasilge Hi Julia, this screencast is really interesting and I learned a lot of new things from it. I have a couple of questions:

  1. Could you remind me what the argument times = 25 means in set_engine("rpart", times = 25)?
  2. I have learned about rlang before, but I have rarely seen people use it to write functions. Compared to the classical way of writing a function, what is the advantage of using rlang?

conlelevn, Jul 05 '22
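For reference, in baguette the times engine argument sets how many bootstrap samples are drawn, and therefore how many trees are bagged into the ensemble. A minimal sketch with mtcars as stand-in data; the min_n value is an arbitrary assumption, not a recommendation.

```r
library(tidymodels)
library(baguette)

# times = 25: fit 25 rpart trees, each on a different bootstrap sample
bag_spec <- bag_tree(min_n = 10) %>%
  set_engine("rpart", times = 25) %>%
  set_mode("regression")

set.seed(123)
bag_fit <- fit(bag_spec, mpg ~ ., data = mtcars)

# Regression predictions combine the 25 bagged trees
predict(bag_fit, new_data = head(mtcars))
```

More trees generally give more stable predictions at the cost of longer fitting time.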

@conlelevn You can check out the documentation for baguette to learn about what the arguments mean. As for rlang: to create a custom metric, you write a function that needs to be able to take different variable names as arguments. I find a couple of resources helpful for this:

juliasilge, Jul 05 '22
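To make the rlang point concrete, here is a small sketch with made-up data and three hypothetical helpers (mae_base(), mae_enquo(), mae_curly()) computing the same mean absolute error. The base R version needs column names as strings; the tidy evaluation versions accept bare column names the way yardstick metrics do. The enquo() plus !! pattern is the one used in yardstick's custom metric article, and {{ }} is rlang's shorthand for it.

```r
library(dplyr)
library(rlang)

preds <- tibble(price = c(100, 200), .pred = c(110, 190))

# Base R: the caller passes column names as strings
mae_base <- function(data, truth, estimate) {
  mean(abs(data[[truth]] - data[[estimate]]))
}

# rlang: capture bare column names with enquo(), then inject them with !!
mae_enquo <- function(data, truth, estimate) {
  truth <- enquo(truth)
  estimate <- enquo(estimate)
  data %>%
    summarise(mae = mean(abs((!!truth) - (!!estimate)))) %>%
    pull(mae)
}

# {{ }} ("curly-curly") does the enquo() and !! steps in one go
mae_curly <- function(data, truth, estimate) {
  data %>%
    summarise(mae = mean(abs({{ truth }} - {{ estimate }}))) %>%
    pull(mae)
}

mae_base(preds, "price", ".pred")
mae_enquo(preds, price, .pred)
mae_curly(preds, price, .pred)
```

All three return 10; the difference is only in how the caller names the columns.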

Hello Dr. Silge, I tried to rerun your code but ran into an issue with metric_vec_template(). I kept getting this message because of the soft deprecation: "metric_vec_template() has been soft-deprecated as of yardstick 1.2.0. Please switch to use check_metric and yardstick_remove_missing functions."

When I replace metric_vec_template() with check_metric(), there is no argument named "metric_impl". Is there additional material you can suggest that I can use to improve my functional programming?

Thanks

gunnergalactico, Apr 09 '24

@gunnergalactico Take a look at this documentation for some guidance on how to make a custom metric.

juliasilge, Apr 09 '24
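For anyone migrating older code, here is a sketch of a custom MSE metric written against the yardstick 1.2.0+ interface. It assumes yardstick 1.2.0 or later and follows the pattern in yardstick's custom performance metrics article; MSE is only an example metric, and the data are made up. Instead of passing metric_impl to a template, the *_vec() function validates inputs with check_numeric_metric(), handles missing values with yardstick_remove_missing(), and then calls the implementation directly.

```r
library(yardstick)
library(rlang)

# Core computation on plain numeric vectors
# (case_weights is accepted but ignored to keep the sketch short)
mse_impl <- function(truth, estimate, case_weights = NULL) {
  mean((truth - estimate)^2)
}

# Vector interface: validation and NA handling replace metric_vec_template()
mse_vec <- function(truth, estimate, na_rm = TRUE, case_weights = NULL, ...) {
  check_numeric_metric(truth, estimate, case_weights)

  if (na_rm) {
    result <- yardstick_remove_missing(truth, estimate, case_weights)
    truth <- result$truth
    estimate <- result$estimate
    case_weights <- result$case_weights
  } else if (yardstick_any_missing(truth, estimate, case_weights)) {
    return(NA_real_)
  }

  mse_impl(truth, estimate, case_weights = case_weights)
}

# Data frame interface: a generic registered as a numeric metric
mse <- function(data, ...) {
  UseMethod("mse")
}
mse <- new_numeric_metric(mse, direction = "minimize")

mse.data.frame <- function(data, truth, estimate, na_rm = TRUE,
                           case_weights = NULL, ...) {
  numeric_metric_summarizer(
    name = "mse",
    fn = mse_vec,
    data = data,
    truth = !!enquo(truth),
    estimate = !!enquo(estimate),
    na_rm = na_rm,
    case_weights = !!enquo(case_weights)
  )
}

# Usage with made-up numbers; the NA row is dropped because na_rm = TRUE
preds <- data.frame(price = c(1, 2, 3, NA), .pred = c(1, 2, 5, 4))
mse(preds, truth = price, estimate = .pred)
mse_vec(preds$price, preds$.pred)

# The new metric can also go into a metric set alongside built-in metrics
metric_set(rmse, mse)(preds, truth = price, estimate = .pred)
```

After the missing row is removed, the squared errors are 0, 0, and 4, so the MSE is 4/3.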