yardstick
Lift/gain at a certain percentage
I saw an interesting presentation where the lift statistic was optimized at a fixed percentage of the samples tested.
So for the example data, if someone used gain_point(two_class_example, pct = 1/2, truth, Class1) (or whatever we call it), they would get back an estimate of 84.5:
library(tidymodels)
#> Registered S3 methods overwritten by 'ggplot2':
#> method from
#> [.quosures rlang
#> c.quosures rlang
#> print.quosures rlang
#> Registered S3 method overwritten by 'xts':
#> method from
#> as.zoo.xts zoo
#> ── Attaching packages ───────────────────────────────────────────────────────────────────────────────────── tidymodels 0.0.2 ──
#> ✔ broom 0.5.1 ✔ purrr 0.3.2
#> ✔ dials 0.0.2 ✔ recipes 0.1.5
#> ✔ dplyr 0.8.0.1 ✔ rsample 0.0.4
#> ✔ ggplot2 3.1.1 ✔ tibble 2.1.1
#> ✔ infer 0.4.0 ✔ yardstick 0.0.2
#> ✔ parsnip 0.0.2
#> ── Conflicts ──────────────────────────────────────────────────────────────────────────────────────── tidymodels_conflicts() ──
#> ✖ purrr::discard() masks scales::discard()
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag() masks stats::lag()
#> ✖ recipes::step() masks stats::step()
gain_curve(two_class_example, truth, Class1) %>% slice(251)
#> # A tibble: 1 x 4
#>      .n .n_events .percent_tested .percent_found
#>   <dbl>     <dbl>           <dbl>          <dbl>
#> 1   250       218              50           84.5
Created on 2019-05-13 by the reprex package (v0.2.1)
We can interpolate between points if we don't have the exact percentage.
lift_point() would return the ratio of the observed percent to the baseline rate (e.g. prevalence).
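A rough sketch of how the two could work, just to make the idea concrete. The names gain_point_sketch() and lift_point_sketch() are hypothetical and nothing like them exists in yardstick; the input is assumed to be a data frame with the .percent_tested and .percent_found columns that gain_curve() returns. Linear interpolation via stats::approx() is one possible choice, not a settled design. For lift, the event rate among the tested samples divided by the prevalence works out algebraically to .percent_found / .percent_tested, so that is what the sketch computes.

```r
# Hypothetical helpers; neither function exists in yardstick.
# `curve` is assumed to have the columns returned by gain_curve():
# .percent_tested and .percent_found, both on a 0-100 scale.

gain_point_sketch <- function(curve, pct) {
  stopifnot(is.numeric(pct), length(pct) == 1, pct >= 0, pct <= 1)
  # Linearly interpolate the percent found at the requested percent tested.
  stats::approx(
    x = curve$.percent_tested,
    y = curve$.percent_found,
    xout = 100 * pct
  )$y
}

lift_point_sketch <- function(curve, pct) {
  # Lift is undefined when nothing has been tested.
  stopifnot(pct > 0)
  gain_point_sketch(curve, pct) / (100 * pct)
}

# Toy curve (made-up numbers) with no row at exactly 50% tested.
toy_curve <- data.frame(
  .percent_tested = c(0, 25, 75, 100),
  .percent_found  = c(0, 50, 90, 100)
)

gain_point_sketch(toy_curve, pct = 1 / 2)
#> [1] 70
lift_point_sketch(toy_curve, pct = 1 / 2)
#> [1] 1.4
```

With the real data, gain_point_sketch(gain_curve(two_class_example, truth, Class1), pct = 1/2) should land on the 84.5 shown in the reprex, since 50% tested is an exact point on that curve and no interpolation is needed.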
@topepo in your example, did you mean to say pct = 1/2? So that we look for where it matches .percent_tested?