
Consider a different style of butchering for use cases that prioritize size of model object

Open · juliasilge opened this issue 7 months ago · 1 comment

In rstudio/vetiver-r#264, @lschneiderbauer pointed out that, for their use case, they would like to remove more components from the model object. They don't need the components used for prediction/confidence intervals, but they do need the model object to be smaller. They would like something along these lines:

library(butcher)
library(vetiver)

more_cars <- mtcars[rep(1:32, each = 1e4),]
cars_lm <- lm(mpg ~ ., data = more_cars)
weigh(cars_lm)
#> # A tibble: 25 × 2
#>    object         size
#>    <chr>         <dbl>
#>  1 qr.qr         54.0 
#>  2 residuals     28.4 
#>  3 fitted.values 28.4 
#>  4 effects        5.12
#>  5 model.mpg      2.56
#>  6 model.cyl      2.56
#>  7 model.disp     2.56
#>  8 model.hp       2.56
#>  9 model.drat     2.56
#> 10 model.wt       2.56
#> # ℹ 15 more rows

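axe_custom <- function(x) {
    # Replace the residuals and the large QR matrix with tiny placeholders;
    # neither is needed for simple point predictions on new data
    x <- butcher:::exchange(x, "residuals", numeric(0))
    x$qr <- butcher:::exchange(x$qr, "qr", matrix(0))
    x
}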

axed_lm <- axe_custom(cars_lm)
weigh(axed_lm)
#> # A tibble: 25 × 2
#>    object         size
#>    <chr>         <dbl>
#>  1 fitted.values 28.4 
#>  2 effects        5.12
#>  3 model.mpg      2.56
#>  4 model.cyl      2.56
#>  5 model.disp     2.56
#>  6 model.hp       2.56
#>  7 model.drat     2.56
#>  8 model.wt       2.56
#>  9 model.qsec     2.56
#> 10 model.vs       2.56
#> # ℹ 15 more rows

v <- vetiver_model(axed_lm, "custom-butchered-lm")
weigh(v)
#> # A tibble: 37 × 2
#>    object            size
#>    <chr>            <dbl>
#>  1 model.effects     5.12
#>  2 model.model.mpg   2.56
#>  3 model.model.cyl   2.56
#>  4 model.model.disp  2.56
#>  5 model.model.hp    2.56
#>  6 model.model.drat  2.56
#>  7 model.model.wt    2.56
#>  8 model.model.qsec  2.56
#>  9 model.model.vs    2.56
#> 10 model.model.am    2.56
#> # ℹ 27 more rows

Created on 2023-11-30 with reprex v2.0.2

Should we consider a different style of butchering that prioritizes simple predictions only and throws out big components like the ones used for confidence/prediction intervals?
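
To make the question concrete, here is a minimal base R sketch of what a "predictions only" style could look like for `lm` objects. The helper name `axe_for_prediction()` is hypothetical and not part of butcher, and the choice of components to drop is an assumption based on what `predict()` uses for point predictions on new data. The sketch also uses a smaller replicated data set than the reprex above so that it runs quickly.

```r
# Hypothetical helper (not part of butcher): drop the components that
# predict() does not use for point predictions on new data.
axe_for_prediction <- function(x) {
  stopifnot(inherits(x, "lm"))
  x$residuals     <- numeric(0)  # only used for standard errors and intervals
  x$fitted.values <- numeric(0)  # predictions on the training data
  x$effects       <- numeric(0)
  x$model         <- NULL        # stored copy of the model frame
  x$qr$qr         <- matrix(0)   # keep x$qr$pivot, which predict() still reads
  x
}

more_cars <- mtcars[rep(1:32, each = 100), ]
cars_lm   <- lm(mpg ~ ., data = more_cars)
small_lm  <- axe_for_prediction(cars_lm)

# Point predictions on new data should be unchanged, and the object smaller
new_cars <- mtcars[1:5, ]
all.equal(predict(small_lm, new_cars), predict(cars_lm, new_cars))
object.size(small_lm) < object.size(cars_lm)
```

The trade-off is that anything relying on the dropped pieces, such as `predict(..., interval = "confidence")`, `se.fit = TRUE`, `residuals()`, or `summary()`, can no longer be expected to give meaningful results on the axed object.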

Posted by juliasilge on Nov 30 '23 at 18:11