censored
make sure all encodings are done correctly
Wait for #5, #6, #7, #8
All engines have a formula interface (at least that is what we tell parsnip about glmnet), and most have the unsurprising encodings of predictor_indicators = "none", compute_intercept = FALSE, and remove_intercept = FALSE. These leave the indicators and the intercept to the engine, which is sensible for an engine with a formula interface.
The exceptions are bag_tree(engine = "rpart") and glmnet. The glmnet encodings for predictors and intercept are correct; the ones for the bagged tree should switch to predictor_indicators = "none".
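The claim that every engine uses a formula interface can be checked directly from the parsnip model environment. This is a minimal sketch that follows the same approach as the reprex in this issue. It assumes that each entry of the `value` list column carries an element named `interface` (the reprex extracts it by position instead), and the result depends on the installed censored version.

```r
library(censored)
library(dplyr)
library(purrr)

# All model names registered in the parsnip model environment
mod_names <- get_from_env("models")

# Fit information for every model/engine pair, restricted to censored regression.
# Assumption: each `value` entry has an element named "interface".
model_interface <-
  map_dfr(mod_names, ~ get_from_env(paste0(.x, "_fit")) %>% mutate(model = .x)) %>%
  mutate(interface = map_chr(value, "interface")) %>%
  filter(mode == "censored regression") %>%
  select(model, engine, mode, interface)

# One row per interface type; a single "formula" row would confirm the claim
count(model_interface, interface)
```

Any row other than `formula` in the result would point to an engine whose encodings need a closer look.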
In terms of sparsity:
- only glmnet allows sparse input, which is generally correct but might not be true for this case, see #276
- the mboost package includes support for sparse matrices but not for mboost::blackboost(), which is what we are using
- the rest do not, as far as I can tell
library(censored)
#> Loading required package: parsnip
#> Loading required package: survival
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#>     filter, lag
#> The following objects are masked from 'package:base':
#>
#>     intersect, setdiff, setequal, union
library(purrr)
mod_names <- get_from_env("models")

model_interface <-
  purrr::map_dfr(mod_names, ~ get_from_env(paste0(.x, "_fit")) %>%
                   mutate(model = .x)) %>%
  mutate(interface = map_chr(value, 1)) %>%
  select(engine, mode, model, interface)

model_encodings <-
  purrr::map_dfr(mod_names, ~ get_from_env(paste0(.x, "_encoding"))) %>%
  #left_join(model_interface, by = join_by(model, engine, mode)) %>%
  filter(mode == "censored regression")

model_encodings %>%
  #group_by(interface) %>%
  count(predictor_indicators, compute_intercept, remove_intercept, allow_sparse_x)
#> # A tibble: 3 × 5
#>   predictor_indicators compute_intercept remove_intercept allow_sparse_x     n
#>   <chr>                <lgl>             <lgl>            <lgl>          <int>
#> 1 none                 FALSE             FALSE            FALSE              9
#> 2 traditional          FALSE             FALSE            FALSE              1
#> 3 traditional          TRUE              TRUE             TRUE               1

model_encodings %>%
  filter(predictor_indicators == "traditional")
#> # A tibble: 2 × 7
#>   model     engine mode  predictor_indicators compute_intercept remove_intercept
#>   <chr>     <chr>  <chr> <chr>                <lgl>             <lgl>
#> 1 bag_tree  rpart  cens… traditional          FALSE             FALSE
#> 2 proporti… glmnet cens… traditional          TRUE              TRUE
#> # ℹ 1 more variable: allow_sparse_x <lgl>

model_encodings %>%
  filter(allow_sparse_x)
#> # A tibble: 1 × 7
#>   model     engine mode  predictor_indicators compute_intercept remove_intercept
#>   <chr>     <chr>  <chr> <chr>                <lgl>             <lgl>
#> 1 proporti… glmnet cens… traditional          TRUE              TRUE
#> # ℹ 1 more variable: allow_sparse_x <lgl>
Created on 2024-01-10 with reprex v2.0.2