
make sure all encodings are done correctly

Open · EmilHvitfeldt opened this issue 5 years ago · 1 comment

EmilHvitfeldt, Aug 23, 2020

Wait for #5, #6, #7, #8

EmilHvitfeldt, Oct 25, 2020

All engines have a formula interface (at least, that is what we tell parsnip about glmnet), and most have the unsurprising encodings predictor_indicators = "none", compute_intercept = FALSE, and remove_intercept = FALSE, which leave the indicators and the intercept to the engine. That is sensible for an engine with a formula interface. The exceptions are bag_tree(engine = "rpart") and glmnet. The glmnet encodings for predictors and intercept are correct; the bagged tree should switch to predictor_indicators = "none".
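To make the settings concrete, here is a rough base-R approximation of what each combination does to a small data frame. It calls model.matrix() directly and is not parsnip's internal conversion code; reading compute_intercept = FALSE as "model.matrix() without an intercept" is an assumption on my part.

```r
# Rough base-R sketch of the parsnip encoding options (not parsnip internals).

# Toy predictors: one three-level factor and one numeric column
df <- data.frame(
  grp = factor(c("a", "b", "c", "a")),
  x   = c(1.5, 2.0, 0.5, 3.0)
)

# predictor_indicators = "none": the factor column reaches the engine as-is,
# so a formula-interface engine deals with it itself
x_none <- df

# predictor_indicators = "traditional", compute_intercept = TRUE,
# remove_intercept = TRUE (the glmnet setting): dummies are coded against a
# reference level, then the intercept column is dropped
x_glmnet <- model.matrix(~ grp + x, data = df)
x_glmnet <- x_glmnet[, colnames(x_glmnet) != "(Intercept)", drop = FALSE]

# predictor_indicators = "traditional", compute_intercept = FALSE
# (the current bag_tree/rpart setting), assuming this means no intercept:
# every level of the first factor then gets its own dummy column
x_no_int <- model.matrix(~ grp + x - 1, data = df)

colnames(x_glmnet)  # "grpb" "grpc" "x"
colnames(x_no_int)  # "grpa" "grpb" "grpc" "x"
```

With "none" the factor stays a factor, which a tree engine like rpart can split on directly, so expanding it into dummies first buys nothing.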

In terms of sparsity:

  • only glmnet allows sparse predictors (allow_sparse_x = TRUE), which is generally correct for glmnet but might not hold in this case; see #276
  • the mboost package supports sparse matrices, but not in mboost::blackboost(), which is what we use
  • the rest do not, as far as I can tell
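For reference, a rough sketch of what allow_sparse_x = TRUE permits: the predictors can arrive as a sparse dgCMatrix, which glmnet::glmnet() accepts directly. It uses Matrix::sparse.model.matrix() (Matrix ships with R as a recommended package) and only illustrates the idea; it is not how parsnip builds the matrix.

```r
# Sketch of sparse predictor input (illustration only, not parsnip code)
library(Matrix)

# Toy predictors with many zeros
df <- data.frame(
  grp = factor(c("a", "b", "c", "a", "b", "a")),
  x   = c(0, 0, 1.2, 0, 0, 0)
)

# Same "traditional" dummy coding as model.matrix(), stored sparsely;
# the first column, the intercept, is then dropped
x_sparse <- sparse.model.matrix(~ grp + x, data = df)[, -1]

inherits(x_sparse, "dgCMatrix")  # TRUE
colnames(x_sparse)               # "grpb" "grpc" "x"
```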
library(censored)
#> Loading required package: parsnip
#> Loading required package: survival
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
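library(purrr)

# All model names registered in parsnip's model environment
mod_names <- get_from_env("models")

# Fit interface (e.g. "formula") for every model/engine/mode combination;
# the first element of each `value` entry is the interface
model_interface <-
  purrr::map_dfr(mod_names, ~ get_from_env(paste0(.x, "_fit")) %>%
                   mutate(model = .x)) %>% 
  mutate(interface = map_chr(value, 1)) %>% 
  select(engine, mode, model, interface)

# Encoding settings for every engine, restricted to censored regression
model_encodings <-
  purrr::map_dfr(mod_names, ~ get_from_env(paste0(.x, "_encoding"))) %>% 
  # optional: attach the fit interface from above
  #left_join(model_interface, by = join_by(model, engine, mode)) %>% 
  filter(mode == "censored regression") 

# Tally the distinct encoding combinations
model_encodings %>% 
  # optional, needs the join above:
  #group_by(interface) %>%
  count(predictor_indicators, compute_intercept, remove_intercept, allow_sparse_x)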
#> # A tibble: 3 × 5
#>   predictor_indicators compute_intercept remove_intercept allow_sparse_x     n
#>   <chr>                <lgl>             <lgl>            <lgl>          <int>
#> 1 none                 FALSE             FALSE            FALSE              9
#> 2 traditional          FALSE             FALSE            FALSE              1
#> 3 traditional          TRUE              TRUE             TRUE               1

model_encodings %>% 
  filter(predictor_indicators == "traditional")
#> # A tibble: 2 × 7
#>   model     engine mode  predictor_indicators compute_intercept remove_intercept
#>   <chr>     <chr>  <chr> <chr>                <lgl>             <lgl>           
#> 1 bag_tree  rpart  cens… traditional          FALSE             FALSE           
#> 2 proporti… glmnet cens… traditional          TRUE              TRUE            
#> # ℹ 1 more variable: allow_sparse_x <lgl>

model_encodings %>% 
  filter(allow_sparse_x)
#> # A tibble: 1 × 7
#>   model     engine mode  predictor_indicators compute_intercept remove_intercept
#>   <chr>     <chr>  <chr> <chr>                <lgl>             <lgl>           
#> 1 proporti… glmnet cens… traditional          TRUE              TRUE            
#> # ℹ 1 more variable: allow_sparse_x <lgl>

Created on 2024-01-10 with reprex v2.0.2

hfrick, Jan 10, 2024

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

github-actions[bot], Jan 25, 2024