Feature Request: Improve output for glmnet models

Open themichjam opened this issue 2 years ago • 14 comments


Hi @ddsjoberg, when passing a tidymodels object to tbl_regression() I get the error and output below. Julia from the tidymodels team is confident it's not an error on their end. Would you know how to sort this? Thanks!

``` r
library(tidymodels)
data(penguins)
my_split <- penguins %>%
  na.omit() %>%
  initial_split()

rec <- recipe(sex ~ species + bill_length_mm + bill_depth_mm,
  data = penguins
) %>%
  step_dummy(species)

glmnet_spec <- logistic_reg(penalty = 0.1, mixture = 1) %>%
  set_engine("glmnet")

glmnet_fit <-
  workflow(rec, glmnet_spec) %>%
  last_fit(my_split)


extract_workflow(glmnet_fit) %>%
  gtsummary::tbl_regression(exponentiate = TRUE) %>%
  gtsummary::as_kable()
#> Extracting {workflows} model fit with `workflows::extract_fit_parsnip(x) %>% tbl_regression(...)`
#> Extracting {parsnip} model fit with `tbl_regression(x = x$fit, ...)`
#> x Unable to identify the list of variables.
#> 
#> This is usually due to an error calling `stats::model.frame(x)`or `stats::model.matrix(x)`.
#> It could be the case if that type of model does not implement these methods.
#> Rarely, this error may occur if the model object was created within
#> a functional programming framework (e.g. using `lappy()`, `purrr::map()`, etc.).
```
Characteristic exp(Beta)
(Intercept)
(Intercept) -0.09
(Intercept) -0.69
(Intercept) -1.77
(Intercept) -2.97
(Intercept) -4.09
(Intercept) -5.15
(Intercept) -6.16
(Intercept) -7.11
(Intercept) -8.02
(Intercept) -9.37
(Intercept) -11.1
(Intercept) -12.8
(Intercept) -14.4
(Intercept) -16.0
(Intercept) -17.7
(Intercept) -19.2
(Intercept) -20.8
(Intercept) -22.4
(Intercept) -23.9
(Intercept) -25.4
(Intercept) -27.0
(Intercept) -28.5
(Intercept) -29.9
(Intercept) -31.4
(Intercept) -32.8
(Intercept) -34.2
(Intercept) -35.6
(Intercept) -37.0
(Intercept) -38.3
(Intercept) -39.6
(Intercept) -40.9
(Intercept) -42.1
(Intercept) -43.4
(Intercept) -44.6
(Intercept) -45.7
(Intercept) -46.8
(Intercept) -47.9
(Intercept) -48.9
(Intercept) -49.9
(Intercept) -50.8
(Intercept) -51.7
(Intercept) -52.6
(Intercept) -53.4
(Intercept) -54.1
(Intercept) -54.8
(Intercept) -55.5
(Intercept) -56.1
(Intercept) -56.6
(Intercept) -57.2
(Intercept) -57.7
(Intercept) -58.2
(Intercept) -58.6
(Intercept) -59.0
(Intercept) -59.4
(Intercept) -59.7
(Intercept) -60.0
(Intercept) -60.3
(Intercept) -60.6
(Intercept) -60.8
(Intercept) -61.0
(Intercept) -61.3
(Intercept) -61.4
(Intercept) -61.6
(Intercept) -61.8
(Intercept) -61.9
(Intercept) -62.1
(Intercept) -62.2
(Intercept) -62.3
(Intercept) -62.4
(Intercept) -62.5
bill_length_mm
bill_length_mm 0.01
bill_length_mm 0.02
bill_length_mm 0.04
bill_length_mm 0.05
bill_length_mm 0.06
bill_length_mm 0.07
bill_length_mm 0.08
bill_length_mm 0.09
bill_length_mm 0.11
bill_length_mm 0.13
bill_length_mm 0.15
bill_length_mm 0.17
bill_length_mm 0.19
bill_length_mm 0.21
bill_length_mm 0.23
bill_length_mm 0.25
bill_length_mm 0.26
bill_length_mm 0.28
bill_length_mm 0.30
bill_length_mm 0.32
bill_length_mm 0.33
bill_length_mm 0.35
bill_length_mm 0.37
bill_length_mm 0.38
bill_length_mm 0.40
bill_length_mm 0.42
bill_length_mm 0.43
bill_length_mm 0.45
bill_length_mm 0.46
bill_length_mm 0.47
bill_length_mm 0.48
bill_length_mm 0.49
bill_length_mm 0.50
bill_length_mm 0.51
bill_length_mm 0.51
bill_length_mm 0.52
bill_length_mm 0.53
bill_length_mm 0.54
bill_length_mm 0.54
bill_length_mm 0.55
bill_length_mm 0.55
bill_length_mm 0.56
bill_length_mm 0.56
bill_length_mm 0.57
bill_length_mm 0.57
bill_length_mm 0.58
bill_length_mm 0.58
bill_length_mm 0.59
bill_length_mm 0.59
bill_length_mm 0.59
bill_length_mm 0.60
bill_length_mm 0.60
bill_length_mm 0.60
bill_length_mm 0.60
bill_length_mm 0.61
bill_length_mm 0.61
bill_length_mm 0.61
bill_length_mm 0.61
bill_length_mm 0.61
bill_length_mm 0.61
bill_length_mm 0.62
bill_length_mm 0.62
bill_length_mm 0.62
bill_length_mm 0.62
bill_length_mm 0.62
bill_length_mm 0.62
bill_length_mm 0.62
bill_length_mm 0.62
bill_depth_mm
bill_depth_mm 0.04
bill_depth_mm 0.07
bill_depth_mm 0.11
bill_depth_mm 0.14
bill_depth_mm 0.18
bill_depth_mm 0.21
bill_depth_mm 0.23
bill_depth_mm 0.26
bill_depth_mm 0.30
bill_depth_mm 0.35
bill_depth_mm 0.40
bill_depth_mm 0.45
bill_depth_mm 0.50
bill_depth_mm 0.55
bill_depth_mm 0.60
bill_depth_mm 0.65
bill_depth_mm 0.69
bill_depth_mm 0.74
bill_depth_mm 0.79
bill_depth_mm 0.83
bill_depth_mm 0.88
bill_depth_mm 0.92
bill_depth_mm 0.96
bill_depth_mm 1.01
bill_depth_mm 1.05
bill_depth_mm 1.09
bill_depth_mm 1.13
bill_depth_mm 1.17
bill_depth_mm 1.21
bill_depth_mm 1.25
bill_depth_mm 1.30
bill_depth_mm 1.35
bill_depth_mm 1.39
bill_depth_mm 1.44
bill_depth_mm 1.48
bill_depth_mm 1.52
bill_depth_mm 1.56
bill_depth_mm 1.60
bill_depth_mm 1.64
bill_depth_mm 1.67
bill_depth_mm 1.71
bill_depth_mm 1.74
bill_depth_mm 1.77
bill_depth_mm 1.79
bill_depth_mm 1.82
bill_depth_mm 1.84
bill_depth_mm 1.86
bill_depth_mm 1.88
bill_depth_mm 1.90
bill_depth_mm 1.92
bill_depth_mm 1.94
bill_depth_mm 1.95
bill_depth_mm 1.97
bill_depth_mm 1.98
bill_depth_mm 1.99
bill_depth_mm 2.01
bill_depth_mm 2.02
bill_depth_mm 2.03
bill_depth_mm 2.03
bill_depth_mm 2.04
bill_depth_mm 2.05
bill_depth_mm 2.06
bill_depth_mm 2.06
bill_depth_mm 2.07
bill_depth_mm 2.07
bill_depth_mm 2.08
bill_depth_mm 2.08
bill_depth_mm 2.09
bill_depth_mm 2.09
species_Chinstrap
species_Chinstrap -0.15
species_Chinstrap -0.42
species_Chinstrap -0.68
species_Chinstrap -0.94
species_Chinstrap -1.19
species_Chinstrap -1.43
species_Chinstrap -1.67
species_Chinstrap -1.90
species_Chinstrap -2.13
species_Chinstrap -2.36
species_Chinstrap -2.59
species_Chinstrap -2.81
species_Chinstrap -3.03
species_Chinstrap -3.25
species_Chinstrap -3.46
species_Chinstrap -3.67
species_Chinstrap -3.88
species_Chinstrap -4.08
species_Chinstrap -4.28
species_Chinstrap -4.48
species_Chinstrap -4.67
species_Chinstrap -4.85
species_Chinstrap -5.00
species_Chinstrap -5.12
species_Chinstrap -5.24
species_Chinstrap -5.36
species_Chinstrap -5.47
species_Chinstrap -5.58
species_Chinstrap -5.68
species_Chinstrap -5.78
species_Chinstrap -5.87
species_Chinstrap -5.96
species_Chinstrap -6.04
species_Chinstrap -6.13
species_Chinstrap -6.20
species_Chinstrap -6.27
species_Chinstrap -6.34
species_Chinstrap -6.40
species_Chinstrap -6.46
species_Chinstrap -6.51
species_Chinstrap -6.56
species_Chinstrap -6.61
species_Chinstrap -6.66
species_Chinstrap -6.70
species_Chinstrap -6.73
species_Chinstrap -6.77
species_Chinstrap -6.80
species_Chinstrap -6.83
species_Chinstrap -6.86
species_Chinstrap -6.88
species_Chinstrap -6.91
species_Chinstrap -6.93
species_Chinstrap -6.95
species_Chinstrap -6.97
species_Chinstrap -6.98
species_Chinstrap -7.00
species_Chinstrap -7.01
species_Chinstrap -7.02
species_Chinstrap -7.03
species_Chinstrap -7.04
species_Chinstrap -7.05
species_Gentoo
species_Gentoo 0.06
species_Gentoo 0.16
species_Gentoo 0.25
species_Gentoo 0.34
species_Gentoo 0.42
species_Gentoo 0.50
species_Gentoo 0.58
species_Gentoo 0.65
species_Gentoo 0.72
species_Gentoo 0.79
species_Gentoo 0.85
species_Gentoo 0.91
species_Gentoo 0.96
species_Gentoo 1.01
species_Gentoo 1.06
species_Gentoo 1.10
species_Gentoo 1.14
species_Gentoo 1.18
species_Gentoo 1.22
species_Gentoo 1.25
species_Gentoo 1.28
species_Gentoo 1.31
species_Gentoo 1.34
species_Gentoo 1.36
species_Gentoo 1.38
species_Gentoo 1.40
species_Gentoo 1.42
species_Gentoo 1.44
species_Gentoo 1.46
species_Gentoo 1.47
species_Gentoo 1.49
species_Gentoo 1.50
species_Gentoo 1.51
species_Gentoo 1.52
species_Gentoo 1.53
species_Gentoo 1.54
species_Gentoo 1.55
species_Gentoo 1.55
species_Gentoo 1.56

Created on 2022-06-18 by the reprex package (v2.0.1)

Session info
sessioninfo::session_info()
#> - Session info ---------------------------------------------------------------
#>  setting  value
#>  version  R version 4.1.3 (2022-03-10)
#>  os       Windows 10 x64 (build 22000)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  English_United Kingdom.1252
#>  ctype    English_United Kingdom.1252
#>  tz       Europe/London
#>  date     2022-06-18
#>  pandoc   2.18 @ C:/PROGRA~3/chocolatey/bin/ (via rmarkdown)
#> 
#> - Packages -------------------------------------------------------------------
#>  package       * version    date (UTC) lib source
#>  assertthat      0.2.1      2019-03-21 [1] CRAN (R 4.1.2)
#>  backports       1.4.1      2021-12-13 [1] CRAN (R 4.1.2)
#>  broom         * 0.8.0      2022-04-13 [1] CRAN (R 4.1.3)
#>  broom.helpers   1.6.0      2022-01-12 [1] CRAN (R 4.1.2)
#>  class           7.3-20     2022-01-13 [1] CRAN (R 4.1.2)
#>  cli             3.2.0      2022-02-14 [1] CRAN (R 4.1.2)
#>  codetools       0.2-18     2020-11-04 [2] CRAN (R 4.1.3)
#>  colorspace      2.0-3      2022-02-21 [1] CRAN (R 4.1.3)
#>  crayon          1.5.1      2022-03-26 [1] CRAN (R 4.1.3)
#>  DBI             1.1.2      2021-12-20 [1] CRAN (R 4.1.2)
#>  dials         * 0.1.0      2022-01-31 [1] CRAN (R 4.1.2)
#>  DiceDesign      1.9        2021-02-13 [1] CRAN (R 4.1.2)
#>  digest          0.6.29     2021-12-01 [1] CRAN (R 4.1.2)
#>  dplyr         * 1.0.8      2022-02-08 [1] CRAN (R 4.1.2)
#>  ellipsis        0.3.2      2021-04-29 [1] CRAN (R 4.1.2)
#>  evaluate        0.15       2022-02-18 [1] CRAN (R 4.1.2)
#>  fansi           1.0.3      2022-03-24 [1] CRAN (R 4.1.3)
#>  fastmap         1.1.0      2021-01-25 [1] CRAN (R 4.1.2)
#>  foreach         1.5.2      2022-02-02 [1] CRAN (R 4.1.2)
#>  fs              1.5.2      2021-12-08 [1] CRAN (R 4.1.2)
#>  furrr           0.2.3      2021-06-25 [1] CRAN (R 4.1.2)
#>  future          1.24.0     2022-02-19 [1] CRAN (R 4.1.3)
#>  future.apply    1.8.1      2021-08-10 [1] CRAN (R 4.1.2)
#>  generics        0.1.2      2022-01-31 [1] CRAN (R 4.1.2)
#>  ggplot2       * 3.3.5      2021-06-25 [1] CRAN (R 4.1.2)
#>  glmnet        * 4.1-4      2022-04-15 [1] CRAN (R 4.1.3)
#>  globals         0.14.0     2020-11-22 [1] CRAN (R 4.1.1)
#>  glue            1.6.2      2022-02-24 [1] CRAN (R 4.1.3)
#>  gower           1.0.0      2022-02-03 [1] CRAN (R 4.1.2)
#>  GPfit           1.0-8      2019-02-08 [1] CRAN (R 4.1.2)
#>  gt              0.6.0      2022-05-24 [1] CRAN (R 4.1.3)
#>  gtable          0.3.0      2019-03-25 [1] CRAN (R 4.1.2)
#>  gtsummary       1.5.2      2022-01-29 [1] CRAN (R 4.1.2)
#>  hardhat         0.2.0      2022-01-24 [1] CRAN (R 4.1.2)
#>  highr           0.9        2021-04-16 [1] CRAN (R 4.1.2)
#>  htmltools       0.5.2      2021-08-25 [1] CRAN (R 4.1.2)
#>  infer         * 1.0.0      2021-08-13 [1] CRAN (R 4.1.2)
#>  ipred           0.9-12     2021-09-15 [1] CRAN (R 4.1.2)
#>  iterators       1.0.14     2022-02-05 [1] CRAN (R 4.1.2)
#>  knitr           1.39       2022-04-26 [1] CRAN (R 4.1.3)
#>  lattice         0.20-45    2021-09-22 [2] CRAN (R 4.1.3)
#>  lava            1.6.10     2021-09-02 [1] CRAN (R 4.1.2)
#>  lhs             1.1.5      2022-03-22 [1] CRAN (R 4.1.3)
#>  lifecycle       1.0.1      2021-09-24 [1] CRAN (R 4.1.2)
#>  listenv         0.8.0      2019-12-05 [1] CRAN (R 4.1.2)
#>  lubridate       1.8.0      2021-10-07 [1] CRAN (R 4.1.2)
#>  magrittr        2.0.3      2022-03-30 [1] CRAN (R 4.1.3)
#>  MASS            7.3-55     2022-01-13 [1] CRAN (R 4.1.2)
#>  Matrix        * 1.4-0      2021-12-08 [2] CRAN (R 4.1.3)
#>  modeldata     * 0.1.1      2021-07-14 [1] CRAN (R 4.1.2)
#>  munsell         0.5.0      2018-06-12 [1] CRAN (R 4.1.2)
#>  nnet            7.3-17     2022-01-13 [1] CRAN (R 4.1.2)
#>  parallelly      1.30.0     2021-12-17 [1] CRAN (R 4.1.2)
#>  parsnip       * 0.2.1.9000 2022-03-29 [1] Github (tidymodels/parsnip@9ce41c8)
#>  pillar          1.7.0      2022-02-01 [1] CRAN (R 4.1.2)
#>  pkgconfig       2.0.3      2019-09-22 [1] CRAN (R 4.1.2)
#>  plyr            1.8.7      2022-03-24 [1] CRAN (R 4.1.3)
#>  pROC            1.18.0     2021-09-03 [1] CRAN (R 4.1.2)
#>  prodlim         2019.11.13 2019-11-17 [1] CRAN (R 4.1.2)
#>  purrr         * 0.3.4      2020-04-17 [1] CRAN (R 4.1.2)
#>  R.cache         0.15.0     2021-04-30 [1] CRAN (R 4.1.2)
#>  R.methodsS3     1.8.1      2020-08-26 [1] CRAN (R 4.1.1)
#>  R.oo            1.24.0     2020-08-26 [1] CRAN (R 4.1.1)
#>  R.utils         2.11.0     2021-09-26 [1] CRAN (R 4.1.2)
#>  R6              2.5.1      2021-08-19 [1] CRAN (R 4.1.2)
#>  Rcpp            1.0.8.3    2022-03-17 [1] CRAN (R 4.1.3)
#>  recipes       * 0.2.0      2022-02-18 [1] CRAN (R 4.1.2)
#>  reprex          2.0.1      2021-08-05 [1] CRAN (R 4.1.2)
#>  rlang           1.0.2      2022-03-04 [1] CRAN (R 4.1.3)
#>  rmarkdown       2.14       2022-04-25 [1] CRAN (R 4.1.3)
#>  rpart           4.1.16     2022-01-24 [1] CRAN (R 4.1.2)
#>  rsample       * 0.1.1      2021-11-08 [1] CRAN (R 4.1.2)
#>  rstudioapi      0.13       2020-11-12 [1] CRAN (R 4.1.2)
#>  scales        * 1.2.0      2022-04-13 [1] CRAN (R 4.1.3)
#>  sessioninfo     1.2.2      2021-12-06 [1] CRAN (R 4.1.2)
#>  shape           1.4.6      2021-05-19 [1] CRAN (R 4.1.1)
#>  stringi         1.7.6      2021-11-29 [1] CRAN (R 4.1.2)
#>  stringr         1.4.0      2019-02-10 [1] CRAN (R 4.1.2)
#>  styler          1.7.0      2022-03-13 [1] CRAN (R 4.1.3)
#>  survival        3.2-13     2021-08-24 [2] CRAN (R 4.1.3)
#>  tibble        * 3.1.6      2021-11-07 [1] CRAN (R 4.1.2)
#>  tidymodels    * 0.2.0      2022-03-19 [1] CRAN (R 4.1.3)
#>  tidyr         * 1.2.0      2022-02-01 [1] CRAN (R 4.1.2)
#>  tidyselect      1.1.2      2022-02-21 [1] CRAN (R 4.1.3)
#>  timeDate        3043.102   2018-02-21 [1] CRAN (R 4.1.1)
#>  tune          * 0.2.0.9000 2022-03-29 [1] Github (tidymodels/tune@6ed30a4)
#>  utf8            1.2.2      2021-07-24 [1] CRAN (R 4.1.2)
#>  vctrs           0.4.1      2022-04-13 [1] CRAN (R 4.1.3)
#>  withr           2.5.0      2022-03-03 [1] CRAN (R 4.1.3)
#>  workflows     * 0.2.6      2022-03-18 [1] CRAN (R 4.1.3)
#>  workflowsets  * 0.2.1      2022-03-15 [1] CRAN (R 4.1.3)
#>  xfun            0.30       2022-03-02 [1] CRAN (R 4.1.3)
#>  yaml            2.3.5      2022-02-21 [1] CRAN (R 4.1.2)
#>  yardstick     * 0.0.9      2021-11-22 [1] CRAN (R 4.1.2)
#> 
#>  [1] C:/Users/rmkja/Documents/R/win-library/4.1
#>  [2] C:/Program Files/R/R-4.1.3/library
#> 
#> ------------------------------------------------------------------------------

themichjam avatar Jun 18 '22 08:06 themichjam

@themichjam what is it you'd like to see from this model summary?

library(tidymodels)
data(penguins)
my_split <- penguins %>%
  na.omit() %>%
  initial_split()

rec <- recipe(sex ~ species + bill_length_mm + bill_depth_mm,
              data = penguins) %>%
  step_dummy(species)

glmnet_spec <- 
  logistic_reg(penalty = 0.1, mixture = 1) %>%
  set_engine("glmnet")

glmnet_fit <-
  workflow(rec, glmnet_spec) %>%
  last_fit(my_split)

mod <- 
  extract_workflow(glmnet_fit) %>%
  workflows::extract_fit_parsnip() %>% 
  purrr::pluck("fit") 
class(mod)
#> [1] "lognet" "glmnet"

# THIS IS WHAT TBL_REGRESSION IS PRINTING
df_broom <- mod %>% 
  broom::tidy() 
df_broom
#> # A tibble: 314 × 5
#>    term         step estimate lambda dev.ratio
#>    <chr>       <dbl>    <dbl>  <dbl>     <dbl>
#>  1 (Intercept)     1  -0.0241 0.211  -3.04e-15
#>  2 (Intercept)     2  -0.686  0.192   2.18e- 2
#>  3 (Intercept)     3  -1.73   0.175   5.43e- 2
#>  4 (Intercept)     4  -3.09   0.160   9.36e- 2
#>  5 (Intercept)     5  -4.36   0.145   1.27e- 1
#>  6 (Intercept)     6  -5.57   0.133   1.56e- 1
#>  7 (Intercept)     7  -6.71   0.121   1.81e- 1
#>  8 (Intercept)     8  -7.81   0.110   2.03e- 1
#>  9 (Intercept)     9  -8.86   0.100   2.22e- 1
#> 10 (Intercept)    10  -9.87   0.0914  2.39e- 1
#> # … with 304 more rows

# IS THIS WHAT YOU WANT TO PRINT?
df_broom %>%
  dplyr::filter(step == max(step))
#> # A tibble: 5 × 5
#>   term               step estimate   lambda dev.ratio
#>   <chr>             <dbl>    <dbl>    <dbl>     <dbl>
#> 1 (Intercept)          71  -64.2   0.000313     0.597
#> 2 bill_length_mm       71    0.621 0.000313     0.597
#> 3 bill_depth_mm        71    2.18  0.000313     0.597
#> 4 species_Chinstrap    71   -6.45  0.000313     0.597
#> 5 species_Gentoo       71    1.82  0.000313     0.597

Created on 2022-06-18 by the reprex package (v2.0.1)
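Note that `step == max(step)` selects the least-penalized end of the path (lambda = 0.000313 in the output above), not the `penalty = 0.1` the model was specified with. As a minimal sketch of the alternative, the code below keeps the rows of the tidied path whose lambda is closest to the requested penalty. It is illustrative only: it fits `glmnet::glmnet()` directly on the built-in `mtcars` data rather than on the penguins workflow, and the variable names (`path`, `nearest`, `path_at_penalty`) are assumptions made for this example.

```r
library(glmnet)
library(broom)

# Illustrative data (an assumption, not the penguins reprex):
# binary outcome `am` and three numeric predictors.
x <- as.matrix(mtcars[, c("mpg", "wt", "hp")])
y <- factor(mtcars$am)

# Lasso logistic regression; glmnet fits a whole path of lambda values.
fit <- glmnet(x, y, family = "binomial", alpha = 1)

# One row per term and step, as in `df_broom` above.
path <- tidy(fit)

# Keep only the step whose lambda is closest to the requested penalty.
penalty <- 0.1
nearest <- path$lambda[which.min(abs(path$lambda - penalty))]
path_at_penalty <- path[path$lambda == nearest, ]
path_at_penalty
```

By default `broom::tidy()` on a glmnet object omits coefficients that are exactly zero, so terms shrunk out of the model at that penalty will not appear in the result.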

ddsjoberg avatar Jun 18 '22 13:06 ddsjoberg

@ddsjoberg I was wondering if it was possible to get the same output with the tidymodels model as you would get by passing a traditional glm through your package's tbl_regression()?

themichjam avatar Jun 18 '22 13:06 themichjam

The model returns many sets of coefficient estimates (one per lambda step, as in the tidy output above). Can you show me which you want to display?

ddsjoberg avatar Jun 18 '22 13:06 ddsjoberg

NOTE: We have a check for workflow objects that prints an informative message about the default dummy variable creation that occurs in workflows/parsnip. When a workflows object creates dummy variables for categorical variables, it does so in a way that prevents us from identifying the parent variable name, placing the variable header, etc.: each dummy variable is treated as a separate variable in the summary.

tbl_regression.workflow <- function(x, ...) {
  assert_package("workflows", "tbl_regression.workflow()")

  if (isTRUE(!x$pre$actions$formula$blueprint$indicators %in% "none")) {
    paste("To take full advantage of model formatting, e.g. grouping categorical",
          "variables, please add the following argument to the `workflows::add_model()` call:") %>%
      stringr::str_wrap() %>%
      paste("`blueprint = hardhat::default_formula_blueprint(indicators = 'none')`", sep = "\n") %>%
      paste("\n") %>%
      rlang::inform()
  }
  # ... (rest of the method is not shown in this comment)
}

However, it seems that the internal structure may have changed, and this check was not triggered in the examples above.
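For context, here is a minimal sketch of what the blueprint suggested in that message looks like in use. It is an illustration under assumptions, not the reprex above: it uses a formula preprocessor (where `workflows::add_formula()` accepts a `blueprint` argument), the `glm` engine, which accepts factor predictors (glmnet does not), and the built-in `mtcars` data.

```r
library(parsnip)
library(workflows)

# Illustrative data (an assumption): factor outcome and one factor predictor.
df <- transform(mtcars, am = factor(am), cyl = factor(cyl))

# Ask hardhat not to expand factors into indicator columns.
bp <- hardhat::default_formula_blueprint(indicators = "none")

spec <- set_engine(logistic_reg(), "glm")

wf <- workflow()
wf <- add_formula(wf, am ~ cyl + mpg, blueprint = bp)
wf <- add_model(wf, spec)
wf_fit <- fit(wf, data = df)

# The engine fit now sees `cyl` as one factor variable, not as dummy columns.
engine_fit <- extract_fit_engine(wf_fit)
names(stats::model.frame(engine_fit))
```

With the factor kept intact, the underlying model carries enough information to group the levels of `cyl` under a single variable header. This does not help when the dummy columns are created by a recipe step such as `step_dummy()`, as in the original report.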

Also, it seems that we just need a tbl_regression.glmnet() method to better handle the results of these models.
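As a rough idea of what such a method would need from a tidier, the sketch below returns one row per term at a single penalty instead of the whole path. `tidy_glmnet_at()` is a hypothetical helper written for this illustration; it is not part of gtsummary, broom, or broom.helpers. It relies only on `coef()` for glmnet objects with its `s` argument, and it is shown on the built-in `mtcars` data rather than the penguins workflow.

```r
library(glmnet)

# Hypothetical helper (not an existing gtsummary/broom function):
# tidy a glmnet fit at one penalty value instead of along the whole path.
tidy_glmnet_at <- function(x, penalty, ...) {
  # coef() returns a sparse one-column matrix for a single value of `s`.
  cf <- as.matrix(stats::coef(x, s = penalty))
  data.frame(
    term = rownames(cf),
    estimate = unname(cf[, 1]),
    penalty = penalty,
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Illustrative fit (an assumption): lasso logistic regression on mtcars.
x <- as.matrix(mtcars[, c("mpg", "wt", "hp")])
fit <- glmnet(x, factor(mtcars$am), family = "binomial", alpha = 1)

res <- tidy_glmnet_at(fit, penalty = 0.1)
res
```

A function like this could in principle be supplied through the `tidy_fun` argument of `tbl_regression()`, for example `tidy_fun = function(x, ...) tidy_glmnet_at(x, penalty = 0.1)`, although that combination has not been verified here.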

ddsjoberg avatar Jun 18 '22 23:06 ddsjoberg

You might want to check out the tidy method we use in parsnip for glmnet.

juliasilge avatar Jun 19 '22 01:06 juliasilge

Fantastic, thank you @juliasilge !

ddsjoberg avatar Jun 19 '22 01:06 ddsjoberg

Dear @ddsjoberg Do you think it has implications for broom.helpers ?

larmarange avatar Jun 19 '22 09:06 larmarange

@larmarange It depends on how we seek to resolve the issue, but I think the best solutions would likely involve broom.helpers. I am no expert on workflows/parsnip, but here is what I understand.

  • The returned model objects do not have the typical terms object; tidymodels has alternative versions of terms, model_frame(), etc.
  • The current implementation we have in gtsummary simply extracts the underlying model fit, and re-passes the object (and the original arguments) to tbl_regression().
  • A more robust solution would be to use tidiers built for these objects. However, workflows can wrap many different types of models. Will the unified structure of the workflows/parsnip object allow us a single tidier for all model types, or will we need special handling for the various types of models built with tidymodels?

ddsjoberg avatar Jun 19 '22 11:06 ddsjoberg

@themichjam the issue you've raised has two parts: 1. improving the tbl_regression() output for glmnet models, and 2. better handling for workflows/parsnip models.

We can track the first in this issue, and the second will be handled here https://github.com/larmarange/broom.helpers/issues/160 .

@themichjam to address the first part, it would be helpful to have your input on what exactly you want to report from tbl_regression(). The glmnet tidier returns many sets of coefficients (one per lambda step), and that tidier is used in the background to construct the tbl_regression() tables.

cc @larmarange

ddsjoberg avatar Jun 20 '22 00:06 ddsjoberg

Dear @ddsjoberg look at mod2 in the example below

library(tidymodels)
data(penguins)
my_split <- penguins %>%
  na.omit() %>%
  initial_split()

rec <- recipe(sex ~ species + bill_length_mm + bill_depth_mm,
              data = penguins) %>%
  step_dummy(species)

glmnet_spec <- 
  logistic_reg(penalty = 0.1, mixture = 1) %>%
  set_engine("glmnet")

glmnet_fit <-
  workflow(rec, glmnet_spec) %>%
  last_fit(my_split)

mod <- 
  extract_workflow(glmnet_fit) %>%
  workflows::extract_fit_parsnip() %>% 
  purrr::pluck("fit") 
class(mod)
#> [1] "lognet" "glmnet"

mod %>% tidy()
#> # A tibble: 304 x 5
#>    term         step estimate lambda dev.ratio
#>    <chr>       <dbl>    <dbl>  <dbl>     <dbl>
#>  1 (Intercept)     1  -0.0402 0.190   3.69e-15
#>  2 (Intercept)     2  -0.631  0.173   1.78e- 2
#>  3 (Intercept)     3  -1.78   0.158   4.95e- 2
#>  4 (Intercept)     4  -3.00   0.144   8.06e- 2
#>  5 (Intercept)     5  -4.15   0.131   1.07e- 1
#>  6 (Intercept)     6  -5.23   0.119   1.30e- 1
#>  7 (Intercept)     7  -6.25   0.109   1.50e- 1
#>  8 (Intercept)     8  -7.22   0.0992  1.67e- 1
#>  9 (Intercept)     9  -8.41   0.0904  1.90e- 1
#> 10 (Intercept)    10 -10.3    0.0824  2.33e- 1
#> # ... with 294 more rows

mod2 <- 
  extract_workflow(glmnet_fit) %>%
  workflows::extract_fit_parsnip()
class(mod2)
#> [1] "_lognet"   "model_fit"

mod2 %>% tidy()
#> # A tibble: 5 x 3
#>   term              estimate penalty
#>   <chr>                <dbl>   <dbl>
#> 1 (Intercept)        -7.14       0.1
#> 2 bill_length_mm      0.0708     0.1
#> 3 bill_depth_mm       0.232      0.1
#> 4 species_Chinstrap   0          0.1
#> 5 species_Gentoo      0          0.1

Created on 2022-06-20 by the reprex package (v2.0.1)

larmarange avatar Jun 20 '22 09:06 larmarange

However, this tidier works only on a model_fit object and not on the underlying glmnet object.

larmarange avatar Jun 20 '22 09:06 larmarange

An option would be to support natively model_fit objects in broom.helpers and to remove tbl_regression.model_fit() from gtsummary?

larmarange avatar Jun 20 '22 09:06 larmarange

Very first try of native support of model_fit in broom.helpers (see https://github.com/larmarange/broom.helpers/pull/161 )

With this, you can apply tbl_regression.default() directly on a model_fit object. It will use dedicated tidy methods for model_fit while broom.helpers will consider, where relevant, the model$fit object.

library(tidymodels)
library(gtsummary)
#> 
#> Attaching package: 'gtsummary'
#> The following object is masked from 'package:recipes':
#> 
#>     all_numeric
library(broom.helpers)
#> 
#> Attaching package: 'broom.helpers'
#> The following objects are masked from 'package:gtsummary':
#> 
#>     all_continuous, all_contrasts

data(penguins)
my_split <- penguins %>%
  na.omit() %>%
  initial_split()

rec <- recipe(sex ~ species + bill_length_mm + bill_depth_mm,
              data = penguins
) %>%
  step_dummy(species)

glmnet_spec <- logistic_reg(penalty = 0.1, mixture = 1) %>%
  set_engine("glmnet")

glmnet_fit <-
  workflow(rec, glmnet_spec) %>%
  last_fit(my_split)

f <- extract_fit_parsnip(glmnet_fit)
tidy(f)
#> # A tibble: 5 x 3
#>   term              estimate penalty
#>   <chr>                <dbl>   <dbl>
#> 1 (Intercept)        -8.50       0.1
#> 2 bill_length_mm      0.0912     0.1
#> 3 bill_depth_mm       0.258      0.1
#> 4 species_Chinstrap   0          0.1
#> 5 species_Gentoo      0          0.1
tidy_plus_plus(f)
#> x Unable to identify the list of variables.
#> 
#> This is usually due to an error calling `stats::model.frame(x)`or `stats::model.matrix(x)`.
#> It could be the case if that type of model does not implement these methods.
#> Rarely, this error may occur if the model object was created within
#> a functional programming framework (e.g. using `lappy()`, `purrr::map()`, etc.).
#> # A tibble: 5 x 13
#>   term              variable  var_label var_class var_type var_nlevels contrasts
#>   <chr>             <chr>     <chr>         <int> <chr>          <int> <chr>    
#> 1 (Intercept)       (Interce~ (Interce~        NA unknown           NA <NA>     
#> 2 bill_length_mm    bill_len~ bill_len~        NA unknown           NA <NA>     
#> 3 bill_depth_mm     bill_dep~ bill_dep~        NA unknown           NA <NA>     
#> 4 species_Chinstrap species_~ species_~        NA unknown           NA <NA>     
#> 5 species_Gentoo    species_~ species_~        NA unknown           NA <NA>     
#> # ... with 6 more variables: contrasts_type <chr>, reference_row <lgl>,
#> #   label <chr>, estimate <dbl>, penalty <dbl>, n <dbl>
gtsummary:::tbl_regression.default(f) %>%
  as_kable()
#> x Unable to identify the list of variables.
#> 
#> This is usually due to an error calling `stats::model.frame(x)`or `stats::model.matrix(x)`.
#> It could be the case if that type of model does not implement these methods.
#> Rarely, this error may occur if the model object was created within
#> a functional programming framework (e.g. using `lappy()`, `purrr::map()`, etc.).
| Characteristic    | Beta |
|:------------------|:-----|
| (Intercept)       | -8.5 |
| bill_length_mm    | 0.09 |
| bill_depth_mm     | 0.26 |
| species_Chinstrap | 0.00 |
| species_Gentoo    | 0.00 |

Created on 2022-06-20 by the reprex package (v2.0.1)

larmarange avatar Jun 20 '22 09:06 larmarange

@larmarange looks great! You also brought up a great point: the species levels are treated as individual variables because glmnet requires a matrix with dummy variable coding and does not allow factor/character variables to be passed.
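To illustrate the point about dummy coding, here is a small base R sketch (using the built-in `iris` data as a stand-in, which is an assumption of this example). `model.matrix()` expands a factor into dummy columns but keeps an `assign` attribute that links each column back to its variable. A dummy column created separately and passed in a plain matrix, such as `species_Chinstrap`, carries no such link, which is the information needed to group levels under one header.

```r
# Base R only: expand a factor into dummy columns, the numeric-matrix
# form that glmnet expects as input.
form <- ~ Species + Sepal.Length
mm <- model.matrix(form, data = iris)

# Column names after dummy coding (reference level is dropped).
colnames(mm)

# `assign` maps each column to a term: 0 = intercept, 1 = Species, 2 = Sepal.Length.
attr(mm, "assign")

# Recover the parent variable of every column from that mapping.
term_labels <- attr(terms(form), "term.labels")
parent_variable <- c("(Intercept)", term_labels)[attr(mm, "assign") + 1]
parent_variable
```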

ddsjoberg avatar Jun 21 '22 17:06 ddsjoberg

Progress can be followed here: https://github.com/larmarange/broom.helpers/issues/162

ddsjoberg avatar Aug 29 '22 12:08 ddsjoberg