error checking model from parsnip object: $ operator is invalid

Open verajosemanuel opened this issue 3 years ago • 7 comments

I tried to run check_model() on a model from a very simple glmnet classification task.

Code from here: https://stackoverflow.com/questions/65969913/extract-plain-model-from-tidymodel-object

library(magrittr)
library(tidymodels)
library(performance)

data(two_class_dat)

glm_spec <- logistic_reg() %>%
  set_engine("glmnet")

norm_rec <- recipe(Class ~ A + B, data = two_class_dat) %>%
  step_normalize(all_predictors())

glm_fit <- workflow() %>%
  add_recipe(norm_rec) %>%
  add_model(glm_spec) %>%
  fit(two_class_dat) %>%
  pull_workflow_fit()



performance::check_model(glm_fit)

Error: $ operator is invalid for atomic vectors

verajosemanuel avatar May 19 '21 11:05 verajosemanuel

I'm not sure this is a parsnip issue. How would this code look in "regular form"? Something like glmnet(Class ~ A + B, data = two_class_dat)?

strengejacke avatar May 19 '21 14:05 strengejacke

library(performance)
data(two_class_dat, package = "modeldata")

fit <- glmnet::glmnet(two_class_dat[, 1:2], two_class_dat[, 3], family = "binomial")

check_model(fit)
#> Error: $ operator is invalid for atomic vectors

Created on 2021-05-19 by the reprex package (v2.0.0)

The error happens in insight::model_info() when it tries to subset the result of stats::family(fit).
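
To illustrate the mechanism, here is a minimal base-R sketch. It assumes that stats::family() on this kind of fit returns a named character vector rather than a family object; the vector `fam` below is a hand-built stand-in, not the output of a real glmnet fit.

```r
# Stand-in for the value returned by stats::family() on the fit
# (assumption: a named character vector, not a "family" object).
fam <- c(lognet = "binomial")

# An atomic vector cannot be subset with `$`; capture the resulting error.
err <- tryCatch(fam$family, error = function(e) e)
print(inherits(err, "error"))
print(conditionMessage(err))

# A regular family object is list-like, so `$` works as expected.
glm_family <- stats::binomial()
print(glm_family$family)
print(glm_family$link)
```

Any code that assumes a list-like family object and reaches for `$family` or `$link` would fail in the same way on such a vector.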

EmilHvitfeldt avatar May 19 '21 16:05 EmilHvitfeldt

I just wanted to revisit this issue, but there seems to be a new issue, possibly in glmnet:

data(two_class_dat, package = "modeldata")
fit <- glmnet::glmnet(two_class_dat[, 1:2], two_class_dat[, 3], family = "binomial")
#> Warning in Ops.factor(left, right): '*' not meaningful for factors
#> Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'drop': requires numeric/complex matrix/vector arguments

Created on 2021-06-16 by the reprex package (v2.0.0)

strengejacke avatar Jun 16 '21 08:06 strengejacke

Strange.

If I run the code interactively, it works:

> data(two_class_dat, package = "modeldata")
> fit <- glmnet::glmnet(two_class_dat[, 1:2], two_class_dat[, 3], family = "binomial")
> fit

Call:  glmnet::glmnet(x = two_class_dat[, 1:2], y = two_class_dat[,      3], family = "binomial") 

   Df  %Dev   Lambda
1   0  0.00 0.308100
2   1  4.74 0.280700
3   1  8.72 0.255800
4   1 12.08 0.233100
5   1 14.96 0.212400
6   1 17.44 0.193500
7   1 19.57 0.176300
8   1 21.42 0.160600
9   1 23.02 0.146400
10  1 24.41 0.133400
11  1 25.62 0.121500
12  1 26.67 0.110700
13  1 27.58 0.100900
14  1 28.37 0.091930
15  1 29.05 0.083760
16  1 29.64 0.076320
17  1 30.15 0.069540
18  1 30.59 0.063360
19  1 30.97 0.057730
20  1 31.30 0.052610
21  1 31.58 0.047930
22  1 31.82 0.043670
23  1 32.02 0.039790
24  2 32.48 0.036260
25  2 33.27 0.033040
26  2 33.95 0.030100
27  2 34.54 0.027430
28  2 35.05 0.024990
29  2 35.48 0.022770
30  2 35.86 0.020750
31  2 36.18 0.018910
32  2 36.46 0.017230
33  2 36.70 0.015700
34  2 36.90 0.014300
35  2 37.08 0.013030
36  2 37.23 0.011870
37  2 37.35 0.010820
38  2 37.46 0.009857
39  2 37.55 0.008982
40  2 37.63 0.008184
41  2 37.70 0.007457
42  2 37.75 0.006794
43  2 37.80 0.006191
44  2 37.84 0.005641
45  2 37.87 0.005140
46  2 37.90 0.004683
47  2 37.92 0.004267
48  2 37.94 0.003888
49  2 37.96 0.003543
50  2 37.98 0.003228
51  2 37.99 0.002941
52  2 38.00 0.002680
53  2 38.01 0.002442
54  2 38.01 0.002225
55  2 38.02 0.002027
56  2 38.02 0.001847
57  2 38.03 0.001683
58  2 38.03 0.001533
59  2 38.03 0.001397
60  2 38.03 0.001273
61  2 38.04 0.001160
62  2 38.04 0.001057
63  2 38.04 0.000963
64  2 38.04 0.000878
65  2 38.04 0.000800

But if I try to create a reprex, it doesn't 🤔

data(two_class_dat, package = "modeldata")
fit <- glmnet::glmnet(two_class_dat[, 1:2], two_class_dat[, 3], family = "binomial")
#> Warning in Ops.factor(left, right): '*' not meaningful for factors
#> Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'drop': requires numeric/complex matrix/vector arguments

Created on 2021-06-16 by the reprex package (v2.0.0)

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 4.1.0 (2021-05-18)
#>  os       macOS Mojave 10.14.6        
#>  system   x86_64, darwin17.0          
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_US.UTF-8                 
#>  ctype    en_US.UTF-8                 
#>  tz       Europe/Berlin               
#>  date     2021-06-16                  
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version    date       lib source                       
#>  backports     1.2.1      2020-12-09 [1] CRAN (R 4.1.0)               
#>  cli           2.5.0.9000 2021-06-11 [1] Github (r-lib/cli@571fea6)   
#>  codetools     0.2-18     2020-11-04 [2] CRAN (R 4.1.0)               
#>  crayon        1.4.1      2021-02-08 [1] CRAN (R 4.1.0)               
#>  digest        0.6.27     2020-10-24 [1] CRAN (R 4.1.0)               
#>  ellipsis      0.3.2      2021-04-29 [1] CRAN (R 4.1.0)               
#>  evaluate      0.14       2019-05-28 [1] CRAN (R 4.1.0)               
#>  fansi         0.5.0      2021-05-25 [1] CRAN (R 4.1.0)               
#>  foreach       1.5.1      2020-10-15 [1] CRAN (R 4.1.0)               
#>  fs            1.5.0      2020-07-31 [1] CRAN (R 4.1.0)               
#>  glmnet        4.1-1      2021-02-21 [1] CRAN (R 4.1.0)               
#>  glue          1.4.2      2020-08-27 [1] CRAN (R 4.1.0)               
#>  highr         0.9        2021-04-16 [1] CRAN (R 4.1.0)               
#>  htmltools     0.5.1.1    2021-01-22 [1] CRAN (R 4.1.0)               
#>  iterators     1.0.13     2020-10-15 [1] CRAN (R 4.1.0)               
#>  knitr         1.33       2021-04-24 [1] CRAN (R 4.1.0)               
#>  lattice       0.20-44    2021-05-02 [2] CRAN (R 4.1.0)               
#>  lifecycle     1.0.0      2021-02-15 [1] CRAN (R 4.1.0)               
#>  magrittr      2.0.1      2020-11-17 [1] CRAN (R 4.1.0)               
#>  Matrix        1.3-3      2021-05-04 [2] CRAN (R 4.1.0)               
#>  pillar        1.6.1      2021-05-16 [1] CRAN (R 4.1.0)               
#>  pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 4.1.0)               
#>  purrr         0.3.4      2020-04-17 [1] CRAN (R 4.1.0)               
#>  reprex        2.0.0      2021-04-02 [1] CRAN (R 4.1.0)               
#>  rlang         0.4.11     2021-04-30 [1] CRAN (R 4.1.0)               
#>  rmarkdown     2.9        2021-06-15 [1] CRAN (R 4.1.0)               
#>  sessioninfo   1.1.1      2018-11-05 [1] CRAN (R 4.1.0)               
#>  shape         1.4.6      2021-05-19 [1] CRAN (R 4.1.0)               
#>  stringi       1.6.2      2021-05-17 [1] CRAN (R 4.1.0)               
#>  stringr       1.4.0      2019-02-10 [1] CRAN (R 4.1.0)               
#>  styler        1.4.1.9003 2021-06-09 [1] Github (r-lib/styler@a58a411)
#>  survival      3.2-11     2021-04-26 [2] CRAN (R 4.1.0)               
#>  tibble        3.1.2      2021-05-16 [1] CRAN (R 4.1.0)               
#>  utf8          1.2.1      2021-03-12 [1] CRAN (R 4.1.0)               
#>  vctrs         0.3.8      2021-04-29 [1] CRAN (R 4.1.0)               
#>  withr         2.4.2      2021-04-18 [1] CRAN (R 4.1.0)               
#>  xfun          0.24       2021-06-15 [1] CRAN (R 4.1.0)               
#>  yaml          2.2.1      2020-02-01 [1] CRAN (R 4.1.0)               
#> 
#> [1] /Users/patil/Library/R/x86_64/4.1/library
#> [2] /Library/Frameworks/R.framework/Versions/4.1/Resources/library

IndrajeetPatil avatar Jun 16 '21 08:06 IndrajeetPatil

The reprex no longer works. Is there an updated example that reproduces the error?

library(magrittr)
library(tidymodels)
#> Registered S3 method overwritten by 'tune':
#>   method                   from   
#>   required_pkgs.model_spec parsnip
library(performance)
#> 
#> Attaching package: 'performance'
#> The following objects are masked from 'package:yardstick':
#> 
#>     mae, rmse

data(two_class_dat)

glm_spec <- logistic_reg() %>%
  set_engine("glmnet")

norm_rec <- recipe(Class ~ A + B, data = two_class_dat) %>%
  step_normalize(all_predictors())

glm_fit <- workflow() %>%
  add_recipe(norm_rec) %>%
  add_model(glm_spec) %>%
  fit(two_class_dat) %>%
  pull_workflow_fit()
#> Warning: `pull_workflow_fit()` was deprecated in workflows 0.2.3.
#> Please use `extract_fit_parsnip()` instead.
#> Error in `.check_glmnet_penalty_fit()`:
#> ! For the glmnet engine, `penalty` must be a single number (or a value of `tune()`).
#> * There are 0 values for `penalty`.
#> * To try multiple values for total regularization, use the tune package.
#> * To predict multiple penalties, use `multi_predict()`

Created on 2022-03-02 by the reprex package (v2.0.1)

strengejacke avatar Mar 02 '22 10:03 strengejacke

Here is an updated reprex reflecting the changes in tidymodels 😃

library(magrittr)
library(tidymodels)
#> Registered S3 method overwritten by 'tune':
#>   method                   from   
#>   required_pkgs.model_spec parsnip
library(performance)
#> 
#> Attaching package: 'performance'
#> The following objects are masked from 'package:yardstick':
#> 
#>     mae, rmse

data(two_class_dat)

glm_spec <- logistic_reg(penalty = 1) %>%
  set_engine("glmnet")

norm_rec <- recipe(Class ~ A + B, data = two_class_dat) %>%
  step_normalize(all_predictors())

glm_fit <- workflow() %>%
  add_recipe(norm_rec) %>%
  add_model(glm_spec) %>%
  fit(two_class_dat) %>%
  extract_fit_parsnip()

performance::check_model(glm_fit)
#> Error: $ operator is invalid for atomic vectors

Created on 2022-03-02 by the reprex package (v2.0.1)

EmilHvitfeldt avatar Mar 02 '22 17:03 EmilHvitfeldt

Hello, this issue comes from these lines in insight:::model_info.default() which are called by performance::check_model:

https://github.com/easystats/insight/blob/e104d8a95c59c7092b5712e29da0118b05ced215/R/model_info.R#L116-L124

This is because stats::family() apparently returns less information for this object than for others (a named character vector instead of a family object):

> class(glm_fit$fit)
[1] "lognet" "glmnet"

> stats::family(glm_fit$fit)
    lognet 
"binomial" 

Compared to lme4::lmer objects for example:

library(lme4)
#> Loading required package: Matrix
m <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
stats::family(m)
#> 
#> Family: gaussian 
#> Link function: identity

Created on 2022-05-20 by the reprex package (v2.0.1)

However, I don't know what the arguments of .make_family() should be for this kind of object, so I can't fix this.
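
One possible direction, as a rough sketch only: normalise whatever stats::family() returns before subsetting it. The helper `family_info()` below is hypothetical and not part of insight, and it does not call .make_family(). It also assumes that a bare family name implies the canonical link of the matching stats constructor, which may not hold for every glmnet model class.

```r
# Hypothetical helper (not an insight function): return family and link names
# for either a "family" object or a bare character vector such as
# c(lognet = "binomial").
family_info <- function(fam) {
  if (inherits(fam, "family")) {
    return(list(family = fam$family, link = fam$link))
  }
  if (is.character(fam) && length(fam) == 1L) {
    name <- unname(fam)
    # Assumption: map a few known names to their stats constructors and
    # use the default (canonical) link of each.
    ctor <- switch(name,
      gaussian = stats::gaussian,
      binomial = stats::binomial,
      poisson = stats::poisson,
      NULL
    )
    link <- if (is.null(ctor)) NA_character_ else ctor()$link
    return(list(family = name, link = link))
  }
  stop("unsupported family value")
}

str(family_info(c(lognet = "binomial")))
str(family_info(stats::gaussian()))
```

Names without a matching constructor in the switch fall through with a missing link, so the caller can decide how to handle them instead of hitting the `$` error.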

etiennebacher avatar May 20 '22 14:05 etiennebacher