insight `get_data()`: labels with `factor()` in model formula

Open vincentarelbundock opened this issue 1 year ago • 5 comments

Can get_data() preserve the label of a variable when it is wrapped in factor() in the formula?

Notice that the mpg variable retains its label, but not the cyl factor, since the latter is wrapped in factor() in the model formula.

library(haven)
library(insight)

dat <- mtcars
dat$mpg <- labelled(dat$mpg, label = "Miles per Gallon")
dat$cyl <- labelled(dat$cyl, label = "Cylinders")
mod <- lm(mpg ~ factor(cyl), dat)
get_data(mod) |> str()
#> 'data.frame':    32 obs. of  2 variables:
#>  $ mpg: dbl+lbl [1:32] 21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 1...
#>    ..@ label: chr "Miles per Gallon"
#>  $ cyl: num  6 6 4 6 8 6 8 4 4 6 ...
#>   ..- attr(*, "factor")= logi TRUE
#>  - attr(*, "terms")=Classes 'terms', 'formula'  language mpg ~ factor(cyl)
#>   .. ..- attr(*, "variables")= language list(mpg, factor(cyl))
#>   .. ..- attr(*, "factors")= int [1:2, 1] 0 1
#>   .. .. ..- attr(*, "dimnames")=List of 2
#>   .. .. .. ..$ : chr [1:2] "mpg" "factor(cyl)"
#>   .. .. .. ..$ : chr "factor(cyl)"
#>   .. ..- attr(*, "term.labels")= chr "factor(cyl)"
#>   .. ..- attr(*, "order")= int 1
#>   .. ..- attr(*, "intercept")= int 1
#>   .. ..- attr(*, "response")= int 1
#>   .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
#>   .. ..- attr(*, "predvars")= language list(mpg, factor(cyl))
#>   .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "factor"
#>   .. .. ..- attr(*, "names")= chr [1:2] "mpg" "factor(cyl)"
#>  - attr(*, "factors")= chr "cyl"
#>  - attr(*, "is_subset")= logi FALSE

Aug 26 '22 00:08 vincentarelbundock

No, the factor() function strips labels. I'm not quite following the use case, though. My understanding is that the labelled class isn't intended to be retained after import; such variables should be converted to either numeric or factor/ordered.
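A minimal sketch of the label-stripping behaviour described here. It uses a plain `label` attribute as a stand-in for `haven::labelled()` (an assumption, made only to keep the example free of dependencies), and the variable names are illustrative:

```r
# A numeric vector carrying a "label" attribute (stand-in for haven::labelled())
x <- c(6, 6, 4, 8)
attr(x, "label") <- "Cylinders"

# factor() builds a new object and does not carry the attribute over
f_plain <- factor(x)
attr(f_plain, "label")
#> NULL

# The label survives only if it is copied across by hand
f_labelled <- factor(x)
attr(f_labelled, "label") <- attr(x, "label")
```

This is why a term written as factor(cyl) in the formula reaches the model frame without its label, while an unwrapped variable such as mpg keeps it.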

Aug 26 '22 04:08 bwiernik

This would be useful to automatically replace variable names by their label in {modelsummary} tables: https://vincentarelbundock.github.io/modelsummary/articles/appearance.html#variable-labels. For now, simple variables in the formula can be replaced by their labels, but not those wrapped in factor().

Aug 26 '22 07:08 etiennebacher

I'm not sure about the performance, but we could at this place: https://github.com/easystats/insight/blob/33e54687b04ec85f8b1d0430629f2e47fea5f010/R/utils_get_data.R#L483

recover the data frame from the environment, match the names of those variables that were coerced "on the fly", and then retrieve the label attributes from the recovered data.
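A rough sketch of that idea for a plain lm() fit. It is not the code in utils_get_data.R; it assumes the model stores its call in `mod$call`, that the original data are still reachable from the formula's environment, and it uses a plain `label` attribute in place of `haven::labelled()`:

```r
dat <- mtcars
attr(dat$cyl, "label") <- "Cylinders"   # stand-in for haven::labelled()
mod <- lm(mpg ~ factor(cyl), data = dat)

# Recover the original data frame from the environment the formula was created in
orig <- eval(mod$call$data, envir = environment(formula(mod)))

# all.vars() returns the bare variable names, so "factor(cyl)" becomes "cyl"
vars <- intersect(all.vars(formula(mod)), names(orig))

# Look up the label attribute of each matched column; drop unlabelled ones
labels <- lapply(orig[vars], attr, which = "label")
labels <- Filter(Negate(is.null), labels)
```

The resulting list could then be used to re-attach labels to the columns returned by get_data(). Models that do not keep a `data` argument in their call, or whose data have since been modified or removed, would need a fallback.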

Aug 26 '22 08:08 strengejacke

Another possibility (but I don't know if expand.model.frame() is supported by all models):

mtcars$cyl <- haven::labelled(mtcars$cyl, label = "Number of cylinders")
mtcars$hp <- haven::labelled(mtcars$hp, label = "Horsepower")
mtcars$am <- haven::labelled(mtcars$am, label = "Transmission")

mod <- lm(mpg ~ hp + factor(cyl) + factor(am), data = mtcars)

fac <- insight::find_terms(mod)$conditional
fac <- fac[startsWith(fac, "factor(")]
fac <- gsub("^factor\\(", "", fac)
fac <- gsub("\\)$", "", fac)

x <- expand.model.frame(mod, fac)[, fac]

lapply(x, class)
#> $cyl
#> [1] "haven_labelled" "vctrs_vctr"     "double"        
#> 
#> $am
#> [1] "haven_labelled" "vctrs_vctr"     "double"

Created on 2022-08-26 by the reprex package (v2.0.1)

Aug 26 '22 09:08 etiennebacher

I have a vague memory that expand.model.frame() only works for a very limited set of models.

> I'm not sure about the performance

My guess is that the main performance penalty would come from copying/assigning. I wonder if it is possible to retrieve the attribute directly from the environment, by name, without calling eval() or re-assigning. Otherwise, I'm not sure it's worth the performance hit, since this is a relatively minor feature.
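One way this might look, as a sketch only: when the `data` argument of the call is a bare name, the object can be fetched with get() and the attribute read from a single column. The setup below is hypothetical (plain `label` attribute instead of `haven::labelled()`, an lm() fit), and the case where `data` is an expression is deliberately not handled:

```r
dat <- mtcars
attr(dat$cyl, "label") <- "Cylinders"   # stand-in for haven::labelled()
mod <- lm(mpg ~ factor(cyl), data = dat)

data_arg <- mod$call$data
cyl_label <- NULL

# Only handle the simple case where `data` was passed as a bare name
if (is.name(data_arg)) {
  # get() looks the object up by name; nothing is evaluated or re-assigned,
  # and R's copy-on-modify semantics mean the data frame is not duplicated
  orig <- get(as.character(data_arg), envir = environment(formula(mod)))
  cyl_label <- attr(orig[["cyl"]], "label")
}
```

Since only attributes are read, the cost should be mostly that of the name lookup, though this has not been benchmarked here.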

Aug 26 '22 11:08 vincentarelbundock

Closing now, as this will be solved by #691.

Dec 07 '22 01:12 vincentarelbundock
