panelr
Explore `sjlabelled` interoperability
Hi,
thanks for your work on this amazing package, it certainly looks like a game changer for analyzing panel data in R!
Are you aware of the sj packages by @strengejacke, in particular sjlabelled? They let you use variable and value labels for data analysis, one of the last features where R still lags behind SPSS or Stata. I think it would be extremely valuable if the sj packages and panelr worked together nicely, so that, for instance, value labels would not get lost during reshaping.
While preserving label-attributes is something that has to be addressed inside panelr, more support in packages like sjPlot or ggeffects is already planned. The basis for panelr to work with my other packages is insight, where I already tracked this package (https://github.com/easystats/insight/issues/107).
If this is gonna happen, I'd like to add the labelled package to the list of packages to take into account. It builds on the haven_labelled and haven_labelled_spss classes introduced by the haven 'tidyverse' package.
I hang out a lot with survey data users, and have seen both haven and @strengejacke packages like sjmisc in use among them.
For another issue about interactions between panelr and other packages, see also #9 about interacting with panel-data packages.
@briatte I think it's not a matter of specific package support - once panelr preserves label attributes when internally transforming data, it should work with both labelled and sjlabelled.
Alright. I haven't used sjlabelled myself, so am glad to hear that the issue is simpler than I thought it might be :)
P.S. ggeffects is reaaally good. As I discuss in #9, I feel packages like panelr or ggeffects could enrich tidymodels at some point.
I sheepishly confess that I have rarely used these packages.
Here is my try at setting a value label using sjlabelled on a panel_data object.
library(panelr)
library(sjlabelled)
data("WageData")
wages <- panel_data(WageData, id = id, wave = t)
wages <- sjlabelled::set_labels(wages, fem, labels = c(`0` = "male", `1` = "female"))
sjmisc::frq(wages$fem)
# x <numeric>
# total N=4165 valid N=4165 mean=0.11 sd=0.32
val label frq raw.prc valid.prc cum.prc
0 male 3696 88.74 88.74 88.74
1 female 469 11.26 11.26 100.00
NA NA 0 0.00 NA NA
Seems like it worked?
I think what you need to consider are the "label" and "labels" attributes that variables carry when data are loaded with the haven package (provided the data file is labelled, e.g. from SPSS or Stata).
These attributes are preserved when you fit a model with lm() or lmer(). Here's an example with the efc-dataset, which was imported from SPSS with haven and which has such label-attributes.
data(efc, package = "sjmisc")
m <- lm(neg_c_7 ~ c172code + c161sex, data = efc)
str(model.frame(m))
#> 'data.frame': 833 obs. of 3 variables:
#> $ neg_c_7 : num 12 20 11 10 12 19 15 11 10 28 ...
#> ..- attr(*, "label")= chr "Negative impact with 7 items"
#> $ c172code: num 2 2 1 2 2 2 2 2 2 2 ...
#> ..- attr(*, "label")= chr "carer's level of education"
#> ..- attr(*, "labels")= Named num 1 2 3
#> .. ..- attr(*, "names")= chr "low level of education" "intermediate level of education" "high level of education"
#> $ c161sex : num 2 2 1 1 2 1 2 2 2 2 ...
#> ..- attr(*, "label")= chr "carer's gender"
#> ..- attr(*, "labels")= Named num 1 2
#> .. ..- attr(*, "names")= chr "Male" "Female"
#> (truncated...)
Data operations with dplyr, purrr, or tidyr also preserve these labels. However, base R functions often drop the label attributes.
You actually don't need to rely on the sjlabelled or labelled packages, you just have to ensure that label-attributes from supplied data are not lost on the journey through your package functions. Then, one can apply the functions from sjlabelled or labelled on the returned model.frame() to retrieve term labels.
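To make the point above concrete, here is a minimal base R sketch of how the attributes get lost and how they could be carried over. It assumes a plain numeric vector with hand-set "label" and "labels" attributes standing in for haven-imported data; `copy_label_attrs()` is a hypothetical helper written for this illustration, not a function from panelr, sjlabelled, or labelled.

```r
# A labelled vector in the haven style: plain numeric plus two attributes.
x <- c(0, 1, 1, 0)
attr(x, "label") <- "Gender"
attr(x, "labels") <- c(male = 0, female = 1)

# Base R subsetting keeps only names, dim and dimnames, so both labels vanish.
sub <- x[1:2]
labels_lost <- is.null(attr(sub, "labels", exact = TRUE))

# Hypothetical helper: copy the two label attributes from `from` onto `to`.
# exact = TRUE matters because attr() falls back to partial matching, so a
# lookup of "label" could return "labels" when no "label" attribute is set.
copy_label_attrs <- function(to, from) {
  for (a in c("label", "labels")) {
    value <- attr(from, a, exact = TRUE)
    if (!is.null(value)) attr(to, a) <- value
  }
  to
}

restored <- copy_label_attrs(sub, x)
str(restored)
```

A package that transforms data internally could apply something along these lines after each step that uses base R subsetting, so that the attributes are still present on the data it returns.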
Okay, this is partly fixed at least in the very limited example I used.
library(panelr)
library(sjlabelled)
data("WageData")
wages <- panel_data(WageData, id = id, wave = t)
wages <- sjlabelled::set_labels(wages, fem, labels = c(`0` = "male", `1` = "female"))
fit <- wbm(lwage ~ union | fem, data = wages)
str(model.frame(fit))
'data.frame': 4165 obs. of 6 variables:
$ id : Factor w/ 595 levels "1","2","3","4",..: 1 1 1 1 1 1 1 2 2 2 ...
$ t : num 1 2 3 4 5 6 7 1 2 3 ...
..- attr(*, "labels")= Named num 0 1
.. ..- attr(*, "names")= chr "male" "female"
$ lwage : num 5.56 5.72 6 6 6.06 ...
$ union : num 0 0 0 0 0 ...
$ fem : num 0 0 0 0 0 0 0 0 0 0 ...
..- attr(*, "labels")= Named num 0 1
.. ..- attr(*, "names")= chr "male" "female"
$ imean(union): num 0 0 0 0 0 ...
- attr(*, "wave")= chr "t"
- attr(*, "id")= chr "id"
- attr(*, "periods")= num 1 2 3 4 5 6 7
- attr(*, "interaction.style")= chr "double-demean"
- attr(*, "terms")=Classes 'terms', 'formula' language lwage ~ `imean(union)` + union + fem + (1 + id)
.. ..- attr(*, "variables")= language list(lwage, `imean(union)`, union, fem, id)
.. ..- attr(*, "factors")= int [1:5, 1:4] 0 1 0 0 0 0 0 1 0 0 ...
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : chr [1:5] "lwage" "`imean(union)`" "union" "fem" ...
.. .. .. ..$ : chr [1:4] "`imean(union)`" "union" "fem" "id"
.. ..- attr(*, "term.labels")= chr [1:4] "`imean(union)`" "union" "fem" "id"
.. ..- attr(*, "order")= int [1:4] 1 1 1 1
.. ..- attr(*, "intercept")= int 1
.. ..- attr(*, "response")= int 1
.. ..- attr(*, ".Environment")=<environment: 0x0000019ce1baa9c8>
.. ..- attr(*, "predvars")= language list(lwage, `imean(union)`, union, fem, id)
.. ..- attr(*, "dataClasses")= Named chr [1:5] "numeric" "numeric" "numeric" "numeric" ...
.. .. ..- attr(*, "names")= chr [1:5] "lwage" "imean(union)" "union" "fem" ...
.. ..- attr(*, "predvars.fixed")= language list(lwage, `imean(union)`, union, fem)
.. ..- attr(*, "varnames.fixed")= chr [1:4] "lwage" "imean(union)" "union" "fem"
.. ..- attr(*, "predvars.random")= language list(lwage, id)
- attr(*, "formula")=Class 'formula' language lwage ~ `imean(union)` + union + fem + (1 | id)
.. ..- attr(*, ".Environment")=<environment: 0x0000019ce1baa9c8>
Now unfortunately when I assign the labels to fem in wages, it looks like the labels get copied over to the t variable as well. I suspect this has to do with the way panel_data objects refuse to drop the ID and wave columns.
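One way to spot this kind of stray label is to list which columns carry a "labels" attribute. The sketch below uses a small made-up data frame in place of the panel_data object, so it only illustrates the check itself and says nothing about where panelr copies the attribute.

```r
# Toy stand-in for the panel data: only `fem` is given value labels.
df <- data.frame(
  id  = c(1, 1, 2, 2),
  t   = c(1, 2, 1, 2),
  fem = c(0, 0, 1, 1)
)
attr(df$fem, "labels") <- c(male = 0, female = 1)

# Flag every column that has a "labels" attribute (exact match only).
has_labels <- vapply(
  df,
  function(col) !is.null(attr(col, "labels", exact = TRUE)),
  logical(1)
)
labelled_cols <- names(df)[has_labels]
labelled_cols
```

Running the same check on `model.frame(fit)` would show whether columns other than the ones that were explicitly labelled, such as `t` in the output above, have picked up labels.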
I should also add that the desired behavior is less clear for time-varying variables since they end up being transformed (usually).
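A small base R sketch of why the time-varying case is ambiguous: arithmetic keeps the attributes of its operands, so a demeaned variable can still carry value labels that no longer describe any of its values. The variable and its labels here are made up for illustration.

```r
# A 0/1 time-varying indicator with value labels.
union <- c(0, 1, 1, 0)
attr(union, "labels") <- c(no = 0, yes = 1)

# Within-transformation (demeaning): the attribute is carried along.
demeaned <- union - mean(union)
still_labelled <- !is.null(attr(demeaned, "labels", exact = TRUE))

# None of the transformed values matches a labelled value any more.
any_match <- any(demeaned %in% attr(demeaned, "labels", exact = TRUE))

str(demeaned)
```

Whether such labels should be dropped, kept, or replaced after the transformation is a design decision rather than something the attributes can settle on their own.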
FYI, the table layout is not perfect here in GitHub markdown, but the first steps toward panelr::wbm() support in sjPlot are finished.
library(panelr)
library(sjPlot)
load(url("https://github.com/strengejacke/mixed-models-snippets/raw/master/example.RData"))
pd <- panel_data(d, id = ID, wave = time)
m <- wbm(QoL ~ x_tv | age + z1_ti + z2_ti + time | (time | ID), data = pd)
tab_model(m)
| | Qo L | | |
|---|---|---|---|
| Predictors | Estimates | CI | p |
| x tv | -3.73 | -4.53 – -2.93 | |
| (Intercept) | 62.73 | 51.25 – 74.20 | |
| imean(x tv) | -6.30 | -7.31 – -5.28 | |
| age | -0.22 | -0.59 – 0.16 | 0.262 |
| z 1 ti | 4.43 | -4.08 – 12.94 | 0.309 |
| z 2 ti | 0.00 | -0.00 – 0.00 | 0.097 |
| time | 1.09 | -0.20 – 2.39 | 0.100 |
| Random Effects | | | |
| σ² | 142.12 | | |
| τ00 ID | 201.95 | | |
| τ11 ID.time | 10.82 | | |
| ρ01 ID | -0.77 | | |
| ICC | 0.43 | | |
| N ID | 188 | | |
| Observations | 564 | | |
| Marginal R2 / Conditional R2 | 0.382 / 0.649 | | |
Created on 2019-05-23 by the reprex package (v0.3.0)