panelr Explore `sjlabelled` interoperability

Hi,

thanks for your work on this amazing package, it certainly looks like a game changer for analyzing panel data in R!

Are you aware of the sj packages by @strengejacke, in particular sjlabelled? They allow you to use variable and value labels for data analysis, one of the last features where R still lags behind SPSS or Stata. I think it would be extremely valuable if the sj packages and panelr worked together nicely, so that, for instance, value labels would not get lost during reshaping procedures.

May 22 '19 09:05 cschwem2er

While preserving label-attributes is something that has to be addressed inside panelr, more support in packages like sjPlot or ggeffects is already planned. The basis for panelr to work with my other packages is insight, where I already tracked this package (https://github.com/easystats/insight/issues/107).

May 22 '19 11:05 strengejacke

If this is gonna happen, I'd like to add the labelled package to the list of packages to take into account. It builds on the haven_labelled and haven_labelled_spss classes introduced by the haven 'tidyverse' package.

I hang out a lot with survey data users, and have seen both haven and @strengejacke's packages, such as sjmisc, in use among them.

For another issue about interactions between panelr and other packages, see also #9 about interacting with panel-data packages.

May 22 '19 12:05 briatte

@briatte I think it's not a matter of specific package support - once panelr preserves label attributes when internally transforming data, it should work with both labelled and sjlabelled.

May 22 '19 12:05 strengejacke

Alright. I haven't used sjlabelled myself, so am glad to hear that the issue is simpler than I thought it might be :)

P.S. ggeffects is reaaally good. As I discuss in #9, I feel packages like panelr or ggeffects could enrich tidymodels at some point.

May 22 '19 12:05 briatte

I sheepishly confess that I have rarely used these packages.

Here is my try at setting a value label using sjlabelled on a panel_data object.

library(panelr)
library(sjlabelled)

data("WageData")
wages <- panel_data(WageData, id = id, wave = t)

wages <- sjlabelled::set_labels(wages, fem, labels = c(`0` = "male", `1` = "female"))
sjmisc::frq(wages$fem)

# x <numeric> 
# total N=4165  valid N=4165  mean=0.11  sd=0.32
 
 val  label  frq raw.prc valid.prc cum.prc
   0   male 3696   88.74     88.74   88.74
   1 female  469   11.26     11.26  100.00
  NA     NA    0    0.00        NA      NA

Seems like it worked?

May 22 '19 14:05 jacob-long

I think what you need to consider are the "label" and "labels" attributes that data have when loaded with the haven package (and when the data file is labelled, i.e. comes from SPSS, Stata or similar).

These attributes are preserved when you fit a model with lm() or lmer(). Here's an example with the efc-dataset, which was imported from SPSS with haven and which has such label-attributes.

data(efc, package = "sjmisc")
m <- lm(neg_c_7 ~ c172code + c161sex, data = efc)
str(model.frame(m))
#> 'data.frame':    833 obs. of  3 variables:
#>  $ neg_c_7 : num  12 20 11 10 12 19 15 11 10 28 ...
#>   ..- attr(*, "label")= chr "Negative impact with 7 items"
#>  $ c172code: num  2 2 1 2 2 2 2 2 2 2 ...
#>   ..- attr(*, "label")= chr "carer's level of education"
#>   ..- attr(*, "labels")= Named num  1 2 3
#>   .. ..- attr(*, "names")= chr  "low level of education" "intermediate level of education" "high level of education"
#>  $ c161sex : num  2 2 1 1 2 1 2 2 2 2 ...
#>   ..- attr(*, "label")= chr "carer's gender"
#>   ..- attr(*, "labels")= Named num  1 2
#>   .. ..- attr(*, "names")= chr  "Male" "Female"
#> (truncated...)

Also, data operations with dplyr, purrr or tidyr preserve these labels. However, base R functions often drop the label attributes.

You actually don't need to rely on the sjlabelled or labelled packages. You just have to ensure that label attributes from the supplied data are not lost on the journey through your package functions. Then one can apply the functions from sjlabelled or labelled to the returned model.frame() to retrieve term labels.
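To illustrate the point about base R, here is a minimal sketch using only base R. The helper `restore_labels()` is hypothetical (it is not part of panelr, sjlabelled or labelled); it only shows one way attributes could be carried across an operation that drops them.

```r
# A numeric vector carrying haven-style label attributes
x <- c(1, 2, 2, 1)
attr(x, "label") <- "carer's gender"
attr(x, "labels") <- c(Male = 1, Female = 2)

# Base R subsetting of a plain vector drops both attributes
dropped <- x[1:2]
is.null(attributes(dropped))

# Hypothetical helper: copy "label" and "labels" from `from` onto `to`.
# exact = TRUE matters here, because attr() otherwise partially matches
# "label" against "labels" when only the latter is present.
restore_labels <- function(to, from) {
  for (a in c("label", "labels")) {
    value <- attr(from, a, exact = TRUE)
    if (!is.null(value)) attr(to, a) <- value
  }
  to
}

sub <- restore_labels(x[1:2], x)
str(sub)
```

Wrapping internal transformations this way keeps the package independent of any particular labelling package, since only plain attributes are touched.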

May 22 '19 14:05 strengejacke

Okay, this is partly fixed at least in the very limited example I used.

library(panelr)
library(sjlabelled)

data("WageData")
wages <- panel_data(WageData, id = id, wave = t)

wages <- sjlabelled::set_labels(wages, fem, labels = c(`0` = "male", `1` = "female"))

fit <- wbm(lwage ~ union | fem, data = wages)
str(model.frame(fit))

'data.frame':	4165 obs. of  6 variables:
 $ id          : Factor w/ 595 levels "1","2","3","4",..: 1 1 1 1 1 1 1 2 2 2 ...
 $ t           : num  1 2 3 4 5 6 7 1 2 3 ...
  ..- attr(*, "labels")= Named num  0 1
  .. ..- attr(*, "names")= chr  "male" "female"
 $ lwage       : num  5.56 5.72 6 6 6.06 ...
 $ union       : num  0 0 0 0 0 ...
 $ fem         : num  0 0 0 0 0 0 0 0 0 0 ...
  ..- attr(*, "labels")= Named num  0 1
  .. ..- attr(*, "names")= chr  "male" "female"
 $ imean(union): num  0 0 0 0 0 ...
 - attr(*, "wave")= chr "t"
 - attr(*, "id")= chr "id"
 - attr(*, "periods")= num  1 2 3 4 5 6 7
 - attr(*, "interaction.style")= chr "double-demean"
 - attr(*, "terms")=Classes 'terms', 'formula'  language lwage ~ `imean(union)` + union + fem + (1 + id)
  .. ..- attr(*, "variables")= language list(lwage, `imean(union)`, union, fem, id)
  .. ..- attr(*, "factors")= int [1:5, 1:4] 0 1 0 0 0 0 0 1 0 0 ...
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : chr [1:5] "lwage" "`imean(union)`" "union" "fem" ...
  .. .. .. ..$ : chr [1:4] "`imean(union)`" "union" "fem" "id"
  .. ..- attr(*, "term.labels")= chr [1:4] "`imean(union)`" "union" "fem" "id"
  .. ..- attr(*, "order")= int [1:4] 1 1 1 1
  .. ..- attr(*, "intercept")= int 1
  .. ..- attr(*, "response")= int 1
  .. ..- attr(*, ".Environment")=<environment: 0x0000019ce1baa9c8> 
  .. ..- attr(*, "predvars")= language list(lwage, `imean(union)`, union, fem, id)
  .. ..- attr(*, "dataClasses")= Named chr [1:5] "numeric" "numeric" "numeric" "numeric" ...
  .. .. ..- attr(*, "names")= chr [1:5] "lwage" "imean(union)" "union" "fem" ...
  .. ..- attr(*, "predvars.fixed")= language list(lwage, `imean(union)`, union, fem)
  .. ..- attr(*, "varnames.fixed")= chr [1:4] "lwage" "imean(union)" "union" "fem"
  .. ..- attr(*, "predvars.random")= language list(lwage, id)
 - attr(*, "formula")=Class 'formula'  language lwage ~ `imean(union)` + union + fem + (1 | id)
  .. ..- attr(*, ".Environment")=<environment: 0x0000019ce1baa9c8>

Now unfortunately when I assign the labels to fem in wages, it looks like the labels get copied over to the t variable as well. I suspect this has to do with the way panel_data objects refuse to drop the ID and wave columns.
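A quick way to spot this kind of leakage is to list which columns carry a "labels" attribute. The sketch below uses a toy data frame as a stand-in for a panel_data object, and `labelled_columns()` is a hypothetical helper, not a panelr function.

```r
# Toy data frame standing in for a panel_data object (assumption for illustration)
d <- data.frame(id = c(1, 1, 2, 2), t = c(1, 2, 1, 2), fem = c(0, 0, 1, 1))
attr(d$fem, "labels") <- c(male = 0, female = 1)

# Hypothetical helper: names of all columns that have a "labels" attribute
labelled_columns <- function(data) {
  has_labels <- vapply(
    data,
    function(col) !is.null(attr(col, "labels", exact = TRUE)),
    logical(1)
  )
  names(data)[has_labels]
}

labelled_columns(d)
# [1] "fem"
```

If the same check on the model frame also returned "t", that would confirm that the labels were copied to the wave column.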

May 22 '19 15:05 jacob-long

I should also add that the desired behavior is less clear for time-varying variables since they end up being transformed (usually).
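As a sketch of why this is unclear: base R arithmetic keeps the attributes of its operand, so a de-meaned variable would still carry value labels that no longer describe its values. One possible policy, shown here purely as an assumption and not as panelr's behavior, is to keep the variable label and drop the value labels after the transformation.

```r
# Toy time-varying variable with label attributes (made-up data)
union <- c(0, 1, 1, 0)
attr(union, "label") <- "Union member"
attr(union, "labels") <- c(no = 0, yes = 1)
id <- c(1, 1, 2, 2)

# Within-unit de-meaning; arithmetic copies the attributes of `union`
demeaned <- union - ave(as.vector(union), id)
names(attributes(demeaned))

# Assumed policy: the value labels (0 = no, 1 = yes) no longer apply,
# so drop "labels" but keep the descriptive "label"
attr(demeaned, "labels") <- NULL
str(demeaned)
```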

May 22 '19 15:05 jacob-long

FYI, the table layout is not perfect here in GitHub markdown, but the first steps toward panelr::wbm() support in sjPlot are finished.

library(panelr)
library(sjPlot)
load(url("https://github.com/strengejacke/mixed-models-snippets/raw/master/example.RData"))
pd <- panel_data(d, id = ID, wave = time)
m <- wbm(QoL ~ x_tv | age + z1_ti + z2_ti + time  | (time | ID), data = pd)
tab_model(m)

|  | Qo L |  |  |
|---|---|---|---|
| Predictors | Estimates | CI | p |
| x tv | -3.73 | -4.53 – -2.93 |  |
| (Intercept) | 62.73 | 51.25 – 74.20 |  |
| imean(x tv) | -6.30 | -7.31 – -5.28 |  |
| age | -0.22 | -0.59 – 0.16 | 0.262 |
| z 1 ti | 4.43 | -4.08 – 12.94 | 0.309 |
| z 2 ti | 0.00 | -0.00 – 0.00 | 0.097 |
| time | 1.09 | -0.20 – 2.39 | 0.100 |
| Random Effects |  |  |  |
| σ² | 142.12 |  |  |
| τ₀₀ ID | 201.95 |  |  |
| τ₁₁ ID.time | 10.82 |  |  |
| ρ₀₁ ID | -0.77 |  |  |
| ICC | 0.43 |  |  |
| N ID | 188 |  |  |
| Observations | 564 |  |  |
| Marginal R² / Conditional R² | 0.382 / 0.649 |  |  |

Created on 2019-05-23 by the reprex package (v0.3.0)

May 23 '19 21:05 strengejacke
