
Explore `sjlabelled` interoperability

Open cschwem2er opened this issue 6 years ago • 9 comments

Hi,

thanks for your work on this amazing package, it certainly looks like a game changer for analyzing panel data in R!

Are you aware of the sj packages by @strengejacke, in particular sjlabelled? They make it possible to use variable and value labels for data analysis, one of the last features where R is still lagging behind SPSS or Stata. I think it would be extremely valuable if the sj packages and panelr worked together nicely, so that, for instance, value labels would not get lost during reshaping procedures.

cschwem2er, May 22 '19 09:05

While preserving label-attributes is something that has to be addressed inside panelr, more support in packages like sjPlot or ggeffects is already planned. The basis for panelr to work with my other packages is insight, where I already tracked this package (https://github.com/easystats/insight/issues/107).

strengejacke, May 22 '19 11:05

If this is gonna happen, I'd like to add the labelled package to the list of packages to take into account. It builds on the haven_labelled and haven_labelled_spss classes introduced by the haven 'tidyverse' package.

I hang out a lot with survey data users, and have seen both haven and @strengejacke's packages like sjmisc in use among them.

For another issue about interactions between panelr and other packages, see also #9 about interacting with panel-data packages.

briatte, May 22 '19 12:05

@briatte I think it's not a matter of specific package support - once panelr preserves label attributes when internally transforming data, it should work with both labelled and sjlabelled.

strengejacke, May 22 '19 12:05

Alright. I haven't used sjlabelled myself, so am glad to hear that the issue is simpler than I thought it might be :)

P.S. ggeffects is reaaally good. As I discuss in #9, I feel packages like panelr or ggeffects could enrich tidymodels at some point.

briatte, May 22 '19 12:05

I sheepishly confess that I have rarely used these packages.

Here is my try at setting a value label using sjlabelled on a panel_data object.

library(panelr)
library(sjlabelled)

data("WageData")
wages <- panel_data(WageData, id = id, wave = t)

wages <- sjlabelled::set_labels(wages, fem, labels = c(`0` = "male", `1` = "female"))
sjmisc::frq(wages$fem)
# x <numeric>
# total N=4165  valid N=4165  mean=0.11  sd=0.32
#
#  val  label  frq raw.prc valid.prc cum.prc
#    0   male 3696   88.74     88.74   88.74
#    1 female  469   11.26     11.26  100.00
#   NA     NA    0    0.00        NA      NA

Seems like it worked?

jacob-long, May 22 '19 14:05

I think what you need to consider are the "label" and "labels" attributes that variables carry when loaded with the haven package (provided the data file is labelled, e.g. one exported from SPSS or Stata).

These attributes are preserved when you fit a model with lm() or lmer(). Here's an example with the efc-dataset, which was imported from SPSS with haven and which has such label-attributes.

data(efc, package = "sjmisc")
m <- lm(neg_c_7 ~ c172code + c161sex, data = efc)
str(model.frame(m))
#> 'data.frame':    833 obs. of  3 variables:
#>  $ neg_c_7 : num  12 20 11 10 12 19 15 11 10 28 ...
#>   ..- attr(*, "label")= chr "Negative impact with 7 items"
#>  $ c172code: num  2 2 1 2 2 2 2 2 2 2 ...
#>   ..- attr(*, "label")= chr "carer's level of education"
#>   ..- attr(*, "labels")= Named num  1 2 3
#>   .. ..- attr(*, "names")= chr  "low level of education" "intermediate level of education" "high level of education"
#>  $ c161sex : num  2 2 1 1 2 1 2 2 2 2 ...
#>   ..- attr(*, "label")= chr "carer's gender"
#>   ..- attr(*, "labels")= Named num  1 2
#>   .. ..- attr(*, "names")= chr  "Male" "Female"
#> (truncated...)

Also, data operations with dplyr, purrr or tidyr preserve these labels. However, base R functions often drop the label attributes.
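To illustrate the base R point, here is a minimal sketch using a hand-built vector. The values and labels are made up for illustration and only mimic the attributes haven attaches on import. Both plain subsetting with `[` and coercion with `as.numeric()` strip them:

```r
# Toy vector carrying haven-style attributes (made-up example data)
x <- c(1, 2, 2, 1)
attr(x, "label") <- "carer's gender"
attr(x, "labels") <- c(Male = 1, Female = 2)

# Subsetting a plain vector with `[` keeps only names/dim/dimnames,
# so the custom "label" and "labels" attributes are gone afterwards
sub <- x[1:2]
is.null(attr(sub, "labels"))

# as.numeric() strips all attributes as well
num <- as.numeric(x)
is.null(attributes(num))
```

This only covers unclassed vectors; vectors with a class such as haven_labelled have their own subsetting methods and may behave differently.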

You actually don't need to rely on the sjlabelled or labelled packages; you just have to ensure that label attributes from the supplied data are not lost on the journey through your package functions. Then one can apply the functions from sjlabelled or labelled to the returned model.frame() to retrieve term labels.
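One way this could look inside a package, as a rough sketch only: save the label attributes before an internal transformation and put them back afterwards. The helpers `get_label_attrs()` and `restore_label_attrs()` are hypothetical names invented for this example (they are not part of panelr, sjlabelled or labelled), and the data frame is made up.

```r
# Hypothetical helper: collect "label"/"labels" attributes per column
get_label_attrs <- function(data) {
  lapply(data, function(col) {
    list(label  = attr(col, "label",  exact = TRUE),
         labels = attr(col, "labels", exact = TRUE))
  })
}

# Hypothetical helper: re-attach saved attributes to columns that still exist
restore_label_attrs <- function(data, saved) {
  for (nm in intersect(names(data), names(saved))) {
    attr(data[[nm]], "label")  <- saved[[nm]]$label
    attr(data[[nm]], "labels") <- saved[[nm]]$labels
  }
  data
}

# Made-up data with value labels on one column
d <- data.frame(fem = c(0, 1, 0, 1), lwage = c(5.5, 5.7, 6.0, 6.1))
attr(d$fem, "labels") <- c(male = 0, female = 1)

saved <- get_label_attrs(d)
d$fem <- as.numeric(d$fem)           # stand-in for a label-dropping step
d <- restore_label_attrs(d, saved)   # labels are back on `fem`
attr(d$fem, "labels")
```

Restoring by column name only makes sense for columns whose values keep their meaning; for transformed variables (e.g. demeaned ones) the old value labels would no longer apply.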

strengejacke, May 22 '19 14:05

Okay, this is partly fixed at least in the very limited example I used.

library(panelr)
library(sjlabelled)

data("WageData")
wages <- panel_data(WageData, id = id, wave = t)

wages <- sjlabelled::set_labels(wages, fem, labels = c(`0` = "male", `1` = "female"))

fit <- wbm(lwage ~ union | fem, data = wages)
str(model.frame(fit))
'data.frame':	4165 obs. of  6 variables:
 $ id          : Factor w/ 595 levels "1","2","3","4",..: 1 1 1 1 1 1 1 2 2 2 ...
 $ t           : num  1 2 3 4 5 6 7 1 2 3 ...
  ..- attr(*, "labels")= Named num  0 1
  .. ..- attr(*, "names")= chr  "male" "female"
 $ lwage       : num  5.56 5.72 6 6 6.06 ...
 $ union       : num  0 0 0 0 0 ...
 $ fem         : num  0 0 0 0 0 0 0 0 0 0 ...
  ..- attr(*, "labels")= Named num  0 1
  .. ..- attr(*, "names")= chr  "male" "female"
 $ imean(union): num  0 0 0 0 0 ...
 - attr(*, "wave")= chr "t"
 - attr(*, "id")= chr "id"
 - attr(*, "periods")= num  1 2 3 4 5 6 7
 - attr(*, "interaction.style")= chr "double-demean"
 - attr(*, "terms")=Classes 'terms', 'formula'  language lwage ~ `imean(union)` + union + fem + (1 + id)
  .. ..- attr(*, "variables")= language list(lwage, `imean(union)`, union, fem, id)
  .. ..- attr(*, "factors")= int [1:5, 1:4] 0 1 0 0 0 0 0 1 0 0 ...
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : chr [1:5] "lwage" "`imean(union)`" "union" "fem" ...
  .. .. .. ..$ : chr [1:4] "`imean(union)`" "union" "fem" "id"
  .. ..- attr(*, "term.labels")= chr [1:4] "`imean(union)`" "union" "fem" "id"
  .. ..- attr(*, "order")= int [1:4] 1 1 1 1
  .. ..- attr(*, "intercept")= int 1
  .. ..- attr(*, "response")= int 1
  .. ..- attr(*, ".Environment")=<environment: 0x0000019ce1baa9c8> 
  .. ..- attr(*, "predvars")= language list(lwage, `imean(union)`, union, fem, id)
  .. ..- attr(*, "dataClasses")= Named chr [1:5] "numeric" "numeric" "numeric" "numeric" ...
  .. .. ..- attr(*, "names")= chr [1:5] "lwage" "imean(union)" "union" "fem" ...
  .. ..- attr(*, "predvars.fixed")= language list(lwage, `imean(union)`, union, fem)
  .. ..- attr(*, "varnames.fixed")= chr [1:4] "lwage" "imean(union)" "union" "fem"
  .. ..- attr(*, "predvars.random")= language list(lwage, id)
 - attr(*, "formula")=Class 'formula'  language lwage ~ `imean(union)` + union + fem + (1 | id)
  .. ..- attr(*, ".Environment")=<environment: 0x0000019ce1baa9c8> 

Now unfortunately when I assign the labels to fem in wages, it looks like the labels get copied over to the t variable as well. I suspect this has to do with the way panel_data objects refuse to drop the ID and wave columns.
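A quick way to check which columns ended up with value labels might look like the sketch below. `cols_with_labels()` is a hypothetical helper written for this comment, and the toy data frame only stands in for a panel_data object; applying it to the model frame above (possibly after `as.data.frame()`) is an assumption that has not been tested here.

```r
# Hypothetical helper: names of columns that carry a "labels" attribute
cols_with_labels <- function(data) {
  has <- vapply(
    data,
    function(col) !is.null(attr(col, "labels", exact = TRUE)),
    logical(1)
  )
  names(data)[has]
}

# Toy data frame standing in for a panel_data object (made-up values)
d <- data.frame(id = c(1, 1, 2, 2), t = c(1, 2, 1, 2), fem = c(0, 0, 1, 1))
attr(d$fem, "labels") <- c(male = 0, female = 1)

# Only `fem` was labelled, so only `fem` should be reported
cols_with_labels(d)
```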

jacob-long, May 22 '19 15:05

I should also add that the desired behavior is less clear for time-varying variables since they end up being transformed (usually).

jacob-long, May 22 '19 15:05

FYI, the table layout is not perfect here in GitHub markdown, but the first steps are finished for panelr::wbm() support in sjPlot.

library(panelr)
library(sjPlot)
load(url("https://github.com/strengejacke/mixed-models-snippets/raw/master/example.RData"))
pd <- panel_data(d, id = ID, wave = time)
m <- wbm(QoL ~ x_tv | age + z1_ti + z2_ti + time  | (time | ID), data = pd)
tab_model(m)
Qo L

Predictors    Estimates   CI               p
x tv              -3.73   -4.53 – -2.93
(Intercept)       62.73   51.25 – 74.20
imean(x tv)       -6.30   -7.31 – -5.28
age               -0.22   -0.59 – 0.16     0.262
z 1 ti             4.43   -4.08 – 12.94    0.309
z 2 ti             0.00   -0.00 – 0.00     0.097
time               1.09   -0.20 – 2.39     0.100

Random Effects
σ2                             142.12
τ00 ID                         201.95
τ11 ID.time                    10.82
ρ01 ID                         -0.77
ICC                            0.43
N ID                           188
Observations                   564
Marginal R2 / Conditional R2   0.382 / 0.649

Created on 2019-05-23 by the reprex package (v0.3.0)

strengejacke, May 23 '19 21:05