Setting value labels for several variables at once (preferably using dplyr::across)

Open deschen1 opened this issue 3 years ago • 5 comments

Inspired by this question (https://stackoverflow.com/questions/73818355/how-to-recode-values-in-haven-labelled-vectors-in-r), I'm wondering how one could set value labels for several variables at once. Apparently, set_value_labels doesn't work inside dplyr::across, but since labelled vectors are a close relative of haven/tidyverse, I'm wondering whether it would be possible to support set_value_labels inside a dplyr::mutate, or at least to allow a tidyselect selection of variables in set_value_labels.

reprex:

x <- structure(list(q0015_0001 = structure(c(3, 5, NA, 3, 1, 2, NA, NA, 3, 4, 2, NA, 2, 2, 4, NA,
 4, 3, 3, 3, 3, 2, NA, NA, 2), label = "Menu Options/Variety", format.spss = "F8.2", labels = 
c(`Very Dissatisfied` = 1, Dissatisfied = 2, Neutral = 3, Satisfied = 4, `Very Satisfied` = 5), 
class = c("haven_labelled", "vctrs_vctr", "double")), q0015_0002 = structure(c(4, 4, NA, 5, 3, 3, 
NA, NA, 3, 4, 2, NA, 5, 2, 4, NA, 4, 3, 4, 4, 4, 4, NA, NA, 2), label = "Cleanliness", 
format.spss = "F8.2", labels = c(`Very Dissatisfied` = 1, Dissatisfied = 2, Neutral = 3, 
Satisfied = 4, `Very Satisfied` = 5), class = c("haven_labelled", "vctrs_vctr", "double")), q0015_0003 = 
structure(c(2, 2, NA, 3, 1, 2, NA, NA, 3, 4, 3, NA, 4, 3, 4, NA, 3, 2, 4, 4, 2, 2, NA, NA, 1),
 label = "Taste and Quality of Food", format.spss = "F8.2", labels = c(`Very Dissatisfied` = 1, 
Dissatisfied = 2, Neutral = 3, Satisfied = 4, `Very Satisfied` = 5), class = c("haven_labelled", 
"vctrs_vctr", "double"))), row.names = c(NA, -25L), class = c("tbl_df", "tbl", "data.frame"), 
label = "File created by user")

Doesn't work:

library(labelled)
library(tidyverse)

x |> 
  mutate(across(starts_with("q0015"), 
                ~dplyr::recode(., `1` = -2, `2` = -1, `3` = 0, `4` = 1, `5` = 2))) |> 
  mutate(across(starts_with("q0015"), 
                ~set_value_labels(., c("Very Dissatisfied" = -2, "Dissatisfied" = -1, "Neutral" = 0, "Satisfied" = 1, "Very Satisfied" = 2))))

I also tried different variants of purrr::map, with no success. Or is there possibly another relatively easy way to set labels for several variables? (I remember that for VARIABLE labels I used a named list of variables and new variable labels, but I'm not sure how that would look for VALUE labels, because each element/variable would have several value labels.)

UPDATE:

Got it working with this ugly chunk of code, but wondering if there could be an easier solution:

x |>  
  mutate(across(starts_with("q0015"), 
                ~dplyr::recode(., `1` = -2, `2` = -1, `3` = 0, `4` = 1, `5` = 2))) |> 
  set_value_labels(.labels = rep(list(c("Very Dissatisfied" = -2,
                                        "Dissatisfied" = -1,
                                        "Neutral" = 0,
                                        "Satisfied" = 1,
                                        "Very Satisfied" = 2)),
                                 x |> 
                                   select(starts_with("q0015")) |> 
                                   ncol()) |> 
                     setNames(nm = x |> 
                                select(starts_with("q0015")) |> 
                                names()))
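
For comparison, the same named list can be built more compactly by computing the target column names once. This is only a sketch on a small stand-in data frame: the `df` values and the names `vars`, `satisfaction` and `label_list` are made up for illustration, and the columns are assumed to be already recoded to -2..2.

```r
library(labelled)

# Hypothetical stand-in for the survey data, already recoded to -2..2
df <- data.frame(
  q0015_0001 = c(0, 2, NA, -2),
  q0015_0002 = c(1, 1, NA, 0),
  other      = c(10, 20, 30, 40)
)

satisfaction <- c(
  "Very Dissatisfied" = -2, "Dissatisfied" = -1, "Neutral" = 0,
  "Satisfied" = 1, "Very Satisfied" = 2
)

# Names of the target columns, computed once
vars <- grep("^q0015", names(df), value = TRUE)

# One copy of the label vector per target column, named by column
label_list <- setNames(rep(list(satisfaction), length(vars)), vars)

# Data-frame-level call: each list element is applied to the column of the same name
df <- set_value_labels(df, .labels = label_list)

val_labels(df$q0015_0002)
```

Columns that are not named in `label_list` (here `other`) are left untouched.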

deschen1 avatar Sep 23 '22 06:09 deschen1

set_value_labels() is designed to be applied on a data frame and not on a vector (so it cannot be applied within mutate() or across()).

The easiest way is probably to write your own function to be called in across():

x <- structure(list(
  q0015_0001 = structure(c(
    3, 5, NA, 3, 1, 2, NA, NA, 3, 4, 2, NA, 2, 2, 4, NA,
    4, 3, 3, 3, 3, 2, NA, NA, 2
  ),
  label = "Menu Options/Variety", format.spss = "F8.2", labels =
    c(`Very Dissatisfied` = 1, Dissatisfied = 2, Neutral = 3, Satisfied = 4, `Very Satisfied` = 5),
  class = c("haven_labelled", "vctrs_vctr", "double")
  ), q0015_0002 = structure(c(
    4, 4, NA, 5, 3, 3,
    NA, NA, 3, 4, 2, NA, 5, 2, 4, NA, 4, 3, 4, 4, 4, 4, NA, NA, 2
  ),
  label = "Cleanliness",
  format.spss = "F8.2", labels = c(`Very Dissatisfied` = 1, Dissatisfied = 2, Neutral = 3, Satisfied = 4, `Very Satisfied` = 5),
  class = c("haven_labelled", "vctrs_vctr", "double")
  ), q0015_0003 =
    structure(c(2, 2, NA, 3, 1, 2, NA, NA, 3, 4, 3, NA, 4, 3, 4, NA, 3, 2, 4, 4, 2, 2, NA, NA, 1),
      label = "Taste and Quality of Food", format.spss = "F8.2", labels = c(
        `Very Dissatisfied` = 1,
        Dissatisfied = 2, Neutral = 3, Satisfied = 4, `Very Satisfied` = 5
      ), class = c(
        "haven_labelled",
        "vctrs_vctr", "double"
      )
    )
),
row.names = c(NA, -25L), class = c("tbl_df", "tbl", "data.frame"),
label = "File created by user"
)

x
#> # A tibble: 25 × 3
#>    q0015_0001 q0015_0002 q0015_0003
#>    <hvn_lbll> <hvn_lbll> <hvn_lbll>
#>  1          3          4          2
#>  2          5          4          2
#>  3         NA         NA         NA
#>  4          3          5          3
#>  5          1          3          1
#>  6          2          3          2
#>  7         NA         NA         NA
#>  8         NA         NA         NA
#>  9          3          3          3
#> 10          4          4          4
#> # … with 15 more rows

library(labelled)
library(tidyverse)


recode_satisfaction <- function(v) {
  v <- v |> 
    dplyr::recode(`1` = -2, `2` = -1, `3` = 0, `4` = 1, `5` = 2)
  val_labels(v) <- c("Very Dissatisfied" = -2, "Dissatisfied" = -1, "Neutral" = 0, "Satisfied" = 1, "Very Satisfied" = 2)
  v
}

x |> 
  mutate(across(starts_with("q0015"), recode_satisfaction))
#> # A tibble: 25 × 3
#>    q0015_0001             q0015_0002          q0015_0003            
#>    <dbl+lbl>              <dbl+lbl>           <dbl+lbl>             
#>  1  0 [Neutral]            1 [Satisfied]      -1 [Dissatisfied]     
#>  2  2 [Very Satisfied]     1 [Satisfied]      -1 [Dissatisfied]     
#>  3 NA                     NA                  NA                    
#>  4  0 [Neutral]            2 [Very Satisfied]  0 [Neutral]          
#>  5 -2 [Very Dissatisfied]  0 [Neutral]        -2 [Very Dissatisfied]
#>  6 -1 [Dissatisfied]       0 [Neutral]        -1 [Dissatisfied]     
#>  7 NA                     NA                  NA                    
#>  8 NA                     NA                  NA                    
#>  9  0 [Neutral]            0 [Neutral]         0 [Neutral]          
#> 10  1 [Satisfied]          1 [Satisfied]       1 [Satisfied]        
#> # … with 15 more rows

Created on 2022-09-23 with reprex v2.0.2

larmarange avatar Sep 23 '22 10:09 larmarange

Fair enough, that would work. It's an interesting question conceptually, though. I see how for VARIABLE labels it only makes sense to apply them at the data frame level. For value labels, though, I think the option to apply them at the column level (and consequently across several columns) could have some value. If adding such a feature to the package is a potential future option, that'd be great. If not, I think your custom function workaround makes sense.
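
To make the column-level idea concrete, here is a minimal sketch of what already works at the vector level inside across(), using the haven::labelled() constructor rather than set_value_labels(). The data frame `df` and the name `satisfaction` are hypothetical stand-ins, and the columns are assumed to be plain numerics already recoded to -2..2.

```r
library(dplyr)

# Hypothetical stand-in data, already recoded to -2..2
df <- data.frame(
  q0015_0001 = c(0, 2, NA, -2),
  q0015_0002 = c(1, 1, NA, 0)
)

satisfaction <- c(
  "Very Dissatisfied" = -2, "Dissatisfied" = -1, "Neutral" = 0,
  "Satisfied" = 1, "Very Satisfied" = 2
)

# haven::labelled() works on a single vector, so it can be called per column
out <- df |>
  mutate(across(starts_with("q0015"), \(v) haven::labelled(v, labels = satisfaction)))

labelled::val_labels(out$q0015_0001)
```

Note that haven::labelled() builds a fresh labelled vector, so an existing variable label would have to be passed again through its `label` argument.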

deschen1 avatar Sep 23 '22 10:09 deschen1

I will explore the possibility of allowing set_value_labels() to be applied to a vector.

At this stage, I do not want to overcomplicate the list of functions in the package.

larmarange avatar Sep 23 '22 10:09 larmarange

You may have a look at #127, which extends set_value_labels() and similar verbs to vectors.

larmarange avatar Sep 23 '22 12:09 larmarange

Thank you! At first glance, this works nicely:

Using the reprex from the initial post:

# Using the set_value_labels from #127 
test_new <- x |> 
  mutate(across(starts_with("q0015"), 
                ~dplyr::recode(., `1` = -2, `2` = -1, `3` = 0, `4` = 1, `5` = 2))) |> 
  mutate(across(everything(), ~set_value_labels(., .labels = c("Very Dissatisfied" = -2, "Dissatisfied" = -1, "Neutral" = 0, "Satisfied" = 1, "Very Satisfied" = 2))))

# Using set_value_labels from the current CRAN version
test_old <- x |>  
  mutate(across(starts_with("q0015"), 
                ~dplyr::recode(., `1` = -2, `2` = -1, `3` = 0, `4` = 1, `5` = 2))) |> 
  labelled::set_value_labels(.labels = rep(list(c("Very Dissatisfied" = -2,
                                        "Dissatisfied" = -1,
                                        "Neutral" = 0,
                                        "Satisfied" = 1,
                                        "Very Satisfied" = 2)),
                                 x |> 
                                   select(starts_with("q0015")) |> 
                                   ncol()) |> 
                     setNames(nm = x |> 
                                select(starts_with("q0015")) |> 
                                names()))

identical(test_new, test_old)

[1] TRUE

deschen1 avatar Sep 26 '22 07:09 deschen1