rio
rio loses "label" attributes if a "labels" attribute also exists when roundtripping to ".sav" and ".dta".
- [x] a possible bug
The title says it all. For some reason, rio doesn't write the variable label attribute ("label") to SPSS/Stata files if a "labels" attribute (value labels) is also present. This wasn't always the case, but I can't say which version introduced the bug.
test <- data.frame(x = 1)
attributes(test$x)$label <- "Var"
attributes(test$x)$labels <- c("First" = 1)
attributes(test$x)
#> $label
#> [1] "Var"
#>
#> $labels
#> First
#> 1
rio::export(test, "test.dta")
haven::write_sav(test, "test_haven.sav")
test <- rio::import("test.dta")
attributes(test$x)
#> $format.stata
#> [1] "%10.0g"
#>
#> $labels
#> First
#> 1
test <- rio::import("test_haven.sav")
attributes(test$x)
#> $label
#> [1] "Var"
#>
#> $format.spss
#> [1] "F8.2"
### without labels
test <- data.frame(x = 1)
attributes(test$x)$label <- "Var"
attributes(test$x)
#> $label
#> [1] "Var"
rio::export(test, "test.sav")
haven::write_sav(test, "test_haven.sav")
test <- rio::import("test.sav")
attributes(test$x)
#> $label
#> [1] "Var"
#>
#> $format.spss
#> [1] "F8.2"
test <- rio::import("test_haven.sav")
attributes(test$x)
#> $label
#> [1] "Var"
#>
#> $format.spss
#> [1] "F8.2"
Created on 2020-04-22 by the reprex package (v0.3.0)
Session info
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 3.5.3 Patched (2019-03-11 r77192)
#> os macOS Mojave 10.14.6
#> system x86_64, darwin15.6.0
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> ctype en_US.UTF-8
#> tz Europe/Berlin
#> date 2020-04-22
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date lib source
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.5.2)
#> backports 1.1.5 2019-10-02 [1] CRAN (R 3.5.2)
#> callr 3.4.1 2020-01-24 [1] CRAN (R 3.5.3)
#> cellranger 1.1.0 2016-07-27 [1] CRAN (R 3.5.0)
#> cli 2.0.2 2020-02-28 [1] CRAN (R 3.5.2)
#> crayon 1.3.4 2017-09-16 [1] CRAN (R 3.5.0)
#> curl 4.3 2019-12-02 [1] CRAN (R 3.5.2)
#> data.table 1.12.8 2019-12-09 [1] CRAN (R 3.5.2)
#> desc 1.2.0 2018-05-01 [1] CRAN (R 3.5.0)
#> devtools 2.2.1 2019-09-24 [1] CRAN (R 3.5.2)
#> digest 0.6.25 2020-02-23 [1] CRAN (R 3.5.2)
#> ellipsis 0.3.0 2019-09-20 [1] CRAN (R 3.5.2)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 3.5.2)
#> fansi 0.4.1 2020-01-08 [1] CRAN (R 3.5.2)
#> forcats 0.4.0 2019-02-17 [1] CRAN (R 3.5.2)
#> foreign 0.8-75 2020-01-20 [1] CRAN (R 3.5.2)
#> fs 1.3.1 2019-05-06 [1] CRAN (R 3.5.2)
#> glue 1.4.0 2020-04-03 [1] CRAN (R 3.5.3)
#> haven 2.2.0 2019-11-08 [1] CRAN (R 3.5.2)
#> highr 0.8 2019-03-20 [1] CRAN (R 3.5.2)
#> hms 0.5.3 2020-01-08 [1] CRAN (R 3.5.2)
#> htmltools 0.4.0 2019-10-04 [1] CRAN (R 3.5.2)
#> knitr 1.28 2020-02-06 [1] CRAN (R 3.5.2)
#> lifecycle 0.2.0 2020-03-06 [1] CRAN (R 3.5.2)
#> magrittr 1.5 2014-11-22 [1] CRAN (R 3.5.0)
#> memoise 1.1.0 2017-04-21 [1] CRAN (R 3.5.0)
#> openxlsx 4.1.4 2019-12-06 [1] CRAN (R 3.5.2)
#> pillar 1.4.3 2019-12-20 [1] CRAN (R 3.5.2)
#> pkgbuild 1.0.6 2019-10-09 [1] CRAN (R 3.5.2)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 3.5.2)
#> pkgload 1.0.2 2018-10-29 [1] CRAN (R 3.5.0)
#> prettyunits 1.1.1 2020-01-24 [1] CRAN (R 3.5.3)
#> processx 3.4.1 2019-07-18 [1] CRAN (R 3.5.2)
#> ps 1.3.0 2018-12-21 [1] CRAN (R 3.5.0)
#> R6 2.4.1 2019-11-12 [1] CRAN (R 3.5.2)
#> Rcpp 1.0.3 2019-11-08 [1] CRAN (R 3.5.2)
#> readr 1.3.1 2018-12-21 [1] CRAN (R 3.5.0)
#> readxl 1.3.1 2019-03-13 [1] CRAN (R 3.5.2)
#> remotes 2.1.0 2019-06-24 [1] CRAN (R 3.5.2)
#> rio 0.5.16 2018-11-26 [1] CRAN (R 3.5.0)
#> rlang 0.4.5.9000 2020-04-10 [1] Github (r-lib/rlang@a90b04b)
#> rmarkdown 2.1 2020-01-20 [1] CRAN (R 3.5.2)
#> rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.5.0)
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.5.0)
#> stringi 1.4.5 2020-01-11 [1] CRAN (R 3.5.2)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 3.5.2)
#> testthat 2.3.2 2020-03-02 [1] CRAN (R 3.5.2)
#> tibble 3.0.0 2020-03-30 [1] CRAN (R 3.5.3)
#> usethis 1.5.1 2019-07-04 [1] CRAN (R 3.5.2)
#> vctrs 0.2.99.9011 2020-04-10 [1] Github (r-lib/vctrs@7736275)
#> withr 2.1.2 2018-03-15 [1] CRAN (R 3.5.0)
#> xfun 0.12 2020-01-13 [1] CRAN (R 3.5.2)
#> yaml 2.2.1 2020-02-01 [1] CRAN (R 3.5.2)
#> zip 2.0.4 2019-09-01 [1] CRAN (R 3.5.2)
#>
#> [1] /Library/Frameworks/R.framework/Versions/3.5/Resources/library
Can you check the version from GitHub, please? Thanks for the report!
Hello Thomas Leeper, thanks for your great package. It makes things easy when working with R and SPSS in a team. Unfortunately, I see the same behavior as described by rubenarslan. I noticed it after updating R to version 4.0.1 (2020-06-06).
By the way, after the update, sjlabelled::write_spss does not export at all anymore. Maybe this will help narrow down the problem.
Kind regards Thomas
Here is my MWE:
# MWE
# when there are value labels (attribute "labels"), the variable label (attribute "label") is lost in rio::export to SPSS
rm(list=ls())
ValueLabels <- as.numeric(c(1:3))
NoValueLabels <- as.numeric(c(1:3))
test <- data.frame(NoValueLabels, ValueLabels )
attr(test$NoValueLabels, "label") <- "Variablelabel, but without Valuelabels"
attr(test$ValueLabels, "label") <- "Variablelabel and Valuelabels"
attr(test$ValueLabels, "labels") <- c("Valuelabel 1"=1, "Valuelabel 2"=2, "Valuelabel 3"=3)
rio::export(test, "test.sav")
reimportTest <- rio::import("test.sav")
str(test)
str(reimportTest)
# compare the "label" attribute in test vs. reimportTest:
# when there are value labels (attr "labels"), the variable label (attr "label") is lost in rio::export to SPSS
#sessionInfo()
Here is my sessioninfo:
R version 4.0.1 (2020-06-06)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)
Matrix products: default
locale:
[1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252
[3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
[5] LC_TIME=German_Germany.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] tidyr_1.1.0 dplyr_1.0.0 stringr_1.4.0 sjlabelled_1.1.5
loaded via a namespace (and not attached):
[1] tidyselect_1.1.0 xfun_0.14 purrr_0.3.4
[4] pander_0.6.3 lattice_0.20-41 haven_2.3.1
[7] tcltk_4.0.1 vctrs_0.3.1 summarytools_0.9.6
[10] generics_0.0.2 htmltools_0.4.0 yaml_2.2.1
[13] base64enc_0.1-3 rlang_0.4.6 pillar_1.4.4
[16] foreign_0.8-80 glue_1.4.1 pryr_0.1.4
[19] readxl_1.3.1 matrixStats_0.56.0 lifecycle_0.2.0
[22] plyr_1.8.6 sjmisc_2.8.5 cellranger_1.1.0
[25] zip_2.0.4 codetools_0.2-16 psych_1.9.12.31
[28] knitr_1.28 rio_0.5.16 forcats_0.5.0
[31] curl_4.3 parallel_4.0.1 Rcpp_1.0.4.6
[34] readr_1.3.1 backports_1.1.7 checkmate_2.0.0
[37] magick_2.3 tmvnsim_1.0-2 rapportools_1.0
[40] mnormt_2.0.0 hms_0.5.3 digest_0.6.25
[43] stringi_1.4.6 openxlsx_4.1.5 insight_0.8.5
[46] grid_4.0.1 tools_4.0.1 magrittr_1.5
[49] tibble_3.0.1 crayon_1.3.4 pkgconfig_2.0.3
[52] ellipsis_0.3.1 data.table_1.12.8 lubridate_1.7.9
[55] rstudioapi_0.11 R6_2.4.1 nlme_3.1-148
[58] compiler_4.0.1
Maybe this helps. It seems to me that haven changed the way the attributes are set. When you write and read the data with haven, the MWE above shows the same effect. But if the labelling is done with haven::labelled_spss, the labels are kept (as long as you use haven::write_sav).
Here is the extended MWE:
rm(list=ls())
ValueLabels <- as.numeric(c(1:3))
NoValueLabels <- as.numeric(c(1:3))
HavenLabels <- as.numeric(c(1:3))
test <- data.frame(NoValueLabels, ValueLabels, HavenLabels)
attr(test$NoValueLabels, "label") <- "Variablelabel, but without Valuelabels"
attr(test$ValueLabels, "label") <- "Variablelabel and Valuelabels"
attr(test$ValueLabels, "labels") <- c("Valuelabel 1"=1, "Valuelabel 2"=2, "Valuelabel 3"=3)
test$HavenLabels <- haven::labelled_spss(test$HavenLabels, labels=c("Valuelabel 1"=1, "Valuelabel 2"=2, "Valuelabel 3"=3), label = "Variable and Valuelabels with haven" )
rio::export(test, "test.sav")
reimportTest <- rio::import("test.sav")
haven::write_sav(test, "testHaven.sav")
reimportTestHaven <- haven::read_sav("testHaven.sav")
str(test)
str(reimportTest)
str(reimportTestHaven)
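The comparison that the str() calls are used for can also be done programmatically. The following is only a minimal sketch in base R: lost_var_labels() is a hypothetical helper (it is not part of rio or haven), and the `after` data frame is a hand-made stand-in that mimics the reported result rather than an actual re-import. Note the exact = TRUE argument: without it, attr(x, "label") partially matches "labels" and would hide the problem.

```r
# Hypothetical helper (not part of rio or haven): list the columns whose
# variable label ("label" attribute) is present in `before` but missing in `after`.
lost_var_labels <- function(before, after) {
  # exact = TRUE matters: attr(x, "label") would otherwise partially match "labels"
  has_label <- function(col) !is.null(attr(col, "label", exact = TRUE))
  shared <- intersect(names(before), names(after))
  had  <- vapply(before[shared], has_label, logical(1))
  kept <- vapply(after[shared], has_label, logical(1))
  shared[had & !kept]
}

# Stand-in data: `after` mimics the reported result, where the column that
# also carries value labels comes back without its variable label.
before <- data.frame(NoValueLabels = c(1, 2, 3), ValueLabels = c(1, 2, 3))
attr(before$NoValueLabels, "label") <- "Variablelabel, but without Valuelabels"
attr(before$ValueLabels, "label") <- "Variablelabel and Valuelabels"
attr(before$ValueLabels, "labels") <- c("Valuelabel 1" = 1, "Valuelabel 2" = 2, "Valuelabel 3" = 3)

after <- before
attr(after$ValueLabels, "label") <- NULL

lost_var_labels(before, after)
#> [1] "ValueLabels"
```

With real files, one would pass the original data frame as `before` and the re-imported one (for example reimportTest or reimportTestHaven from the MWE) as `after`; an empty result would mean no variable labels were lost.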
Thanks - I'll try to get to this as soon as possible.
Thanks to you. There was a similar issue here, which is solved now: https://github.com/strengejacke/sjlabelled/issues/36
Maybe this helps to find the solution.
Thanks. This is definitely a bug. Working on a fix now.
Just pushed to github - if you have time, let me know if that's now working as expected.
Thanks for your work. Unfortunately, it does not seem to work as expected. I used
if (!require("remotes")) { install.packages("remotes") }
remotes::install_github("leeper/rio")
to install the latest version of rio. But the variable labels still vanish.
My bug seems related, though it is not about exporting but simply about importing a Stata dataset. I was quite surprised to see that haven::read_dta() successfully reads value labels, whereas rio::import() does not.
@zahlenzauber, could you please tell me whether this is the expected behavior?
If so, I would consider this issue resolved.
ValueLabels <- as.numeric(c(1:3))
NoValueLabels <- as.numeric(c(1:3))
test <- data.frame(NoValueLabels, ValueLabels )
attr(test$NoValueLabels, "label") <- "Variablelabel, but without Valuelabels"
attr(test$ValueLabels, "label") <- "Variablelabel and Valuelabels"
attr(test$ValueLabels, "labels") <- c("Valuelabel 1"=1, "Valuelabel 2"=2, "Valuelabel 3"=3)
tempsav1 <- tempfile(fileext = ".sav")
rio::export(test, tempsav1)
test_rio <- rio::import(tempsav1)
str(test_rio)
#> 'data.frame': 3 obs. of 2 variables:
#> $ NoValueLabels: num 1 2 3
#> ..- attr(*, "label")= chr "Variablelabel, but without Valuelabels"
#> ..- attr(*, "format.spss")= chr "F8.2"
#> $ ValueLabels : num 1 2 3
#> ..- attr(*, "label")= chr "Variablelabel and Valuelabels"
#> ..- attr(*, "format.spss")= chr "F8.2"
#> ..- attr(*, "labels")= Named num [1:3] 1 2 3
#> .. ..- attr(*, "names")= chr [1:3] "Valuelabel 1" "Valuelabel 2" "Valuelabel 3"
attr(test_rio$ValueLabels, "labels")
#> Valuelabel 1 Valuelabel 2 Valuelabel 3
#> 1 2 3
sessionInfo()
#> R version 4.3.1 (2023-06-16)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 22.04.3 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=de_DE.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=de_DE.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=de_DE.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Europe/Berlin
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> loaded via a namespace (and not attached):
#> [1] compiler_4.3.1 reprex_2.0.2 Rcpp_1.0.11 zip_2.3.0
#> [5] readxl_1.4.3 yaml_2.3.7 fastmap_1.1.1 R6_2.5.1
#> [9] readr_2.1.4 curl_5.0.2 openxlsx_4.2.5.2 knitr_1.43
#> [13] forcats_1.0.0 tibble_3.2.1 R.cache_0.16.0 tzdb_0.4.0
#> [17] pillar_1.9.0 R.utils_2.12.2 rlang_1.1.1 utf8_1.2.3
#> [21] stringi_1.7.12 xfun_0.40 fs_1.6.3 cli_3.6.1
#> [25] withr_2.5.0 magrittr_2.0.3 rio_0.5.30 digest_0.6.33
#> [29] haven_2.5.3 hms_1.1.3 lifecycle_1.0.3 R.methodsS3_1.8.2
#> [33] R.oo_1.25.0 vctrs_0.6.3 evaluate_0.21 glue_1.6.2
#> [37] data.table_1.14.8 cellranger_1.1.0 styler_1.10.1 fansi_1.0.4
#> [41] foreign_0.8-82 rmarkdown_2.24 purrr_1.0.2 tools_4.3.1
#> [45] pkgconfig_2.0.3 htmltools_0.5.6
Created on 2023-08-31 with reprex v2.0.2
Dear @chainsawriot, thanks a lot! It's great, and the str() output looks as expected now. I wish you a wonderful day.
@zahlenzauber Thanks a lot!