Incorrect values when reading csvy

Open Lukas-Novak opened this issue 3 years ago • 1 comment

Please specify whether your issue is about:

  • [x] a possible bug
  • [ ] a question about package functionality
  • [ ] a suggested code or documentation change, improvement to the code, or feature request

Labeling a numeric variable with the expss package results in incorrect values when the csvy file is read back in:

library(dplyr)
library(expss)
library(csvy)

# creating labeled df
cars.labeled  <- mtcars %>% 
  mutate(cyl = as.numeric(cyl),
         disp = as.numeric(disp),
         vs = recode_factor(vs,
                            "0" = "No",
                            "1" = "Yes")
         ) %>% 
  expss::apply_labels(
    cyl = "How many cylinders") %>% 
  write_csvy("cars.labeled.csv")

# reading the csvy file; stringsAsFactors is set to TRUE because I want strings to be treated as factors
cars.imported <- read_csvy("cars.labeled.csv", stringsAsFactors = TRUE)

# in the labeled df, the values are fine:
cars.labeled$cyl %>% summary()

# however, in the imported df, the values do not match those in the labeled df
cars.imported$cyl %>% summary()


## session info:
R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dplyr_1.0.9

loaded via a namespace (and not attached):
 [1] pillar_1.8.0      compiler_4.2.1    remotes_2.4.2     tools_4.2.1       digest_0.6.29     googledrive_2.0.0 jsonlite_1.8.0    evaluate_0.15    
 [9] lifecycle_1.0.1   gargle_1.2.0      tibble_3.1.8      pkgconfig_2.0.3   rlang_1.0.4       csvy_0.3.0        DBI_1.1.3         cli_3.3.0        
[17] rstudioapi_0.13   yaml_2.3.5        curl_4.3.2        xfun_0.31         fastmap_1.1.0     stringr_1.4.0     knitr_1.39        withr_2.5.0      
[25] httr_1.4.3        generics_0.1.3    fs_1.5.2          vctrs_0.4.1       askpass_1.1       hms_1.1.1         rappdirs_0.3.3    tidyselect_1.1.2 
[33] data.table_1.14.2 glue_1.6.2        R6_2.5.1          fansi_1.0.3       rmarkdown_2.14    tzdb_0.3.0        readr_2.1.2       purrr_0.3.4      
[41] tidyr_1.2.0       magrittr_2.0.3    ellipsis_0.3.2    htmltools_0.5.3   MASS_7.3-57       assertthat_0.2.1  utf8_1.2.2        stringi_1.7.8    
[49] openssl_2.0.2    

Lukas-Novak, Aug 21 '22 12:08

read_csvy does not use the fields metadata at all when reading columns: https://github.com/leeper/csvy/blob/master/R/read_csvy.R#L68C1-L71C6

Cedev, Jul 06 '23 19:07
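
If the reader really ignores the fields metadata, one possible workaround is to read the header yourself and coerce the columns afterwards. The following is only a rough sketch under several assumptions, not csvy's implementation or an official fix: `read_csvy_typed` is a hypothetical helper, the header is assumed to be YAML front matter between two `---` lines (optionally prefixed with `#`), each entry under `fields` is assumed to carry a `name` and a `type`, and the type names `number`, `integer` and `boolean` are assumed as well. It uses the yaml package for parsing and a small hand-written file instead of the output of write_csvy.

```r
library(yaml)  # assumed to be installed; only used to parse the header

# Hypothetical helper (not part of csvy): read a CSVY-style file and coerce
# each column according to the `type` declared under `fields` in the header.
read_csvy_typed <- function(path) {
  lines <- readLines(path)

  # The header is assumed to sit between the first two '---' lines,
  # which may or may not be prefixed with '#'.
  marks <- grep("^#?---[[:space:]]*$", lines)
  stopifnot(length(marks) >= 2)
  header <- lines[(marks[1] + 1):(marks[2] - 1)]
  meta <- yaml.load(paste(sub("^#", "", header), collapse = "\n"))

  # Read every column as character first, so nothing is guessed or recoded.
  body <- lines[(marks[2] + 1):length(lines)]
  df <- read.csv(text = paste(body, collapse = "\n"),
                 colClasses = "character", check.names = FALSE)

  # Apply the declared type of each field to the matching column.
  for (field in meta$fields) {
    col <- field$name
    if (is.null(col) || !col %in% names(df)) next
    type <- if (is.null(field$type)) "string" else field$type
    df[[col]] <- switch(type,
      number  = as.numeric(df[[col]]),
      integer = as.integer(df[[col]]),
      boolean = as.logical(df[[col]]),
      df[[col]]  # anything else is left as character
    )
  }
  df
}

# Small hand-written example file (assumed layout, not produced by write_csvy)
path <- tempfile(fileext = ".csv")
writeLines(c(
  "---",
  "fields:",
  "- name: cyl",
  "  type: number",
  "- name: vs",
  "  type: string",
  "---",
  "cyl,vs",
  "6,No",
  "4,Yes",
  "8,No"
), path)

imported <- read_csvy_typed(path)
str(imported)
```

Reading everything as character and converting explicitly avoids going through factors for numeric columns, which is why the sketch does not use stringsAsFactors at all; string columns could be turned into factors afterwards with `factor()` if needed.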