csvy
[feature request] YAML header as a list
Hi,
This is a great package!
I was surprised to find that csvy::get_yaml_header returns a character vector rather than a list. Since the metadata is a recursive structure, it would help to offer list output so that it can be used programmatically, either as an option or as a separate function (say, csvy::get_header?).
pacman::p_load("magrittr")
csvy::write_csvy(iris, "iris.csvy")
# this gives a character vector, not readable!
csvy::get_yaml_header("iris.csvy")
#> [1] "profile: tabular-data-package" "name: iris"
#> [3] "fields:" "- name: Sepal.Length"
#> [5] " type: number" "- name: Sepal.Width"
#> [7] " type: number" "- name: Petal.Length"
#> [9] " type: number" "- name: Petal.Width"
#> [11] " type: number" "- name: Species"
#> [13] " type: string" " levels:"
#> [15] " - setosa" " - versicolor"
#> [17] " - virginica" "--- "
# metadata is a recursive structure, so a list might be better
metadata_list <- csvy::get_yaml_header("iris.csvy") %>%
textConnection() %>%
yaml::read_yaml()
metadata_list
#> $profile
#> [1] "tabular-data-package"
#>
#> $name
#> [1] "iris"
#>
#> $fields
#> $fields[[1]]
#> $fields[[1]]$name
#> [1] "Sepal.Length"
#>
#> $fields[[1]]$type
#> [1] "number"
#>
#>
#> $fields[[2]]
#> $fields[[2]]$name
#> [1] "Sepal.Width"
#>
#> $fields[[2]]$type
#> [1] "number"
#>
#>
#> $fields[[3]]
#> $fields[[3]]$name
#> [1] "Petal.Length"
#>
#> $fields[[3]]$type
#> [1] "number"
#>
#>
#> $fields[[4]]
#> $fields[[4]]$name
#> [1] "Petal.Width"
#>
#> $fields[[4]]$type
#> [1] "number"
#>
#>
#> $fields[[5]]
#> $fields[[5]]$name
#> [1] "Species"
#>
#> $fields[[5]]$type
#> [1] "string"
#>
#> $fields[[5]]$levels
#> [1] "setosa" "versicolor" "virginica"
Created on 2018-12-18 by the reprex package (v0.2.0).
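The conversion shown in the reprex could be wrapped in a small helper. The following is only a sketch: header_to_list is a hypothetical name and not part of csvy, and the header lines are typed in by hand (shortened to two fields) so that the example runs with only the yaml package installed. Dropping the trailing "---" marker before parsing is a precaution on my part; the reprex above parses fine with the marker left in.

```r
library(yaml)

# Hypothetical helper (NOT part of csvy): turn the character vector
# returned by csvy::get_yaml_header() into a nested list.
header_to_list <- function(header_lines) {
  # Drop document-marker lines such as "---" or "--- " so that only
  # the YAML body is passed to the parser.
  body <- header_lines[!grepl("^---\\s*$", header_lines)]
  yaml::yaml.load(paste(body, collapse = "\n"))
}

# Header lines in the style written by csvy::write_csvy(), shortened
# to two fields and typed in by hand for illustration.
header <- c(
  "profile: tabular-data-package",
  "name: iris",
  "fields:",
  "- name: Sepal.Length",
  "  type: number",
  "- name: Species",
  "  type: string",
  "  levels:",
  "  - setosa",
  "  - versicolor",
  "  - virginica",
  "--- "
)

metadata_list <- header_to_list(header)
metadata_list$fields[[2]]$levels
```

With a real file, the call would presumably be header_to_list(csvy::get_yaml_header("iris.csvy")).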
Session info
devtools::session_info()
#> Session info -------------------------------------------------------------
#> setting value
#> version R version 3.5.1 (2018-07-02)
#> system x86_64, darwin15.6.0
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> tz Asia/Kolkata
#> date 2018-12-18
#> Packages -----------------------------------------------------------------
#> package * version date source
#> backports 1.1.3 2018-12-14 cran (@1.1.3)
#> base * 3.5.1 2018-07-05 local
#> compiler 3.5.1 2018-07-05 local
#> csvy 0.3.0 2018-12-18 Github (leeper/csvy@af0aa8d)
#> data.table 1.11.8 2018-09-30 cran (@1.11.8)
#> datasets * 3.5.1 2018-07-05 local
#> devtools 1.13.6 2018-06-27 CRAN (R 3.5.0)
#> digest 0.6.18 2018-10-10 cran (@0.6.18)
#> evaluate 0.11 2018-07-17 CRAN (R 3.5.0)
#> graphics * 3.5.1 2018-07-05 local
#> grDevices * 3.5.1 2018-07-05 local
#> htmltools 0.3.6 2017-04-28 CRAN (R 3.5.0)
#> jsonlite 1.5 2017-06-01 CRAN (R 3.5.0)
#> knitr 1.20 2018-02-20 CRAN (R 3.5.0)
#> magrittr * 1.5 2014-11-22 CRAN (R 3.5.0)
#> memoise 1.1.0 2017-04-21 CRAN (R 3.5.0)
#> methods * 3.5.1 2018-07-05 local
#> pacman 0.4.6 2017-05-14 CRAN (R 3.5.0)
#> Rcpp 0.12.19 2018-10-01 cran (@0.12.19)
#> rmarkdown 1.10 2018-06-11 CRAN (R 3.5.0)
#> rprojroot 1.3-2 2018-01-03 CRAN (R 3.5.0)
#> stats * 3.5.1 2018-07-05 local
#> stringi 1.2.4 2018-07-20 CRAN (R 3.5.0)
#> stringr 1.3.1 2018-05-10 CRAN (R 3.5.0)
#> tools 3.5.1 2018-07-05 local
#> utils * 3.5.1 2018-07-05 local
#> withr 2.1.2 2018-03-15 CRAN (R 3.5.0)
#> yaml 2.2.0 2018-07-25 CRAN (R 3.5.0)
I was also surprised that there is no function for this. I use the following to get the metadata as a list; it is based on the code in read_csvy. md is the metadata as read from the file, md_list is the result of parsing it with yaml, and md_vec is a character vector of column types named by column name.
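library(csvy)  # provides get_yaml_header()
library(yaml)  # provides yaml.load()
# raw header lines as a character vector
md <- get_yaml_header("df.csvy")
# parse the header into a nested list
md_list <- yaml.load(paste(md, collapse = "\n"))
# column types named by column name; assumes each field lists name first, then type
md_vec <- sapply(md_list$fields, function(x) setNames(x[[2]], x[[1]]))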
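The positional lookup (x[[1]], x[[2]]) works for headers written by write_csvy, but it depends on the order of keys within each field. A variant that looks the keys up by name might be more robust for hand-edited headers. This is only a sketch: field_types is a hypothetical helper name, and md_list is built by hand here to mimic the parsed structure so that the example needs only base R.

```r
# Metadata list shaped like the result of yaml.load() on a CSVY header.
# Built by hand for illustration; note the second field lists "type"
# before "name".
md_list <- list(
  name = "iris",
  fields = list(
    list(name = "Sepal.Length", type = "number"),
    list(type = "string", name = "Species",
         levels = c("setosa", "versicolor", "virginica"))
  )
)

# Hypothetical helper: character vector of column types, named by column.
# vapply() raises an error if a field lacks a "name" or "type" entry.
field_types <- function(fields) {
  types <- vapply(fields, function(x) x[["type"]], character(1))
  names(types) <- vapply(fields, function(x) x[["name"]], character(1))
  types
}

md_vec <- field_types(md_list$fields)
md_vec
```

Using vapply instead of sapply means a malformed field fails immediately instead of silently producing a list.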