r-yaml
r-yaml copied to clipboard
Support for yaml arrays?
Hello, I'm trying to make an R implementation that would output this yaml:
steps:
- name: 'gcr.io/cloud-builders/docker'
args: ['build', '-t', 'gcr.io/[PROJECT_ID]/[IMAGE]', '.']
But I can't find some R object that supports the ['blah','blah']
bit. It keeps putting it as individual entries. It doesn't seem to be reversible in import and output either
e.g.
yaml.load("steps:
+ - name: 'gcr.io/cloud-builders/docker'
+ args: ['build', '-t', 'gcr.io/[PROJECT_ID]/[IMAGE]', '.']")
#$steps
#$steps[[1]]
#$steps[[1]]$name
#[1] "gcr.io/cloud-builders/docker"
#
#$steps[[1]]$args
#[1] "build" "-t" "gcr.io/[PROJECT_ID]/[IMAGE]"
#[4] "."
yaml.load("steps:
+ - name: 'gcr.io/cloud-builders/docker'
+ args: ['build', '-t', 'gcr.io/[PROJECT_ID]/[IMAGE]', '.']") -> foo
as.yaml(foo)
...gives:
steps:
- name: gcr.io/cloud-builders/docker
args:
- build
- -t
- gcr.io/ddd
- '.'
Is the desired output possible?
Ok, its equivalent. https://stackoverflow.com/questions/23657086/yaml-multi-line-arrays
So it will work, its only readability to favour the former.
It appears that what you're asking for is a way to tell the as.yaml
function to print bracket-style sequences (AKA arrays, but 'sequence' is the YAML name for them). As you've discovered, multi-line arrays/sequences are perfectly valid in YAML. The as.yaml
function does not currently have a way to output sequences in the bracket style, but the yaml.load
function will read either format.
Despite their technical equivalence, I would like this feature a lot, too. I think the bracket style is intuitive for R users because it sort of looks like c()
I would like this feature too! I didn't know this wasn't supported and ended up getting several of my yaml files messed up.
Hi would love this feature too, for a big YAML it becomes much much more human readable in array format
Some clarification regarding terminology: YAML basically has two styles: block and flow style. (also see https://www.yaml.info/learn/flowstyle.html)
x:
- block
- style
- sequence
y: [flow, style, sequence]
When emitting, libyaml allows to set the style
of the sequence_start
event, look for YAML_FLOW_SEQUENCE_STYLE
.
Of course, when loading data to a native data structure and then dumping it as YAML again, meta information like style gets lost, unless it is saved in the data structure.
I don't know anything about how r-yaml loads the data, so I can't comment about that.
PyYAML has an option default_flow_style
. It's False
by default. Everything will be output in block style. When set to None
, only the sequences and mappings which only contain scalars will be printed in flow style, and if set to True
, everything will be printed in flow style.
I think that's a good compromise when there is no possibility to save the original style information.
Current branch passes val-grind tests.
Passes tests in data.table as well.
garbetsp@Hubble:~/Projects/cran/data.table$ make test
R -e 'require(data.table); test.data.table()'
R version 4.1.2 (2021-11-01) -- "Bird Hippie"
Copyright (C) 2021 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> require(data.table); test.data.table()
Loading required package: data.table
getDTthreads(verbose=TRUE):
OpenMP version (_OPENMP) 201511
omp_get_num_procs() 8
R_DATATABLE_NUM_PROCS_PERCENT unset (default 50)
R_DATATABLE_NUM_THREADS unset
R_DATATABLE_THROTTLE unset (default 1024)
omp_get_thread_limit() 2147483647
omp_get_max_threads() 8
OMP_THREAD_LIMIT unset
OMP_NUM_THREADS unset
RestoreAfterFork true
data.table is using 4 threads with throttle==1024. See ?setDTthreads.
test.data.table() running: /usr/local/lib/R/site-library/data.table/tests/tests.Rraw.bz2
Tue Feb 22 16:22:29 2022 endian==little, sizeof(long double)==16, longdouble.digits==64, sizeof(pointer)==8, TZ==unset, Sys.timezone()=='America/Chicago', Sys.getlocale()=='LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C', l10n_info()=='MBCS=TRUE; UTF-8=TRUE; Latin-1=FALSE; codeset=UTF-8', getDTthreads()=='OpenMP version (_OPENMP)==201511; omp_get_num_procs()==8; R_DATATABLE_NUM_PROCS_PERCENT==unset (default 50); R_DATATABLE_NUM_THREADS==unset; R_DATATABLE_THROTTLE==unset (default 1024); omp_get_thread_limit()==2147483647; omp_get_max_threads()==8; OMP_THREAD_LIMIT==unset; OMP_NUM_THREADS==unset; RestoreAfterFork==true; data.table is using 4 threads with throttle==1024. See ?setDTthreads.', zlibVersion()==1.2.11 ZLIB_VERSION==1.2.11
10 longest running tests took 17s (41% of 41s)
ID time nTest
1: 2155 3.183 5
2: 1438 2.355 738
3: 1888 1.993 9
4: 1648 1.654 91
5: 1848 1.618 2
6: 1652 1.564 91
7: 1650 1.531 91
8: 1437 1.162 36
9: 1912 1.159 2
10: 1644 1.042 91
All 10038 tests (last 2163) in tests/tests.Rraw.bz2 completed ok in 41.3s elapsed (51.8s cpu)
I was curious and looked into https://github.com/vubiostat/r-yaml/commit/276825d5b74bc202784324d1f986a9f09d878a94 and only see sequence_style
mentioned, but also mappings can be in flow style, so they should also be affected.
e.g. a flow mapping in a flow sequence: [ { key: value } ]
or the other way round { key: [values] }
So a parameter sequence_style
in emit_object
sounds like flow style mappings are missing.
Maybe two parameters, default_seq_flow
and default_map_flow
?
Maybe two parameters,
default_seq_flow
anddefault_map_flow
?
I don't think that is necessary.
What I mean is that you don't pass the parameter to the yaml_mapping_start_event_initialize
, only to the yaml_sequence_start_event_initialize
.
I thought that's what this patch does, I don't see it in the changeset getting passed to yaml_mapping_start_event_initialize
. What I was suggesting is to allow independent control of both.
I thought that's what this patch does, I don't see it in the changeset getting passed to yaml_mapping_start_event_initialize.
And I suggested that it should, but with a different name, not "sequence" in it.
If you want to do something completely different than pyyaml, go ahead :) But maybe ask users first what they want.
What I was suggesting is to allow independent control of both.
It doesn't make sense to emit all sequences in flow and all mappings in block style. everything under flow style nodes must be flow style itself.
:lol: I never try to make sense of user style requests...
This shows what you're talking about https://gist.github.com/perlpunk/377c3a537df861a7736fd3a1b9aec04f
The proposal is for the parameter default_flow_style
to control both sequence and map emitting:
- FALSE => (default) block mode, passes YAML_ANY_SEQUENCE_STYLE and YAML_ANY_MAPPING_STYLE (current behavior)
- TRUE => flow mode, passes YAML_FLOW_SEQUENCE_STYLE and YAML_FLOW_MAPPING_STYLE
- NA =>
- Leaf Nodes: YAML_FLOW_SEQUENCE_STYLE and YAML_FLOW_MAPPING_STYLE for leaf nodes
- Branch Nodes: YAML_BLOCK_SEQUENCE_STYLE and YAML_BLOCK_MAPPING_STYLE
@MarkEdmondson1234 Can you foresee any reason one would want this controlled separately, i.e. handle lists/maps one way and arrays a different way?