r-yaml Support for yaml arrays?

Hello, I'm trying to make an R implementation that would output this yaml:

steps:
- name: 'gcr.io/cloud-builders/docker'
  args: ['build', '-t', 'gcr.io/[PROJECT_ID]/[IMAGE]', '.']

But I can't find an R object that supports the ['blah', 'blah'] bit: it keeps being written out as individual entries. It doesn't seem to round-trip through import and output either.

e.g.

library(yaml)

# Parse a document whose args value uses the bracket (flow) style
yaml.load("steps:
- name: 'gcr.io/cloud-builders/docker'
  args: ['build', '-t', 'gcr.io/[PROJECT_ID]/[IMAGE]', '.']")
#$steps
#$steps[[1]]
#$steps[[1]]$name
#[1] "gcr.io/cloud-builders/docker"
#
#$steps[[1]]$args
#[1] "build"                       "-t"                          "gcr.io/[PROJECT_ID]/[IMAGE]"
#[4] "."                          

# Parse the same document, then serialise it again
foo <- yaml.load("steps:
- name: 'gcr.io/cloud-builders/docker'
  args: ['build', '-t', 'gcr.io/[PROJECT_ID]/[IMAGE]', '.']")
as.yaml(foo)

...gives:

steps:
- name: gcr.io/cloud-builders/docker
  args:
  - build
  - -t
  - gcr.io/ddd
  - '.'

Is the desired output possible?

Nov 09 '19 12:11 MarkEdmondson1234

OK, it's equivalent: https://stackoverflow.com/questions/23657086/yaml-multi-line-arrays

So it will work; the only reason to favour the former is readability.

Nov 09 '19 12:11 MarkEdmondson1234

It appears that what you're asking for is a way to tell the as.yaml function to print bracket-style sequences (AKA arrays, but 'sequence' is the YAML name for them). As you've discovered, multi-line arrays/sequences are perfectly valid in YAML. The as.yaml function does not currently have a way to output sequences in the bracket style, but the yaml.load function will read either format.

Nov 11 '19 04:11 viking

Despite their technical equivalence, I would like this feature a lot, too. I think the bracket style is intuitive for R users because it sort of looks like c()

Nov 11 '19 16:11 malcolmbarrett

I would like this feature too! I didn't know this wasn't supported and ended up getting several of my yaml files messed up.

Sep 11 '20 08:09 mister-frostee

Hi, I would love this feature too. A big YAML file becomes much more human-readable in array format.

May 16 '21 14:05 danbartl

Some clarification regarding terminology: YAML basically has two styles: block and flow style. (also see https://www.yaml.info/learn/flowstyle.html)

x:
- block
- style
- sequence
y: [flow, style, sequence]

When emitting, libyaml lets you set the style of the sequence_start event; look for YAML_FLOW_SEQUENCE_STYLE. Of course, when data is loaded into a native data structure and then dumped as YAML again, meta information such as style is lost unless it is saved in the data structure. I don't know anything about how r-yaml loads the data, so I can't comment on that.

PyYAML has an option default_flow_style. It's False by default, which means everything is output in block style. When set to None, only sequences and mappings that contain nothing but scalars are printed in flow style, and when set to True, everything is printed in flow style. I think that's a good compromise when there is no way to save the original style information.

Feb 15 '22 16:02 perlpunk

Current branch passes valgrind tests.

Feb 22 '22 22:02 spgarbet

Passes tests in data.table as well.


garbetsp@Hubble:~/Projects/cran/data.table$ make test
R -e 'require(data.table); test.data.table()'

R version 4.1.2 (2021-11-01) -- "Bird Hippie"
Copyright (C) 2021 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> require(data.table); test.data.table()
Loading required package: data.table
getDTthreads(verbose=TRUE):
  OpenMP version (_OPENMP)       201511
  omp_get_num_procs()            8
  R_DATATABLE_NUM_PROCS_PERCENT  unset (default 50)
  R_DATATABLE_NUM_THREADS        unset
  R_DATATABLE_THROTTLE           unset (default 1024)
  omp_get_thread_limit()         2147483647
  omp_get_max_threads()          8
  OMP_THREAD_LIMIT               unset
  OMP_NUM_THREADS                unset
  RestoreAfterFork               true
  data.table is using 4 threads with throttle==1024. See ?setDTthreads.
test.data.table() running: /usr/local/lib/R/site-library/data.table/tests/tests.Rraw.bz2 

Tue Feb 22 16:22:29 2022  endian==little, sizeof(long double)==16, longdouble.digits==64, sizeof(pointer)==8, TZ==unset, Sys.timezone()=='America/Chicago', Sys.getlocale()=='LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C', l10n_info()=='MBCS=TRUE; UTF-8=TRUE; Latin-1=FALSE; codeset=UTF-8', getDTthreads()=='OpenMP version (_OPENMP)==201511; omp_get_num_procs()==8; R_DATATABLE_NUM_PROCS_PERCENT==unset (default 50); R_DATATABLE_NUM_THREADS==unset; R_DATATABLE_THROTTLE==unset (default 1024); omp_get_thread_limit()==2147483647; omp_get_max_threads()==8; OMP_THREAD_LIMIT==unset; OMP_NUM_THREADS==unset; RestoreAfterFork==true; data.table is using 4 threads with throttle==1024. See ?setDTthreads.', zlibVersion()==1.2.11 ZLIB_VERSION==1.2.11
10 longest running tests took 17s (41% of 41s)
      ID  time nTest
 1: 2155 3.183     5
 2: 1438 2.355   738
 3: 1888 1.993     9
 4: 1648 1.654    91
 5: 1848 1.618     2
 6: 1652 1.564    91
 7: 1650 1.531    91
 8: 1437 1.162    36
 9: 1912 1.159     2
10: 1644 1.042    91
All 10038 tests (last 2163) in tests/tests.Rraw.bz2 completed ok in 41.3s elapsed (51.8s cpu)

Feb 22 '22 22:02 spgarbet

I was curious and looked into https://github.com/vubiostat/r-yaml/commit/276825d5b74bc202784324d1f986a9f09d878a94. I only see sequence_style mentioned, but mappings can be in flow style too, so they should also be affected, e.g. a flow mapping in a flow sequence, [ { key: value } ], or the other way round, { key: [values] }. A parameter named sequence_style in emit_object suggests that flow-style mappings are missing.

Feb 23 '22 12:02 perlpunk

Maybe two parameters, default_seq_flow and default_map_flow?

Feb 23 '22 14:02 spgarbet

> Maybe two parameters, default_seq_flow and default_map_flow?

I don't think that is necessary. What I mean is that you don't pass the parameter to the yaml_mapping_start_event_initialize, only to the yaml_sequence_start_event_initialize.

Feb 23 '22 15:02 perlpunk

I thought that's what this patch does, I don't see it in the changeset getting passed to yaml_mapping_start_event_initialize. What I was suggesting is to allow independent control of both.

Feb 23 '22 15:02 spgarbet

> I thought that's what this patch does, I don't see it in the changeset getting passed to yaml_mapping_start_event_initialize.

And I suggested that it should, but with a different name, not "sequence" in it.

If you want to do something completely different than pyyaml, go ahead :) But maybe ask users first what they want.

Feb 23 '22 15:02 perlpunk

> What I was suggesting is to allow independent control of both.

It doesn't make sense to emit all sequences in flow style and all mappings in block style. Everything under a flow-style node must itself be in flow style.
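To make that constraint concrete, here is a small standard-library Python sketch. It is only an illustration, not code from r-yaml or libyaml: the function name block_under_flow and the two hypothetical switches seq_flow and map_flow are assumptions standing in for the "independent control" idea. It walks a nested structure and reports every node that would ask for block style while sitting inside a flow-style parent.

```python
def block_under_flow(node, seq_flow, map_flow, path="$", inside_flow=False):
    """Yield paths of containers that want block style but are nested in a flow node.

    seq_flow / map_flow are hypothetical per-type switches (True = flow style).
    """
    if isinstance(node, dict):
        wants_flow = map_flow
        children = [(f"{path}.{key}", value) for key, value in node.items()]
    elif isinstance(node, list):
        wants_flow = seq_flow
        children = [(f"{path}[{i}]", value) for i, value in enumerate(node)]
    else:
        return  # scalars carry no block/flow choice in this sketch

    if inside_flow and not wants_flow:
        yield path  # a block node cannot appear inside a flow node

    for child_path, child in children:
        yield from block_under_flow(
            child, seq_flow, map_flow, child_path, inside_flow or wants_flow
        )


data = {
    "steps": [
        {
            "name": "gcr.io/cloud-builders/docker",
            "args": ["build", "-t", "gcr.io/[PROJECT_ID]/[IMAGE]", "."],
        }
    ]
}

# All sequences flow, all mappings block: the mapping inside the list conflicts.
print(list(block_under_flow(data, seq_flow=True, map_flow=False)))
# Everything flow: no conflicts.
print(list(block_under_flow(data, seq_flow=True, map_flow=True)))
```

With sequences in flow style and mappings in block style, the mapping at steps[0] is flagged, which is the case described above.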

Feb 23 '22 15:02 perlpunk

:lol: I never try to make sense of user style requests...

Feb 23 '22 15:02 spgarbet

This shows what you're talking about: https://gist.github.com/perlpunk/377c3a537df861a7736fd3a1b9aec04f

The proposal is for the parameter default_flow_style to control both sequence and map emitting:

FALSE => (default) block mode, passes YAML_ANY_SEQUENCE_STYLE and YAML_ANY_MAPPING_STYLE (current behavior)
TRUE => flow mode, passes YAML_FLOW_SEQUENCE_STYLE and YAML_FLOW_MAPPING_STYLE
NA => mixed mode:
- Leaf nodes (sequences and mappings that contain only scalars): YAML_FLOW_SEQUENCE_STYLE and YAML_FLOW_MAPPING_STYLE
- Branch nodes (everything else): YAML_BLOCK_SEQUENCE_STYLE and YAML_BLOCK_MAPPING_STYLE
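The three settings can be sketched with a toy emitter in plain Python (standard library only). This is a hedged illustration of the rule, not the r-yaml patch and not PyYAML: the names dump, emit, flow, scalar and use_flow are made up for the example, Python's None plays the role of R's NA, keys are assumed to be simple strings, values are assumed to be strings, and the quoting is deliberately crude.

```python
def is_scalar(x):
    return not isinstance(x, (dict, list))


def scalar(x):
    """Crude quoting: single-quote anything containing YAML indicator characters."""
    s = str(x)
    if s == "" or any(c in s for c in "[]{},:#'\""):
        return "'" + s.replace("'", "''") + "'"
    return s


def flow(node):
    """Render a node in flow style; everything nested inside stays flow."""
    if isinstance(node, dict):
        return "{" + ", ".join(f"{k}: {flow(v)}" for k, v in node.items()) + "}"
    if isinstance(node, list):
        return "[" + ", ".join(flow(v) for v in node) + "]"
    return scalar(node)


def use_flow(node, default_flow_style):
    """Decide the style of one container under the three proposed settings."""
    if not node or default_flow_style is True:
        return True  # empty containers are always written as {} or []
    if default_flow_style is None:  # leaf containers only
        children = node.values() if isinstance(node, dict) else node
        return all(is_scalar(c) for c in children)
    return False  # block style


def emit(node, default_flow_style=False, indent=0):
    """Return the output lines for a node, indented by `indent` spaces."""
    pad = " " * indent
    if is_scalar(node) or use_flow(node, default_flow_style):
        return [pad + flow(node)]
    lines = []
    if isinstance(node, dict):
        for key, value in node.items():
            if is_scalar(value) or use_flow(value, default_flow_style):
                lines.append(f"{pad}{key}: {flow(value)}")
            else:
                lines.append(f"{pad}{key}:")
                # Block sequences sit at the same indent as their key.
                child_indent = indent if isinstance(value, list) else indent + 2
                lines.extend(emit(value, default_flow_style, child_indent))
    else:
        for value in node:
            if is_scalar(value) or use_flow(value, default_flow_style):
                lines.append(f"{pad}- {flow(value)}")
            else:
                child = emit(value, default_flow_style, indent + 2)
                lines.append(pad + "- " + child[0][indent + 2:])
                lines.extend(child[1:])
    return lines


def dump(node, default_flow_style=False):
    return "\n".join(emit(node, default_flow_style)) + "\n"


data = {
    "steps": [
        {
            "name": "gcr.io/cloud-builders/docker",
            "args": ["build", "-t", "gcr.io/[PROJECT_ID]/[IMAGE]", "."],
        }
    ]
}

for style in (False, None, True):
    print(f"# default_flow_style={style}")
    print(dump(data, style))
```

Under the None setting, only the args list (a leaf) is written in brackets while steps and its mapping stay in block style, which is close to the layout asked for at the top of this thread.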

@MarkEdmondson1234 Can you foresee any reason one would want this controlled separately, i.e. handle lists/maps one way and arrays a different way?

Feb 23 '22 16:02 spgarbet
