r-yaml icon indicating copy to clipboard operation
r-yaml copied to clipboard

yaml 1.2 support

Open lorenzwalthert opened this issue 4 years ago • 33 comments

It would be great if that package could support yaml 1.2, which apparently the definite version and is already 10 years old now (https://yaml.org/spec/). Changes are nicely summarized here: https://perlpunk.github.io/slides.tpcig2018/yamlpp/slide022.html. Unexpected behavior as in https://github.com/viking/r-yaml/issues/5 (see the many cross-references) would wanish.

This would imply to turn away from libyaml because it does not seem they are going to implement that spec soon, an issue pending for more than two years: https://github.com/yaml/libyaml/issues/20. A popular alternative to libyaml in C++ would be https://github.com/jbeder/yaml-cpp, but there are many, also in python.

lorenzwalthert avatar Jul 23 '19 18:07 lorenzwalthert

You're right in that it appears that the libyaml team seems unmotivated to add 1.2 support. It may be worth switching over to a different library. I'd want to think carefully about the benefits of doing so, however.

viking avatar Sep 09 '19 18:09 viking

Up

It would simplify any automated workflow if YAML header (%YAML 1.2) could be read (or even just ignored) by read_yaml() Like a for read.csv(file, skip=2)

arthurldp avatar Sep 30 '20 12:09 arthurldp

Upgrading to YAML 1.2 is a pretty big undertaking. It basically means rewriting most of the package, since it will require changing the underlying YAML library. If I were to do that, I'd probably create a new package called 'yaml12' or something like that. I'm slowly edging my way to doing that though.

viking avatar Sep 30 '20 17:09 viking

The libyaml folks are tracking tickets on github now: https://github.com/yaml/libyaml/issues/20

spgarbet avatar Jan 11 '21 17:01 spgarbet

I've taken over maintenance for this package. I'm considering a fork since it looks like libyaml supports 1.2. The package name proposal is yaml12. Thoughts?

spgarbet avatar Jan 28 '22 23:01 spgarbet

@spgarbet if you do make a new package, I'd love to have the opportunity to review before you release to CRAN so we could ensure that we close a few security risks (e.g.) ensuring that eval.expr = FALSE by default.

hadley avatar Jan 31 '22 15:01 hadley

@hadley Jeremy has moved on to private industry and I'm picking this up as maintainer. I'll look into the eval.expr=FALSE issue.

spgarbet avatar Jan 31 '22 16:01 spgarbet

@hadley I pushed up a new version with eval.expr=FALSE by default. I'd like to fix the UTF issue before pushing to CRAN.

spgarbet avatar Feb 09 '22 20:02 spgarbet

@spgarbet you're changing this package? I'm not sure that's a good idea because it has the potential to break existing code. I was only suggesting changing the default if you do yaml12.

hadley avatar Feb 10 '22 15:02 hadley

The plan was always to at some point change the default to FALSE. The current CRAN version spits out a warning that this will happen in a future version. If it's too disruptive I can defer this patch.

spgarbet avatar Feb 10 '22 15:02 spgarbet

Ah in that case you're probably ok.

hadley avatar Feb 10 '22 17:02 hadley

It's that difficult intersection of security vs. disruption. I'd prefer to avoid disruption at the highest levels, conversely security needs must be addressed.

spgarbet avatar Feb 10 '22 18:02 spgarbet

There is another library with almost the same interface call, libfyaml. It fully supports 1.2 and is prepping for full 1.3 support. Work has begun to see what it would take to implement that as the core dependency. It's not looking too bad. The existing library is actually close to being 1.2 compliant.

spgarbet avatar Feb 14 '22 23:02 spgarbet

Oddly this request is quite close. There is only a couple issue left in the libyaml library before it has full 1.2 support.

spgarbet avatar Feb 17 '22 20:02 spgarbet

I've split the project over to https://github.com/vubiostat/yaml12

spgarbet avatar Oct 17 '22 19:10 spgarbet

This project seems as close as it's ever going to get to 1.2 using the current libyaml.

spgarbet avatar Oct 18 '22 14:10 spgarbet

@spgarbet What is the current state of YAML 1.2 support in this package? I'm asking because latest stable version 2.3.7 still does write_yaml() logicals as yes/no instead of true/false.

(I know I can use a custom handler for logicals to work around this but I think the default should be compliant to the latest YAML spec out-of-the-box.)

salim-b avatar May 26 '23 19:05 salim-b

The library depends on an upstream project libyaml. That author has stated he's done what he could to get it as close to 1.2 as possible. I submitted a couple PR and he wasn't interested in them, so I quit trying. I do have a shell of this project and at one point was trying to create a version that depended on libfyaml, which is faster and support 1.2. I've not had time recently to work on it. If you'd like a go, I can give you the code and we could get a project going.

spgarbet avatar May 29 '23 15:05 spgarbet

Thanks for the answer!

I do have a shell of this project and at one point was trying to create a version that depended on libfyaml, which is faster and support 1.2. I've not had time recently to work on it.

You're referring to this repo, I suppose?

If you'd like a go, I can give you the code and we could get a project going.

I don't know if I'd be of much use... I don't really have experience in C.

salim-b avatar May 29 '23 15:05 salim-b

The library depends on an upstream project libyaml. That author has stated he's done what he could to get it as close to 1.2 as possible.

Who do you mean here? The original author @xitology?

I submitted a couple PR and he wasn't interested in them, so I quit trying.

I can't find any PRs from you that are related to YAML 1.2 support. And again, who do you mean with "he" here?

perlpunk avatar May 29 '23 15:05 perlpunk

Note that the handling of yes/no, true/false etc. is nothing which is done by libyaml. libyaml is just a parser and emitter, the scalar values are all strings.

perlpunk avatar May 29 '23 15:05 perlpunk

and at one point was trying to create a version that depended on libfyaml, which is faster and support 1.2.

I'm currently writing a libfyaml binding for perl. The library is passing all 1.2 tests, and it hides its internals and provides a stable API, and the author is still around and developing it, so yes, it is a good idea to use it. Visit us on matrix if you have questions about libfyaml. I think you were already there at some point.

perlpunk avatar May 29 '23 15:05 perlpunk

Well the PR's weren't directly related to 1.2, so but I didn't see it as fruitful to make much change.

I didn't realize the yes/no was on this side (I inherited maintenance of this code and haven't fully explored it). If that's the case I can change it quickly. I'm out of town till June. However, making a change like this is huge in downstream dependencies.

Suggestion has been to move to another package yaml12 as the name. The data.table team has a set of test cases that deeply test the package. If we wish to switch this over to the main package name yaml then we'd have to get approval from Hadley Wickham and the data.table team.

spgarbet avatar May 29 '23 15:05 spgarbet

I pushed up what seems like a really simple change (DOH!) to make this work, the test cases are now failing.

@salim-b Can you install the branch origin/issue-75-yaml12 and tell me if it works for you? I can get the tests changed for this pretty quickly.

spgarbet avatar May 29 '23 16:05 spgarbet

@perlpunk I'm still lurking there. Just been very busy with a more pressing couple projects.

spgarbet avatar May 29 '23 16:05 spgarbet

Note that for full support of the YAML 1.2 schema you need a bit more than just the yes/no boolean thing. Here you can find a pretty complete test data set: https://github.com/perlpunk/yaml-test-schema 1.1/1.2 Comparison: https://perlpunk.github.io/yaml-test-schema/schemas.html Test data visualized: https://perlpunk.github.io/yaml-test-schema/data.html

perlpunk avatar May 29 '23 17:05 perlpunk

I pushed up what seems like a really simple change (DOH!) to make this work, the test cases are now failing.

@salim-b Can you install the branch origin/issue-75-yaml12 and tell me if it works for you? I can get the tests changed for this pretty quickly.

Nope, same result.

remotes::install_github("vubiostat/r-yaml@issue-75-yaml12")
#> Using github PAT from envvar GITHUB_PAT
#> Skipping install of 'yaml' from a github remote, the SHA1 (1aa405f3) has not changed since last install.
#>   Use `force = TRUE` to force installation
    yaml::as.yaml(list(a = TRUE, b = FALSE))
#> [1] "a: yes\nb: no\n"

Created on 2023-05-29 with reprex v2.0.2

salim-b avatar May 29 '23 18:05 salim-b

Alright, I'm off the road and back in town. I can solve this here in a bit. It's doable.

I'm thinking for rollout, it could be controlled via a flag, i.e. "yaml12=FALSE" by default. Then it's one library and folks can switch over as needed.

spgarbet avatar Jun 01 '23 16:06 spgarbet

I made a quick attempt at this. The core parse tree specified by "re" I altered. This is a scary change for such a used package. I don't see and easy way to make it flag dependent.

spgarbet avatar Jun 01 '23 16:06 spgarbet

> library(yaml)
> yaml::as.yaml(list(a = TRUE, b = FALSE))
[1] "a: true\nb: false\n"
> yaml::as.yaml(list(a = "yes", b = FALSE))
[1] "a: yes\nb: false\n"

spgarbet avatar Jun 01 '23 16:06 spgarbet