Support for the YAML 1.2 Core and JSON schemas [Take 2]

Open perlpunk opened this issue 2 years ago • 12 comments

Supersedes #512

This is a draft and subject to discussion. See also https://github.com/yaml/pyyaml/issues/486

(For #512: Thanks to @SUSE for another hackweek! I had four days of work time dedicated to an open source project of my choice. https://hackweek.suse.com/20/projects/yaml-1-dot-2-schema-support-for-pyyaml) Thanks to @SUSE for a volunteer day, which I used to continue my previous PR.

This PR depends on #483

Introduction

For a quick overview of the schema changes between YAML 1.1 and 1.2, look here: https://perlpunk.github.io/yaml-test-schema/schemas.html

Although YAML 1.2 also changed the syntax, this pull request is only about the schema changes. As an example, in 1.1, Y, yes, NO, on etc. are resolved as booleans. This sounds convenient, but it also means that all 22 of these strings must be quoted if they are not meant as booleans. A very common obstacle is the country code for Norway, NO (the "Norway Problem"). YAML 1.2 improved this by reducing the list of boolean representations.

Other types have been improved as well. The 1.1 regular expression for float allows . and ._ as floats, although there isn't a single digit in these strings.
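To make the float point concrete, here is a minimal sketch of the decimal float pattern from the YAML 1.2 Core Schema. The pattern is transcribed by hand from the spec (so treat it as an assumption, not a reference implementation), the special values .inf and .nan are left out, and the helper name is_core_float is made up for this illustration.

```python
import re

# Decimal float pattern of the YAML 1.2 Core Schema, transcribed by hand.
# The special values (.inf, -.inf, .nan) have their own patterns and are omitted.
CORE_FLOAT = re.compile(r'[-+]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)([eE][-+]?[0-9]+)?')

def is_core_float(text):
    """Return True if the whole string matches the Core Schema float pattern."""
    return CORE_FLOAT.fullmatch(text) is not None

for candidate in ['.', '._', '.5', '1.', '+0.3e3', '1e27', '1_000.5']:
    print(repr(candidate), is_core_float(candidate))
```

Both . and ._ are rejected because every alternative in the pattern requires at least one digit. The underscore form 1_000.5 is rejected too, since the Core Schema does not allow digit separators.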

While the 1.2 Core Schema, the recommended default for 1.2, still allows a few variations (true, True and TRUE, etc.), the 1.2 JSON Schema is there to match JSON behaviour regarding types, so it allows only true and false.
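As a rough illustration of how the boolean sets shrink from schema to schema, the following sketch lists the plain scalars each schema resolves to a boolean. The sets are transcribed by hand from the YAML 1.1 type repository and the 1.2 spec, and the helper schemas_resolving_as_bool is a made-up name; none of this is PyYAML code.

```python
# Plain scalars that each schema resolves to a boolean (transcribed by hand).
YAML11_BOOLS = set(
    "y Y yes Yes YES n N no No NO "
    "true True TRUE false False FALSE "
    "on On ON off Off OFF".split()
)
CORE_BOOLS = {"true", "True", "TRUE", "false", "False", "FALSE"}
JSON_BOOLS = {"true", "false"}

SCHEMAS = {"1.1": YAML11_BOOLS, "1.2 Core": CORE_BOOLS, "1.2 JSON": JSON_BOOLS}

def schemas_resolving_as_bool(text):
    """Return the names of the schemas that treat `text` as a boolean."""
    return [name for name, bools in SCHEMAS.items() if text in bools]

print(len(YAML11_BOOLS))  # 22
for scalar in ["NO", "yes", "True", "true"]:
    print(scalar, schemas_resolving_as_bool(scalar))
```

Only true and false are booleans in all three schemas, while NO and yes are booleans in 1.1 only.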

Current State

PyYAML implements the 1.1 types (with a few changes like leaving out the single character booleans y, Y etc.), and it was never updated to support any of the 1.2 schemas.
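The current behaviour can be checked with a released PyYAML (this sketch assumes PyYAML is installed and uses only yaml.safe_load; the sample document is made up). It shows the "Norway Problem" and the single character exception mentioned above.

```python
import yaml

doc = """
- NO
- yes
- y
- 'NO'
"""

# SafeLoader applies the YAML 1.1 types: NO and yes become booleans,
# the single character y stays a string, and quoting keeps 'NO' a string.
data = yaml.safe_load(doc)
print(data)  # [False, True, 'y', 'NO']

country = yaml.safe_load("country: NO")
print(country)  # {'country': False}
```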

Problem

Besides the above-mentioned problems with the 1.1 types, more and more libraries are being created or updated for YAML 1.2, probably also thanks to the relatively new YAML Test Suite, and PyYAML should be able to read and write YAML files used or produced by other libraries.

This PR

The PyYAML SafeLoader, which is currently the most recommended Loader if you don't need special behaviour, implements YAML 1.1 types.

I added tagsets for yaml11, json, core. This way people can try out a YAML 1.2 Loader with little code:

    import yaml

    class MyCoreLoader(yaml.BaseLoader): pass
    class MyCoreDumper(yaml.CommonDumper): pass
    MyCoreLoader.init_tags('core')
    MyCoreDumper.init_tags('core')
    y = "- TRUE\n- yes\n"  # any YAML document as a string
    yaml.load(y, Loader=MyCoreLoader)

Out of Scope

One problem is that PyYAML's callbacks are class-based, and while I was able to make the code a bit more compact via a dictionary of types/callbacks, there are still method calls which must be in a certain class. The !!merge << key, for example, needs special handling.

That makes it tedious to add custom Loaders. Turning the class-based approach into an instance-based one is on our wishlist.

One example use case we have in mind is that you want to use the 1.2 CoreLoader, but on top of that you want it to recognize timestamps and merge keys. Or you want a very basic loader that should treat everything as a string except booleans and null.
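For comparison, here is a minimal sketch of the second use case written against the class-level hooks of a released PyYAML (add_implicit_resolver and add_constructor), not against the init_tags API from this PR. The class name StringsBoolNullLoader and the choice to accept only true, false, null and ~ are assumptions made for the illustration.

```python
import re
import yaml

class StringsBoolNullLoader(yaml.BaseLoader):
    """Hypothetical loader: BaseLoader plus implicit booleans and null."""

# Tell the resolver which plain scalars get the bool and null tags.
StringsBoolNullLoader.add_implicit_resolver(
    'tag:yaml.org,2002:bool', re.compile(r'^(?:true|false)$'), ['t', 'f'])
StringsBoolNullLoader.add_implicit_resolver(
    'tag:yaml.org,2002:null', re.compile(r'^(?:null|~)$'), ['n', '~'])

# Tell the constructor how to turn the tagged nodes into Python objects.
StringsBoolNullLoader.add_constructor(
    'tag:yaml.org,2002:bool',
    lambda loader, node: loader.construct_scalar(node) == 'true')
StringsBoolNullLoader.add_constructor(
    'tag:yaml.org,2002:null', lambda loader, node: None)

result = yaml.load("- true\n- 'true'\n- 23\n- ~\n- yes\n",
                   Loader=StringsBoolNullLoader)
print(result)  # [True, 'true', '23', None, 'yes']
```

Because both hooks are class methods that modify class-level tables, every variation needs its own subclass, which is the tedium described above.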

Example

    import yaml

    class MyCoreLoader(yaml.BaseLoader): pass
    class MyJSONLoader(yaml.BaseLoader): pass
    class MyCoreDumper(yaml.CommonDumper): pass
    class MyJSONDumper(yaml.CommonDumper): pass

    MyCoreLoader.init_tags('core')
    MyJSONLoader.init_tags('json')

    MyCoreDumper.init_tags('core')
    MyJSONDumper.init_tags('json')

    input = """
    - TRUE
    - yes
    - ~
    - true
    #- .inf
    #- 23
    #- #empty
    #- !!str #empty
    #- 010
    #- 0o10
    #- 0b100
    #- 0x20
    #- -0x20
    #- 1_000
    #- 3:14
    #- 0011
    #- +0
    #- 0001.23
    #- !!str +0.3e3
    #- +0.3e3
    #- &x foo
    #- *x
    #- 1e27
    #- 1x+27
    """

    print('--------------------------------------------- BaseLoader')
    data = yaml.load(input, Loader=yaml.BaseLoader)
    print(data)
    print('--------------------------------------------- SafeLoader')
    data = yaml.load(input, Loader=yaml.SafeLoader)
    print(data)
    print('--------------------------------------------- CoreLoader')
    data = yaml.load(input, Loader=MyCoreLoader)
    print(data)
    print('--------------------------------------------- JSONLoader')
    data = yaml.load(input, Loader=MyJSONLoader)
    print(data)

    print('--------------------------------------------- SafeDumper')
    out = yaml.dump(data, Dumper=yaml.SafeDumper)
    print(out)
    print('--------------------------------------------- MyCoreDumper')
    out = yaml.dump(data, Dumper=MyCoreDumper)
    print(out)
    print('--------------------------------------------- MyJSONDumper')
    out = yaml.dump(data, Dumper=MyJSONDumper)
    print(out)

perlpunk avatar Sep 22 '21 15:09 perlpunk

any updates on this?

shelper avatar Oct 27 '21 14:10 shelper

@perlpunk anything we can do to help push this along?

kislyuk avatar Jul 21 '22 20:07 kislyuk

I wonder when this will be merged...

ssbarnea avatar Feb 14 '23 16:02 ssbarnea

@ingydotnet

perlpunk avatar Feb 14 '23 16:02 perlpunk

@ssbarnea I'll bring up the task of putting out a new pyyaml release, with the release team.

I suspect this would be merged into the next release.

ingydotnet avatar Feb 14 '23 17:02 ingydotnet

@ingydotnet 6.0.1 is out, can we have an ETA on the next major/minor release that'll merge this PR?

SubaruArai avatar Aug 15 '23 08:08 SubaruArai

We've discussed a November release.

ingydotnet avatar Aug 15 '23 14:08 ingydotnet

@ingydotnet Is that November target still realistic? Based on the progress so far I am becoming quite pessimistic... I am now working on replacing most of the pyyaml use with ruamel.yaml in ansible-lint.

What worries me the most is that the change was never merged, as such a code change is also expected to need some time to mature before the release, preferably involving some pre-releases. Still, until it is merged into the development branch, there is no hope that the code will be improved; instead it is more likely to be affected by code rot.

ssbarnea avatar Oct 04 '23 12:10 ssbarnea

@ssbarnea The first week of November is the time that we all agreed to look into putting out the next release. That is when we expect to look at tickets like this.

ingydotnet avatar Oct 04 '23 20:10 ingydotnet

@ingydotnet May I ask for clarification on the schedule? If my understanding is correct, this looks to be the current schedule:

  • 2023/Nov/03 (end of 1st week of November) Discuss the potential for merging this PR
  • 2023/Nov - (some day) Resolve the merge conflict, people can test on an experimental branch
  • (some other day) Next release, including this PR

SubaruArai avatar Oct 19 '23 08:10 SubaruArai

The contents of this PR are included in #700; we've spent much of this past week iterating on that locally in preparation for an upcoming PyYAML 7.0.0a1 in the next couple of weeks (an update/replacement to #700 should happen today with the recent changes).

nitzmahone avatar Nov 10 '23 18:11 nitzmahone

I just created a yamlcore package that allows you to use YAML 1.2 Core Tags on top of the PyYAML BaseLoader. As this PR is blocked on the API redesign, I decided to create something that users can already use today. Feedback welcome, it is my first package on PyPI :)

perlpunk avatar Apr 20 '24 18:04 perlpunk