js-yaml
js-yaml copied to clipboard
Parsing numbers according to YAML 1.2
Based on the spec for YAML 1.2, underscores shouldn't be allowed in numbers:
https://yaml.org/spec/1.2/2009-07-21/spec.html#id2604185
However the latest js-yaml
still allows them:
package.json
{
"dependencies": {
"js-yaml": "^4.1.0"
}
}
test.js
const yaml = require('js-yaml');
const data = yaml.load(`
string: '1_2_3'
also_string: 1_2_3
`);
console.log(typeof data.string);
console.log(typeof data.also_string);
$ node test.js
string
number
Is this intended behaviour? Thanks in advance.
Does it break anything? Are there any strings with numbers and underscores that should stay strings?
found this issue, kinda similar https://github.com/symfony/symfony/issues/27120
Based on the spec for YAML 1.2, underscores shouldn't be allowed in numbers
YAML 1.2 doesn't define how numbers are parsed. As far as YAML is concerned, there're just 3 types: maps, collections and scalars. How they are represented in memory is up to parser/schema. Reference parser wouldn't tell you if it's a number or not.
What you linked to is a recommended default schema. I would be very hesitant to make a subset of it, but superset might be fine for historical/interoperability reasons.
We used to strictly implement https://yaml.org/type/int.html schema (the one 1.2 spec refers to in section 10.4.). Unfortunately, it has problems (people confused sexagesimals with port numbers, and octals didn't align well with ecmascript changes), so it was reduced a bit. But underscores remained from there.
So the questions are:
- how other 1.2 parsers do it?
- is it worth breaking existing documents?
I'm looking at ruamel.yaml (that was one of the first yaml 1.2 parsers I remember), it casts digits with underscores to numbers as well:
$ python3
>>> from ruamel.yaml import YAML
>>> YAML().load("foo: 1_2_3_4")
ordereddict([('foo', 1234)])
- I think the behaviour should reflect either the one from YAML 1.1, or one of the YAML 1.2 schemas. A mixture is unfortunate, and it will never be in sync to what other libraries are doing
- You can use this repo https://perlpunk.github.io/yaml-test-schema/ to test behaviour when you implement the YAML 1.2 Core Schema. (Note that the JSON schema is defined a bit ambiguously)
how other 1.2 parsers do it?
Other YAML 1.2 libraries use the YAML 1.2 Core schema as the default, e.g. https://github.com/eemeli/yaml, https://github.com/perlpunk/YAML-PP-p5
Note that a "parser" doesn't do anything here, it's the constructor. A YAML parser only cares about the syntax.
is it worth breaking existing documents?
Well, it depends on what you do by default. That should stay the same, and people can opt in into a new schema. Or you make an explicit breaking change and default to the Core schema.
I've checked, there are 2 differences with YAML 1.2 core schema left:
- support for underscores in integers/floats (we do, they don't)
- support for binary ints like
0b10101
(we do, they don't)
change for next major version maybe, not sure
@perlpunk do you have any idea why YAML 1.2 spec removed those?
I don't know why binary was dropped.
But for underscores, it turned out that it didn't make sense the way it was defined for 1.1 (and the regular expression would get too complicated if the goal was to fix it), e.g. should .1_
or .
or ._
or even .________
be parsed as a number? The regex says yes.
We use yaml-cpp
which is where we first ran into the conflict: https://github.com/jbeder/yaml-cpp/issues/781 (they don't support underscore)
I'm happy to update the PR to make underscores "opt out". I would just need some guidance on what you want to call the property or how to make a new schema.
I have a concrete case where this breaks things. This OpenAPI spec uses 18_24
as a map key. When parsed with js-yaml, that key becomes 1824
, breaking internal references and making that spec completely unparseable.
The yaml
package from npm parses this correctly (as a string key) but js-yaml does not:
> yaml = require('yaml')
> jsYaml = require('js-yaml')
> yaml.parse('18_24: test')
{ '18_24': 'test' }
> jsYaml.load('18_24: test')
{ '1824': 'test' }