pyyaml
Handle top level yaml property `on`
A document containing a top-level key named `on` is rendered as a boolean value instead of the literal string `on` when loading and dumping a document.
If a top-level YAML key matches the regex registered for the bool resolver,
Resolver.add_implicit_resolver(
    'tag:yaml.org,2002:bool',
    re.compile(r'''^(?:yes|Yes|YES|no|No|NO
                |true|True|TRUE|false|False|FALSE
                |on|On|ON|off|Off|OFF)$''', re.X),
    list('yYnNtTfFoO'))
the key is resolved to a boolean on load, and that boolean is what gets written to the YAML output on dump. So this affects not just `on`, but words like `off` and `no` as well.
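To make the matching concrete, here is a small sketch that needs only the standard library. It copies the pattern from the registration above and checks a few plain scalars against it; the `YAML11_BOOL` and `candidates` names are mine, purely for illustration.

```python
import re

# YAML 1.1 boolean pattern, copied from the resolver registration quoted above.
YAML11_BOOL = re.compile(r'''^(?:yes|Yes|YES|no|No|NO
                |true|True|TRUE|false|False|FALSE
                |on|On|ON|off|Off|OFF)$''', re.X)

# Illustrative scalars: some boolean-looking words plus the ordinary keys
# from the workflow example in this issue.
candidates = ['on', 'off', 'no', 'yes', 'true', 'push', 'branches', 'main']

for word in candidates:
    kind = 'bool' if YAML11_BOOL.match(word) else 'str'
    print(f'{word!r:12} -> {kind}')
```

Any unquoted scalar that matches, whether it is a key or a value, is tagged as a boolean before the mapping is constructed.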
For example,
import yaml
tmpl = """on:
  push:
    branches: [ main ]
"""
print(yaml.dump(yaml.load(tmpl, Loader=yaml.Loader)))
will output,
true:
  push:
    branches:
    - main
I would expect,
on:
  push:
    branches:
    - main
but the word `on` is replaced with the boolean value `true`.
I'm using `pyyaml==6.0` with Python 3.10.9.
Update: It appears any key that is a boolean value will be parsed as a boolean. For example,
no:
  push:
    branches:
    - main
also renders as,
false:
  push:
    branches:
    - main
I tracked down the issue to the regex match in the `BaseResolver.resolve` function. Below is the `resolve` function with a hack that solves this particular issue (just to point out the area of interest). It makes sense now why `on`, `off` and `no` are parsed as booleans: they match that regex. I added a comment in the function below showing how I "patched" the issue.
def resolve(self, kind, value, implicit):
    if kind is ScalarNode and implicit[0]:
        if value == '':
            resolvers = self.yaml_implicit_resolvers.get('', [])
        else:
            resolvers = self.yaml_implicit_resolvers.get(value[0], [])
        wildcard_resolvers = self.yaml_implicit_resolvers.get(None, [])
        for tag, regexp in resolvers + wildcard_resolvers:
            # RIGHT HERE I added the additional condition to exclude "on"
            if regexp.match(value) and value != "on":
                return tag
        implicit = implicit[1]
    if self.yaml_path_resolvers:
        exact_paths = self.resolver_exact_paths[-1]
        if kind in exact_paths:
            return exact_paths[kind]
        if None in exact_paths:
            return exact_paths[None]
    if kind is ScalarNode:
        return self.DEFAULT_SCALAR_TAG
    elif kind is SequenceNode:
        return self.DEFAULT_SEQUENCE_TAG
    elif kind is MappingNode:
        return self.DEFAULT_MAPPING_TAG
This is probably not a viable solution, so if anyone has any thoughts on how to fix it I would look into submitting a PR.
Yeah, sadly, the implicit resolvers are currently registered against the default resolver class at import time:
https://github.com/yaml/pyyaml/blob/957ae4d495cf8fcb5475c6c2f1bce801096b68a5/lib/yaml/resolver.py#L170-L175
... so it's tricky to robustly work around this particular 1.1ism at runtime without monkeypatching.
If you just want a local patch to make it only implicitly recognize 1.2 booleans, that's pretty easy- swap out the existing bool resolver in the above location with this one:
Resolver.add_implicit_resolver(
    'tag:yaml.org,2002:bool',
    re.compile(r'^(?:true|false)$', re.X),
    list('tf'))
Monkeypatching the default resolver at runtime is a little harder, but not terrible (and much cheaper than what you're doing now)- just clear out the implicit resolver dispatch table and repopulate with only the ones you want, eg:
import re
import yaml
from yaml.resolver import Resolver

# zap the Resolver class' internal dispatch table
Resolver.yaml_implicit_resolvers = {}

# note the 1.2 bool impl here
Resolver.add_implicit_resolver(
    'tag:yaml.org,2002:bool',
    re.compile(r'^(?:true|false)$', re.X),
    list('tf'))

# and now the rest of the default implicit resolvers
Resolver.add_implicit_resolver(
    'tag:yaml.org,2002:float',
    re.compile(r'''^(?:[-+]?(?:[0-9][0-9_]*)\.[0-9_]*(?:[eE][-+][0-9]+)?
                |\.[0-9][0-9_]*(?:[eE][-+][0-9]+)?
                |[-+]?[0-9][0-9_]*(?::[0-5]?[0-9])+\.[0-9_]*
                |[-+]?\.(?:inf|Inf|INF)
                |\.(?:nan|NaN|NAN))$''', re.X),
    list('-+0123456789.'))

Resolver.add_implicit_resolver(
    'tag:yaml.org,2002:int',
    re.compile(r'''^(?:[-+]?0b[0-1_]+
                |[-+]?0[0-7_]+
                |[-+]?(?:0|[1-9][0-9_]*)
                |[-+]?0x[0-9a-fA-F_]+
                |[-+]?[1-9][0-9_]*(?::[0-5]?[0-9])+)$''', re.X),
    list('-+0123456789'))

Resolver.add_implicit_resolver(
    'tag:yaml.org,2002:merge',
    re.compile(r'^(?:<<)$'),
    ['<'])

Resolver.add_implicit_resolver(
    'tag:yaml.org,2002:null',
    re.compile(r'''^(?: ~
                |null|Null|NULL
                | )$''', re.X),
    ['~', 'n', 'N', ''])

Resolver.add_implicit_resolver(
    'tag:yaml.org,2002:timestamp',
    re.compile(r'''^(?:[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]
                |[0-9][0-9][0-9][0-9] -[0-9][0-9]? -[0-9][0-9]?
                (?:[Tt]|[ \t]+)[0-9][0-9]?
                :[0-9][0-9] :[0-9][0-9] (?:\.[0-9]*)?
                (?:[ \t]*(?:Z|[-+][0-9][0-9]?(?::[0-9][0-9])?))?)$''', re.X),
    list('0123456789'))

Resolver.add_implicit_resolver(
    'tag:yaml.org,2002:value',
    re.compile(r'^(?:=)$'),
    ['='])

# The following resolver is only for documentation purposes. It cannot work
# because plain scalars cannot start with '!', '&', or '*'.
Resolver.add_implicit_resolver(
    'tag:yaml.org,2002:yaml',
    re.compile(r'^(?:!|&|\*)$'),
    list('!&*'))

print(yaml.safe_load('hi_mom: on'))
We have grand plans to make this kind of thing way easier with the 1.2 schema config support, but life keeps getting in the way... :(
Feel free to grab this hack, but also no guarantees that it'll work forever- after all it is reaching deep into the guts and current implementation details of the default resolver :wink:
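A narrower variant of the same hack, sketched under assumptions rather than offered as a supported API: put the filtered dispatch table on subclasses of `SafeLoader` and `SafeDumper`, so the stock loaders and dumpers keep their YAML 1.1 behaviour. The names `CoreBoolLoader`, `CoreBoolDumper` and `without_bool` are hypothetical. Like the snippet above, it relies on the `yaml_implicit_resolvers` class attribute, which is an implementation detail.

```python
import re
import yaml

BOOL_TAG = 'tag:yaml.org,2002:bool'
# Same 1.2-style pattern as in the comment above: only lowercase true/false.
BOOL_12 = re.compile(r'^(?:true|false)$')


def without_bool(table):
    """Return a copy of an implicit-resolver table with the bool entries dropped."""
    return {
        first: [(tag, regexp) for tag, regexp in resolvers if tag != BOOL_TAG]
        for first, resolvers in table.items()
    }


class CoreBoolLoader(yaml.SafeLoader):
    """Hypothetical loader: SafeLoader minus the YAML 1.1 boolean words."""


class CoreBoolDumper(yaml.SafeDumper):
    """Hypothetical dumper, so that a plain 'on' is not quoted on output."""


for cls in (CoreBoolLoader, CoreBoolDumper):
    # Assigning on the subclass shadows the inherited table, so the stock
    # loaders and dumpers are left untouched.
    cls.yaml_implicit_resolvers = without_bool(cls.yaml_implicit_resolvers)
    cls.add_implicit_resolver(BOOL_TAG, BOOL_12, list('tf'))

tmpl = """on:
  push:
    branches: [ main ]
"""

data = yaml.load(tmpl, Loader=CoreBoolLoader)
print(data)
print(yaml.dump(data, Dumper=CoreBoolDumper))
```

The dumper subclass matters for round-tripping: the stock dumper consults the same 1.1 resolver and quotes the key as `'on'` to keep it a string.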
I ran into this problem. My quick fix was to change `on` in my yaml to `'on'`.
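A quick check of why quoting helps, assuming stock PyYAML with no patches applied: implicit resolvers only apply to plain (unquoted) scalars, so a quoted key stays a string.

```python
import yaml

plain = yaml.safe_load("on: 1")
quoted = yaml.safe_load("'on': 1")

print(plain)   # key resolved as a boolean
print(quoted)  # key kept as the string 'on'

# Dumping goes the other way: the stock dumper quotes the key so that it
# loads back as a string.
print(yaml.safe_dump(quoted))
```

This only works if you control the YAML source and whatever consumes it accepts a quoted key.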
See https://github.com/yaml/pyyaml/issues/486 for all related issues regarding YAML 1.2 support.
You can use the following project on top of PyYAML for YAML 1.2 support: https://pypi.org/project/yamlcore/
>>> import yaml
>>> import yamlcore
>>> print(yaml.dump(yaml.load(tmpl, Loader=yamlcore.CoreLoader), Dumper=yamlcore.CoreDumper))
on:
  push:
    branches:
    - main
Also see https://perlpunk.github.io/yaml-test-schema/schemas.html