pyyaml
pyyaml copied to clipboard
Support overriding anchors
In YAML 1.1 and 1.2, anchors can be reused. https://yaml.org/spec/1.2/spec.html#id2786196 https://yaml.org/spec/1.1/#id863390
See also https://github.com/yaml/pyyaml/issues/100 and https://github.com/yaml/pyyaml/issues/334
This allows:
- &anchor A
- *anchor
- &anchor B
- *anchor
# [A, A, B, B]
This change also resolves a long time bug in the pyyaml package in Debian:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=515634
Thanks.
Some background on this:
There is a discussion that repeated anchors should be forbidden in the next version of YAML (AFAIK mainly because it makes roundtripping with keeping anchor names more complicated). So @ingydotnet wants to wait until the decision is made if this feature will be removed or not.
I think it's a valid point to say, we shouldn't start supporting something that will be forbidden in the next YAML version anyway. But IMHO the reason for not allowing this (complicated roundtripping of anchor names) is not really valid, especially if the next YAML version still considers anchor names as throwaway content by default.
What would probably help the discussion is to see some real world use cases where this is useful. I haven't need this feature myself so far, but I can imagine use cases.
I have currently the following nice-to-have use case (docker-compose.yml
):
version: 3
services:
&service myfancycontainer:
environment:
- SERVICENAME: *service
&service anotherfancycontainer:
environment:
- SERVICENAME: *service
This is a simple docker-compose sample that forwards the service names to the containers as environment variables. Of course it could be implemented like this:
version: 3
services:
&service myfancycontainer:
environment:
- SERVICENAME: *service
&service2 anotherfancycontainer:
environment:
- SERVICENAME: *service2
But it is cleaner with overriding anchor names. Even more cleaner if you have in mind that this compose file is in constant growth and more services are added over time by the administrator.
@perlpunk I'd be interested in reading through this discussions if it happens to be linkable. Where I work we're making heavy use of anchors, going so far as to use this branch for our CI/CD testing because reusing anchors makes DRY go even further (py-test and especially tavern-ci). Thanks.
@yamlCase actually the public process of RFCs has just started these days. This is the related RFC: https://github.com/yaml/yaml-spec/pull/65 Note that the Pull request for the RFC itself is not where discussion happens, but once the RFC PR is merged, we can create issues discussing each RFC. You can of course comment there anyway if you want.
I needed this fixed now so I have monkey patched the change into my current library. Not an ideal situation but maybe others would like this as a work around as we wait:
import yaml
# A monkey patch to fix an anchor bug in the pyaml parser details here: https://github.com/yaml/pyyaml/pull/394
def compose_node_fix_394(self, parent, index):
if self.check_event(yaml.events.AliasEvent):
event = self.get_event()
anchor = event.anchor
if anchor not in self.anchors:
raise ComposerError(None, None, "found undefined alias %r"
% anchor.encode('utf-8'), event.start_mark)
return self.anchors[anchor]
event = self.peek_event()
anchor = event.anchor
# REMOVED BUG FROM HERE
self.descend_resolver(parent, index)
if self.check_event(yaml.events.ScalarEvent):
node = self.compose_scalar_node(anchor)
elif self.check_event(yaml.events.SequenceStartEvent):
node = self.compose_sequence_node(anchor)
elif self.check_event(yaml.events.MappingStartEvent):
node = self.compose_mapping_node(anchor)
self.ascend_resolver()
return node
yaml.composer.Composer.compose_node = compose_node_fix_394 # Monkey patch in the fixed yaml parser
# Demo it working:
if(yaml.load("""- &foo bar
- *foo
- &foo baz
- *foo""", Loader=yaml.Loader) == ["bar","bar","baz","baz"]):
print("Hooray, bug is patched!")
@perlpunk we are using yaml
anchors/aliases quite a bit to cut down on tons of repeated stanzas in our prometheus configuration files. I'd really like to not have to uniquely identify all of these aliases!
...
- <<: *scrape-template
job_name: job_1
sd_configs:
- &common-values
project: some-project
filter: (afilter)
port: 1111
- <<: &common-values
zone: zone-a
- <<: &common-values
zone: zone-b
- <<: &common-values
zone: zone-c
- <<: *scrape-template
job_name: job_2
sd_configs:
- &common-values
project: some-project
filter: (bfilter)
port: 2222
- <<: &common-values
zone: zone-a
- <<: &common-values
zone: zone-b
- <<: &common-values
zone: zone-c
...
What's the plan for this? Do I need to switch to ruamel.yaml instead?
There is a discussion that repeated anchors should be forbidden in the next version of YAML (AFAIK mainly because it makes roundtripping with keeping anchor names more complicated). So @ingydotnet wants to wait until the decision is made if this feature will be removed or not.
Does recent closing of https://github.com/yaml/yaml-spec/pull/65 mean that repeated anchors will not be forbidden? So this pull request can be merged?
+1 on this - this seems to start to cause the problems with certain k8s manifests, especially knative eventing 0.24
Do we have a timeline when it could land in release?
Note that in addition to 1.2 (mentioned in the OP), per the YAML 1.1 spec, "anchors need not be unique" https://yaml.org/spec/1.1/#id863390. It would be great to get this PR merged in.
rebased and retargeted to release/6.0
We had a meeting and discussed this. I added an option reuse_anchors
that defaults to False
, like requested.
Any update on this?
It is considered for the next release 6.1, I believe
This is planned to be a non-default option in PyYAML 6.1.
See https://github.com/yaml/pyyaml/projects/9
No ETA for when this will be released.
If this will default to false does it mean that parser will keep failing if yaml contains duplicated anchors like here unless it is explicitly enabled? https://github.com/ansible-collections/kubernetes.core/issues/189
Note that in addition to 1.2 (mentioned in the OP), per the YAML 1.1 spec, "anchors need not be unique" https://yaml.org/spec/1.1/#id863390. It would be great to get this PR merged in.
Not even that, they were legal even in 1.0 with almost exactly the same wording: https://yaml.org/spec/1.0/#id2558599. This has been valid for as long as YAML has existed (not counting any pre-1.0 of which I am unaware and for which a spec has not been preserved on yaml.org). In that context, making it default to invalid, even if it can be enabled, seems an odd choice.
I agree that it should just be allowed. But adding an option for it was the compromise between allowing it and not changing anything at all.
And because there are plans to restructure the API for loading and dumping, this even has to wait until that is decided (because load
doesn't have any options at all yet)...
wondering if this is gonna be merged?
I definitely want this in v.next, but especially if that ends up being a major release (which I assume it will be if we include 1.2 support and some form of declarative Loader/Dumper config to light it up ala #700), I don't see why we wouldn't just turn it on by default... Document it as a "sorta breaking change that actually brings us into better spec compliance that probably nobody will hit on purpose", and maybe don't even provide a knob to shut it off (or if we do, implement it in the Loader config, not on any of the top-level APIs).
Thoughts?