pyyaml icon indicating copy to clipboard operation
pyyaml copied to clipboard

Support overriding anchors

Open perlpunk opened this issue 4 years ago • 21 comments

In YAML 1.1 and 1.2, anchors can be reused. https://yaml.org/spec/1.2/spec.html#id2786196 https://yaml.org/spec/1.1/#id863390

See also https://github.com/yaml/pyyaml/issues/100 and https://github.com/yaml/pyyaml/issues/334

This allows:

- &anchor A
- *anchor
- &anchor B
- *anchor
# [A, A, B, B]

perlpunk avatar Apr 09 '20 23:04 perlpunk

This change also resolves a long time bug in the pyyaml package in Debian:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=515634

Thanks.

kitterma avatar Apr 15 '20 22:04 kitterma

Some background on this:

There is a discussion that repeated anchors should be forbidden in the next version of YAML (AFAIK mainly because it makes roundtripping with keeping anchor names more complicated). So @ingydotnet wants to wait until the decision is made if this feature will be removed or not.

I think it's a valid point to say, we shouldn't start supporting something that will be forbidden in the next YAML version anyway. But IMHO the reason for not allowing this (complicated roundtripping of anchor names) is not really valid, especially if the next YAML version still considers anchor names as throwaway content by default.

What would probably help the discussion is to see some real world use cases where this is useful. I haven't need this feature myself so far, but I can imagine use cases.

perlpunk avatar May 21 '20 12:05 perlpunk

I have currently the following nice-to-have use case (docker-compose.yml):

version: 3

services:
  &service myfancycontainer:
    environment:
    - SERVICENAME: *service 

  &service anotherfancycontainer:
    environment:
    - SERVICENAME: *service 

This is a simple docker-compose sample that forwards the service names to the containers as environment variables. Of course it could be implemented like this:

version: 3

services:
  &service  myfancycontainer:
    environment:
    - SERVICENAME: *service 

  &service2 anotherfancycontainer:
    environment:
    - SERVICENAME: *service2 

But it is cleaner with overriding anchor names. Even more cleaner if you have in mind that this compose file is in constant growth and more services are added over time by the administrator.

zokradonh avatar Jun 13 '20 13:06 zokradonh

@perlpunk I'd be interested in reading through this discussions if it happens to be linkable. Where I work we're making heavy use of anchors, going so far as to use this branch for our CI/CD testing because reusing anchors makes DRY go even further (py-test and especially tavern-ci). Thanks.

YAMLcase avatar Aug 06 '20 17:08 YAMLcase

@yamlCase actually the public process of RFCs has just started these days. This is the related RFC: https://github.com/yaml/yaml-spec/pull/65 Note that the Pull request for the RFC itself is not where discussion happens, but once the RFC PR is merged, we can create issues discussing each RFC. You can of course comment there anyway if you want.

perlpunk avatar Aug 06 '20 18:08 perlpunk

I needed this fixed now so I have monkey patched the change into my current library. Not an ideal situation but maybe others would like this as a work around as we wait:

import yaml

# A monkey patch to fix an anchor bug in the pyaml parser details here: https://github.com/yaml/pyyaml/pull/394
def compose_node_fix_394(self, parent, index):
    if self.check_event(yaml.events.AliasEvent):
        event = self.get_event()
        anchor = event.anchor
        if anchor not in self.anchors:
            raise ComposerError(None, None, "found undefined alias %r"
                    % anchor.encode('utf-8'), event.start_mark)
        return self.anchors[anchor]
    event = self.peek_event()
    anchor = event.anchor
    # REMOVED BUG FROM HERE
    self.descend_resolver(parent, index)
    if self.check_event(yaml.events.ScalarEvent):
        node = self.compose_scalar_node(anchor)
    elif self.check_event(yaml.events.SequenceStartEvent):
        node = self.compose_sequence_node(anchor)
    elif self.check_event(yaml.events.MappingStartEvent):
        node = self.compose_mapping_node(anchor)
    self.ascend_resolver()
    return node

yaml.composer.Composer.compose_node = compose_node_fix_394 # Monkey patch in the fixed yaml parser

# Demo it working:
if(yaml.load("""- &foo bar
- *foo
- &foo baz
- *foo""", Loader=yaml.Loader) == ["bar","bar","baz","baz"]):
    print("Hooray, bug is patched!")

cbrinker avatar Oct 27 '20 21:10 cbrinker

@perlpunk we are using yaml anchors/aliases quite a bit to cut down on tons of repeated stanzas in our prometheus configuration files. I'd really like to not have to uniquely identify all of these aliases!

...
- <<: *scrape-template
  job_name: job_1
  sd_configs:
  - &common-values
    project: some-project
    filter: (afilter)
    port: 1111
  - <<: &common-values
    zone: zone-a
  - <<: &common-values
    zone: zone-b
  - <<: &common-values
    zone: zone-c
- <<: *scrape-template
  job_name: job_2
  sd_configs:
  - &common-values
    project: some-project
    filter: (bfilter)
    port: 2222
  - <<: &common-values
    zone: zone-a
  - <<: &common-values
    zone: zone-b
  - <<: &common-values
    zone: zone-c
...

cbrinker avatar Oct 28 '20 00:10 cbrinker

What's the plan for this? Do I need to switch to ruamel.yaml instead?

dgarozzo avatar Jun 30 '21 20:06 dgarozzo

There is a discussion that repeated anchors should be forbidden in the next version of YAML (AFAIK mainly because it makes roundtripping with keeping anchor names more complicated). So @ingydotnet wants to wait until the decision is made if this feature will be removed or not.

Does recent closing of https://github.com/yaml/yaml-spec/pull/65 mean that repeated anchors will not be forbidden? So this pull request can be merged?

zokradonh avatar Jul 02 '21 08:07 zokradonh

+1 on this - this seems to start to cause the problems with certain k8s manifests, especially knative eventing 0.24

Do we have a timeline when it could land in release?

RafalSkolasinski avatar Jul 29 '21 09:07 RafalSkolasinski

Note that in addition to 1.2 (mentioned in the OP), per the YAML 1.1 spec, "anchors need not be unique" https://yaml.org/spec/1.1/#id863390. It would be great to get this PR merged in.

cowboy avatar Aug 30 '21 17:08 cowboy

rebased and retargeted to release/6.0

perlpunk avatar Sep 22 '21 18:09 perlpunk

We had a meeting and discussed this. I added an option reuse_anchors that defaults to False, like requested.

perlpunk avatar Sep 23 '21 17:09 perlpunk

Any update on this?

RafalSkolasinski avatar Oct 15 '21 19:10 RafalSkolasinski

It is considered for the next release 6.1, I believe

perlpunk avatar Oct 15 '21 20:10 perlpunk

This is planned to be a non-default option in PyYAML 6.1.

See https://github.com/yaml/pyyaml/projects/9

No ETA for when this will be released.

ingydotnet avatar Oct 15 '21 20:10 ingydotnet

If this will default to false does it mean that parser will keep failing if yaml contains duplicated anchors like here unless it is explicitly enabled? https://github.com/ansible-collections/kubernetes.core/issues/189

RafalSkolasinski avatar Oct 15 '21 20:10 RafalSkolasinski

Note that in addition to 1.2 (mentioned in the OP), per the YAML 1.1 spec, "anchors need not be unique" https://yaml.org/spec/1.1/#id863390. It would be great to get this PR merged in.

Not even that, they were legal even in 1.0 with almost exactly the same wording: https://yaml.org/spec/1.0/#id2558599. This has been valid for as long as YAML has existed (not counting any pre-1.0 of which I am unaware and for which a spec has not been preserved on yaml.org). In that context, making it default to invalid, even if it can be enabled, seems an odd choice.

qwertystop avatar Oct 25 '21 19:10 qwertystop

I agree that it should just be allowed. But adding an option for it was the compromise between allowing it and not changing anything at all. And because there are plans to restructure the API for loading and dumping, this even has to wait until that is decided (because load doesn't have any options at all yet)...

perlpunk avatar Oct 25 '21 21:10 perlpunk

wondering if this is gonna be merged?

fernandezcuesta avatar Aug 29 '23 13:08 fernandezcuesta

I definitely want this in v.next, but especially if that ends up being a major release (which I assume it will be if we include 1.2 support and some form of declarative Loader/Dumper config to light it up ala #700), I don't see why we wouldn't just turn it on by default... Document it as a "sorta breaking change that actually brings us into better spec compliance that probably nobody will hit on purpose", and maybe don't even provide a knob to shut it off (or if we do, implement it in the Loader config, not on any of the top-level APIs).

Thoughts?

nitzmahone avatar Aug 30 '23 15:08 nitzmahone