yq icon indicating copy to clipboard operation
yq copied to clipboard

Fix merge anchor traversing

Open stevenwdv opened this issue 5 months ago • 1 comments

  • Allow inline maps instead of just aliases (e.g. {<<: {a: 42}}, {<<: [{a: 42}]})
  • Disallow nested sequences (e.g. ~~{<<: [[{a: 42}]]}~~)
  • Disallow other types (e.g. ~~{<<: 42}~~)

Closes #2386

I have a question though: I see in operator_traverse_path_test.go that goccy would reject at least some invalid merge types at parse time, which might make using custom tags like <<: !include file.yaml (see #2386) impossible. Am I right about that? And does that mean that this usage will not be supported in the future? If so, that would be a bit sad.

stevenwdv avatar Jun 13 '25 19:06 stevenwdv

Wait, this actually doesn't fix explode(.) yet. Converting to draft for now, I'll do that later.

stevenwdv avatar Jun 13 '25 19:06 stevenwdv

EDIT: nvm fixed it


Ok, the excessive exploding fix I mentioned above may actually cause problems of its own: echo '{a: {b: &b 42}, <<: {c: *b}}' | yq 'explode(.)' now returns {a: {b: 42}, c: *b}, which also has a dangling alias...

However, I don't really know how to solve this, because it depends on if the mapping node to which the merge anchor refers is also being exploded, e.g. with {r: &r {...}, m: {<<: *r}} we don't want to explode .r, although we might make a copy and then explode it.

And apparently this issue is already present when partially exploding: echo '{a: {b: &b 42}, c: *b}' | yq 'explode(.a)' in the latest release v4.45.4 returns {a: {b: 42}, c: *b}, which also has a dangling alias. Is that a bug or intentional?

stevenwdv avatar Jun 16 '25 13:06 stevenwdv

Minor comment, need more time to review this and consider if there are any side effects - thanks for the contribution though!

mikefarah avatar Jul 10 '25 01:07 mikefarah

This is really good - I think you have a clearer picture of how merge works better than I do 😅 - I'm going to go over it again later just to double check it as it's a sensitive part of the code.

mikefarah avatar Jul 16 '25 23:07 mikefarah

Thanks for having another look! Did you see my comments in the thread above as well? Currently I've actually broken key overriding in regular maps for traversing, and I don't have a good solution yet...

stevenwdv avatar Jul 17 '25 09:07 stevenwdv

Ah spelling is also wrong I see, didn't know this was a bri'ish codebase ;)

Edit: apparently the codebase it not entirely consistent, but anyway

stevenwdv avatar Jul 17 '25 09:07 stevenwdv

Ok I think to properly fix explode I need to take a step further and rewrite the reconstructAliasedMap - which I've done in a branch MakeExplodeGreatAgain and included your extra tests 👍🏼 I've also changed the warning to just be a generic you haven't set the flag warning, rather than trying to specify which maps are affected.

Tempted to merge it into master - it'll no doubt conflict with what you've got here though :/

mikefarah avatar Jul 19 '25 05:07 mikefarah

Ok I think to properly fix explode I need to take a step further and rewrite the reconstructAliasedMap - which I've done in a branch MakeExplodeGreatAgain

Nice, I'll merge it in here as well

stevenwdv avatar Jul 20 '25 11:07 stevenwdv

@mikefarah I notice that fixedReconstructAliasedMap in MakeExplodeGreatAgain is lacking a lot of logic that is present in reconstructAliasedMap. Compare:

$ echo '{e: &empty {}, b: &b 42, a: {b2: *b}, <<: *empty}' | ./yq 'explode(.)'
[WARNING] 14:18:06 --yaml-fix-merge-anchor-to-spec is false; causing merge anchors to override the existing values which isn't to the yaml spec. This flag will default to true in late 2025.
{e: {}, b: 42, a: {b2: 42}}

vs

$ echo '{e: &empty {}, b: &b 42, a: {b2: *b}, <<: *empty}' | ./yq --yaml-fix-merge-anchor-to-spec 'explode(.)'
{e: &empty {}, b: &b 42, a: {b2: *b}}

This is because it's not using overrideEntry. It's also not using Context, but I'm actually not sure what the exact purpose of that is.

This may also break aliases (*b is invalid now):

$ echo '{b: &b 42, m: {e: &empty {}, a: {b2: *b}, <<: *empty}}' | ./yq --yaml-fix-merge-anchor-to-spec 'explode(.)'
{b: 42, m: {e: &empty {}, a: {b2: *b}}}

I'm seeing if I can quickly fix it in the merge.

stevenwdv avatar Jul 20 '25 12:07 stevenwdv

I'm seeing if I can quickly fix it in the merge.

I thought of a way to quickly solve the issue in a different way, by evaluating merge keys first, I hope you don't mind.

I'm now looking at the issue in my code I mentioned above.

stevenwdv avatar Jul 20 '25 13:07 stevenwdv

Ok, it should be all done now, I hope! Well, maybe the tests could be cleaned up a bit / made more consistent between traverse & explode.

Btw, I thought of some interesting edge cases, where I don't know what the intended way to handle them would be, which again is probably one of the reasons merge anchors are not part of the recent spec anymore: the more I know about them, the less sense it makes to me to use regular keys for behavior like this.

(current behavior in bold)

  • {<<: {a: 42}, <<: {b: 42}}: {a: 42, b: 42}/{b: 42}? Arguably it should be the latter, since after de-duplication only the second merge key remains
  • {<<: {a: 42}, !!str <<: {b: 42}}: {a: 42, !!str <<: {b: 42}}/{!!str <<: {b: 42}}? I think this is right, since the keys have different types
  • {<<: {a: 42}, !!merge mykey: {a: 37, b: 42}}: {a: 42, b: 42}/{a: 37, b: 42}/invalid?

Not that other YAML tools handle them in a good way; yamllint.com with input {<<: {a: 42}, !!str <<: {b: 42}} gives {a: 42, b: 42} (as did yq previously; with explode, not traverse).

stevenwdv avatar Jul 20 '25 13:07 stevenwdv

Ah the reason I was processing like that was to preserve key order on merging. Not a big issue though, this is looking good! Thanks for PR!

mikefarah avatar Jul 21 '25 23:07 mikefarah

Hmm - only problem with the way you're processing explode is that they key order is mangled if merge is processed first :/ Not a massive issues I guess...though I might see if I can fix that

mikefarah avatar Jul 22 '25 00:07 mikefarah

True, although YAML does act like merge keys come first, so it kinda makes sense in a way

stevenwdv avatar Jul 22 '25 08:07 stevenwdv

Hey @mikefarah, I'm really confused because it seems like something went really wrong merging this; a lot of changes didn't make it and some logic from MakeExplodeGreatAgain that I changed it in this MR was merged. E.g.

  • fixedReconstructAliasedMap re-appeared
  • !!str << is broken again
  • Inline merge anchors are broken without --yaml-fix-merge-anchor-to-spec while I fixed those for both cases
  • Nested merge anchors ({<<: {<<: {a: 42}}}) are broken even with the fix
  • Probably more

stevenwdv avatar Jul 25 '25 16:07 stevenwdv

Yeah, I brought those changes back in to ensure the correct map key order. Afaik I included all the tests you added; so there's shouldn't be any gaps :/ - but maybe something went awry for those cases.

Inline merging is under the flag, that's intentional - my thinking is that it's best to leave the old way of merging as untouched as possible seeing the new way is a complete re-write.

Not sure what !!str << exactly refers to? Will take a look at nested inline merge anchors when I get a chance - though it seems pretty niche; not sure what the use-case would be?

mikefarah avatar Aug 04 '25 01:08 mikefarah

@mikefarah I'm a bit sad to see many things that worked with this MR not work anymore. I put in a lot of work to try and get everything right, only to see most of it break again. Sorry if that sounds to harsh, I'm still really glad you're maintaining this useful tool and actively looking at issues and comments. Anyway, more specifically, it still doesn't work anymore for what I was trying to do originally, also with the flag. (Specifically because of nested inline merge anchors.)

I think that aside from the order, which I understand you might want to keep the same but shouldn't impact the meaning, I made sure to keep everything compatible with previous behavior with and without the flag. I also think that I kept the order the same as before when the flag was not specified (although I'm not 100% sure of the top of my head); maybe you can check this, and in that case, can't the other fixes just be included with and without the flag? I don't think those could break previous code.

Regarding behavior with the flag, fixedReconstructAliasedMap bypasses all original fixes from this MR, so it doesn't handle inline maps in merge anchors, and has some other shortcomings.

I think that not all tests are ran for both with & without the flag anymore where they could be. Maybe that's why it seems to be working fine.

!!str << is not a merge key, it's just a regular key that happens to be called <<.

I could make a PR to fix the things that are broken now and add tests for those, but the thing is; this was that PR :/

stevenwdv avatar Aug 04 '25 18:08 stevenwdv