Feature Request: Support for Anchors and Refs

Open jhereth opened this issue 5 years ago • 8 comments

First of all: Thanks for the work on strictyaml. Originally, I found it via the features removed page (which also saved me from other bad surprises since).

Most features that are removed in strictyaml can be handled better with schemas. One exception (for me) is the removal of anchors and references.

Looking at the proposed syntax on the why? page, it doesn't look much different from the YAML at the beginning. However, removing this feature puts the burden of implementing it on app developers.

I found anchors and refs useful to simplify yamls for many applications. I'm happy that these features are available out of the box.

I do agree that yamls can become hard to read when misusing the syntax. However, losing this feature completely seems like a big loss.

Would you consider re-adding it, or enabling it optionally?

jhereth avatar Aug 12 '20 12:08 jhereth

I'm open to examples that might disprove my hypothesis but every example I've seen where they have been used made me retch a little and think "this feature would not have been useful if the application developer had developed a better schema".

For others, it's made me think that they should have exposed a code API rather than trying to invent a YAML DSL where a YAML DSL was not appropriate.

i.e. where I've seen it used, it's a hack to circumvent a problem that lies elsewhere.

That alone may not be enough to justify removing it though.

However, I also have a high level aim of ensuring that strictyaml documents can be understood by non programmers with minimal training and I think this feature would sacrifice that aim.

WDYT?

crdoconnor avatar Aug 12 '20 13:08 crdoconnor

The main use for this is to keep the yaml code DRY. While this acronym is probably more familiar to programmers, the advantages are quite obvious to non-programmers, too.

A version of the yaml in the why example could be

__step_definitions:
  large: &large
    instrument:      Lasik 2000
    pulseEnergy:     5.4
    pulseDuration:   12
    repetition:      1000
    spotSize:        1mm
  medium: &medium
    instrument:      Lasik 2000
    pulseEnergy:     5.0
    pulseDuration:   10
    repetition:      500
    spotSize:        2mm
steps:
- step: *large
- step: *medium
- step: *large
- step: *medium
- step:
    <<: *large
    spotSize: 2mm
- step: *large

While this is about as readable as the proposed solution, it spares the application developer the effort of implementing template handling (with partial replacements).
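As a rough illustration of the per-application logic in question, here is a minimal Python sketch of what an app would have to implement itself if the parser did not resolve references. The `profile` and `overrides` key names and the `resolve_step` helper are hypothetical, chosen only for this example; they are not the syntax proposed on the why? page.

```python
def resolve_step(step, profiles):
    """Expand a step that names a profile, applying any per-step overrides."""
    if isinstance(step, str):
        # A bare profile name, e.g. "large": return a copy of that profile.
        return dict(profiles[step])
    # A mapping with a profile name plus fields that replace profile values.
    merged = dict(profiles[step["profile"]])
    merged.update(step.get("overrides", {}))
    return merged


profiles = {
    "large": {"instrument": "Lasik 2000", "pulseEnergy": 5.4, "spotSize": "1mm"},
    "medium": {"instrument": "Lasik 2000", "pulseEnergy": 5.0, "spotSize": "2mm"},
}

# Hypothetical already-parsed document: two plain uses and one partial override.
steps = ["large", "medium", {"profile": "large", "overrides": {"spotSize": "2mm"}}]

resolved = [resolve_step(step, profiles) for step in steps]
```

The third step plays the role of the `<<: *large` merge in the YAML above: it inherits everything from `large` and replaces only `spotSize`.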

For instance, I was happy to use anchors and references for my mail filters (using gmail-yaml-filters). It's great that mesozoic didn't have to predict my personal usage pattern and still I can keep my rules DRY.

I appreciate your focus on the user perspective. Anchors and Refs can be confusing (especially when the templates are not put in an extra place). Used cautiously they can be used and understood by non-programmers (I think). To disable them however requires the app developers to (repeatedly) implement the same/similar logic for every app.

This feature has the advantage to work the same across all yaml-based applications - the user has to learn it only once. Templating mechanisms implemented per application will likely have different features/keywords/... and the user has to relearn and adapt with every app.

Some yaml files making excessive use of this feature are quite hard to read. I could see an argument to disable it on request for an application. To generally remove it however might do more harm than good.

jhereth avatar Aug 13 '20 19:08 jhereth

> To disable them however requires the app developers to (repeatedly) implement the same/similar logic for every app.

I think this is, broadly speaking, a good thing. If your users find that the YAML often gets repetitive it's a good idea to think about how you could refactor it to make it less repetitive. This will make your schema easier to understand and easier to read and ultimately make your application easier to use. I've been through this process a few times and I've always been glad I did.

As with writing code, I'm pretty sure that facilitating deduplication is a task that is best done retrospectively and to "let" some duplication happen first before doing it. Duplication is cheaper than the wrong abstraction and all that.

> For instance, I was happy to use anchors and references for my mail filters (using gmail-yaml-filters). It's great that mesozoic didn't have to predict my personal usage pattern and still I can keep my rules DRY.

I don't think he should have had to anticipate your personal usage pattern either, but I think it would be a good thing to share your usage pattern to let him redesign the schema to accommodate it in a clean way without needing to use anchors or refs.

I also suspect that your personal usage pattern with this app maybe isn't as unique as you'd think and other users may also benefit from a redesign that helps you. If I maintained that app I'd be thrilled to get feedback like that.

> This feature has the advantage to work the same across all yaml-based applications - the user has to learn it only once. Templating mechanisms implemented per application will likely have different features/keywords/... and the user has to relearn and adapt with every app.

I feel very similarly about templating to deduplicate or parameterize YAML. It's a nasty hack that indicates either the need for a schema redesign or that YAML shouldn't be used at all. I hate all this Kubernetes templating stuff and ironically enough I don't think CI pipelines should really be built in YAML :)

Obviously I can't stop people doing this with StrictYAML but I do discourage it, just as I discourage anchors and refs.

I do think that the solution of "change schema to accommodate 'profiles'" while maintaining backwards compatibility (as exhibited by the lasik example I gave) is a good pattern for solving the problem and results in a neater, DRYer, easier to understand YAML document. I've tried the templating pattern instead a few times and always regretted it.

> Some yaml files making excessive use of this feature are quite hard to read. I could see an argument to disable it on request for an application.

It's a hack that acts as a salve for an insufficiently expressive schema design. Hacks tend to be ok if used lightly, but they also tend to grow imperceptibly until they spawn an almighty mess. The reason I'm wary is that I'd like the path of least resistance for application developers to be "modify schema", not "tell users to just use anchors and refs".

Philosophically I see this feature as a violation of the rule of least power. This feature doesn't make YAML documents Turing complete, but it edges them towards it, and I've never seen an example where there isn't a better way.

crdoconnor avatar Aug 14 '20 14:08 crdoconnor

Personally, I recommend that the spec be left as-is. But as a secondary question, could items like Anchors and Refs be allowed in the library as an optional feature?

I'm not saying it should, but by making it explicit and optional you are allowing library users to operate outside of the spec if they want. Alternatively, you could allow a schema to implement Anchors/Refs indirectly with the correct optional hooks.

Myself, I've had to face this with a project I'm writing in Nim. Since Nim has no StrictYAML library, I'm writing one. I will make it public in a week or so. By default, it will follow the spec exactly. But I'm also adding optional support for nulls and empty arrays/objects, because the project I'm working on simply requires it.

If curious, the details:

  • nulls are supported with the unquoted string null, whereas "null" in quotes is a four-character string. So the de-serializing parser must distinguish between the word null in quotes and the word null not in quotes.
  • empty objects and arrays are noted with {} and []. But general JSONification is not supported. Anything other than unquoted {} and [] is a string. (Having just typed this: if there is a way to put an empty array or object in StrictYAML, please let me know and I'll do that instead. Using a null does not work since I need to distinguish between an array with unknown content (aka a null), an empty array, and an empty string. My project will not be using a schema at that layer.)
  • on output, object fields are always sorted alphabetically when serialized. In fact, everything is normalized. So essentially, the same YAML document always generates the exact same output regardless of how it was written when input. My unit tests will verify the same MD5. "Loose in, strict out".
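A minimal Python sketch of the "loose in, strict out" idea from the list above, covering only nested mappings and scalars. This is not the Nim library; the function names are made up for illustration, and the quoting rule for strings that look like null, {} or [] is an assumption about how such a serializer could stay unambiguous.

```python
import hashlib


def parse_scalar(text):
    """Unquoted null becomes None; a double-quoted value stays a string."""
    if text == "null":
        return None
    if len(text) >= 2 and text[0] == text[-1] == '"':
        return text[1:-1]
    return text


def format_scalar(value):
    """Write None, empty mappings and empty lists with their unquoted markers."""
    if value is None:
        return "null"
    if value == {}:
        return "{}"
    if value == []:
        return "[]"
    text = str(value)
    # Quote strings that would otherwise be read back as null, {} or [].
    return f'"{text}"' if text in ("null", "{}", "[]") else text


def dump(mapping, indent=0):
    """Return output lines for a nested dict, keys in alphabetical order."""
    lines = []
    pad = " " * indent
    for key in sorted(mapping):
        value = mapping[key]
        if isinstance(value, dict) and value:
            lines.append(f"{pad}{key}:")
            lines.extend(dump(value, indent + 2))
        else:
            lines.append(f"{pad}{key}: {format_scalar(value)}")
    return lines


def digest(mapping):
    """MD5 of the normalized text, for comparing documents in unit tests."""
    text = "\n".join(dump(mapping)) + "\n"
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Same content, different key order on input.
first = {"b": {"y": None, "x": "null"}, "a": []}
second = {"a": [], "b": {"x": "null", "y": None}}
```

Because keys are sorted before writing, `digest(first)` and `digest(second)` are identical even though the two inputs were written in a different order.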

JohnAD avatar Aug 14 '20 18:08 JohnAD

@crdoconnor I agree that this feature is a violation of the rule of least power.

While developing a new application, however, it is very helpful until you better understand your users' typical use cases, and it helps support atypical ones. For the mail filter example, I've seen quite different patterns (written in XML, unfortunately), and it would take a lot of additional effort for a tool like gmail-yaml-filters to support all of them.

What do you think of @JohnAD's proposal to allow them as an extension/optional feature? Mature StrictYAML applications wouldn't use it, but evolving ones could use it (in a consistent manner)?

jhereth avatar Aug 18 '20 07:08 jhereth

I agree that anchors should be an extension rather than supported by default.

> The main use for this is to keep the yaml code DRY.

The problem happens when you have a file with anchors and refs, read it into Python, and then reserialise back to YAML. Now the anchors are gone and the references are replaced with the actual content and your DRYness is gone.

In other words, anchors and refs are not roundtrippable if supported by default, but they can be roundtrippable if expanding them is made optional. I'd keep them out of the main spec to keep that simple, but it would be great to agree on a syntax that people should use if they want to implement anchors and refs themselves.
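The round-trip problem can be illustrated without any YAML library: once a loader has expanded references into ordinary Python objects, a serializer has nothing left to tell it that two values came from the same anchor. The sketch below uses the standard library's `json` module purely as a stand-in for a YAML dumper, and the data is made up for illustration.

```python
import json

# Stand-in for what a loader hands back after resolving "&large" / "*large":
# both steps point at the same expanded mapping.
large = {"instrument": "Lasik 2000", "pulseEnergy": 5.4}
loaded = {"steps": [{"step": large}, {"step": large}]}

# A plain serializer writes the content out once per use.
text = json.dumps(loaded, indent=2)

# After a second load the two steps are equal but unrelated copies.
reloaded = json.loads(text)
first, second = (item["step"] for item in reloaded["steps"])
```

Here `text` contains the mapping twice, and `first` and `second` no longer share an identity, so the single point of definition is gone.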

shoogle avatar Nov 28 '20 22:11 shoogle

There are good arguments both for and against anchors & refs (at least in strictyaml), and I would prefer this as an optional feature. How can valid YAML (which could include anchors & refs) be parsed when anchors & refs are treated as an error? Is there any way to have them accepted but not resolved? Maybe as a 'dirty_load'?

Nos- avatar Jul 28 '21 20:07 Nos-

> I'm open to examples that might disprove my hypothesis but every example I've seen where they have been used made me retch a little and think "this feature would not have been useful if the application developer had developed a better schema".

For me it's exactly the other way around: In the example you give against references, I vastly prefer the variant with references to both the variant with duplication and the one with schema-imposed deduplication. Having a config format that allows you to define your own abstractions in a way that doesn't depend on changes to the compiled program is incredibly powerful & important, and also makes configs easier to understand, since you don't need to look at the code (/documentation) to figure out how to write something without unnecessary redundancy.

The issue with references not surviving load/save cycles is the only argument against references that I agree with, and I don't think it is insurmountable. Keeping references in the loaded object, and allowing values to be read in both a dereferencing and a "transparent" manner (the latter might return the label, or perhaps the path of the referenced object), would allow programs to handle references fully. The only downside would be more complexity when changing a YAML configuration via some other UI: the program would have to decide whether to change the referenced pattern or replace it with a modified copy. That downside would also exist in any implementation of deduplication with hardcoded schemas.
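A minimal Python sketch of what keeping references in the loaded object could look like. The `Ref` class and `read` helper are hypothetical and not part of strictyaml or any other library; they only illustrate the two read modes.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Ref:
    """Hypothetical placeholder a loader could keep instead of expanding an alias."""
    label: str


def read(doc: dict, anchors: dict, key: str, dereference: bool = True) -> Any:
    """Read doc[key], following a Ref only when asked to."""
    value = doc[key]
    if isinstance(value, Ref):
        # Dereferencing returns the anchored value; transparent mode returns the label.
        return anchors[value.label] if dereference else value.label
    return value


anchors = {"large": {"instrument": "Lasik 2000", "spotSize": "1mm"}}
step = {"step": Ref("large")}

resolved = read(step, anchors, "step")
transparent = read(step, anchors, "step", dereference=False)
```

Since the `Ref` survives in the loaded structure, a serializer built on this idea could write the alias back out instead of a copy of the referenced mapping.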

shadoxxhd avatar Apr 14 '24 23:04 shadoxxhd