Alias follows anchor restriction is surprising
The spec explicitly disallows:
&foo *bar
This probably does not happen when YAML is generated, but it is surprising when manually editing YAML.
For instance
- &foo a # OK
- *a # OK
- &foo *b # not OK
For a human editor, the fact that dropping an anchor before *b makes it malformed is very surprising.
For a (C/C++) programmer, not being able to use something that hasn't been declared comes quite naturally. Also, YAML appears to be constructed in such a way that everything should be one-pass processable. This gives great opportunities for speed optimizations and/or memory reduction: all nodes can be evicted from memory as soon as the walk (depth-first and/or along the YAML text) goes to a sibling or recedes to a parent node -- unless the node had an anchor. If you allow forward references, i.e. aliasing some node that is anchored later on, you don't know the content at the moment you stumble upon the alias. You have to come back to that part of the tree. This complicates algorithms a lot. Also, if I understood correctly, anchors can be redefined later on, and any alias thereafter uses the most recent anchor of that identifier. If you were to allow reversed order, then which of a nested, a sibling's or a textually later anchor would you reference? (This can be solved, but it isn't necessary.)
In practice, if you dislike going into every detail at the place where you first need a node, there is a solution: define it with an anchor somewhere earlier, where it is more convenient, and then alias it at the place where you want to use it (without too many details). I've hand-written YAML structures of 2 MiB (well, with some generated lists, of course) in that manner that are still very much understandable for humans, and still processed by rather simple tools.
I think this reduces the number of possible anchors per node to one. Having many synonyms for the same thing makes a language interesting, but also more complex.
Well, I don't quite agree with the bit about a C/C++ programmer not being able to use what hasn't been declared before.
Take for instance this (quite common) case of declaring two structures, each having a pointer to the other:
struct bar;
struct foo {
    struct bar *ptr;
};
struct bar {
    struct foo *ptr;
};
This is the same as with YAML anchors and aliases. And it makes absolute sense if you think of an alias as a pointer and do not expect it to be immediately dereferenced to its content.
I concede the point about this not working in streaming mode, but then again I wouldn't expect alias resolution to work in streaming mode either, since that would require keeping in memory all the nodes that happen to have an alias anyway.
In conclusion, this argument is about what an alias really is: a value derived from the anchor it refers to, or a pointer to the anchored node. I favor the latter.
@pantoniou I think we're mixing up "declared" and "defined" in the C language family.
Let me comment on your example:
struct bar; /* forward declaration of `bar` */
struct foo { /* definition of `foo` */
    struct bar *ptr; /* Using `bar` is possible in pointers (and references in C++), but its content is inaccessible. Also, `sizeof(struct bar)` isn't available yet. */
};
struct bar { /* definition of `bar` */
    struct foo *ptr;
    struct foo my_foo; /* Possible here because `foo` has already been defined. */
    char my_foo_buffer[ sizeof(struct foo) ]; /* Works. (Beware: unaligned storage!) */
    char my_bar_buffer[ sizeof(struct bar) ]; /* Error: definition of `bar` is not complete until the closing `}`. */
};
I think that an anchor marks the definition of its content. There is no such thing as a (forward) declaration.
You are, of course, right that any anchored content will have to be retained ("anchored") in memory until the end of the document / stream: one can never predict which aliases will follow until they arrive, or until the anchor gets overwritten. Still, anchors and aliases might be the better option compared to storing and parsing the same content all over again, possibly many times.
With the prospect of relative aliases by path, it seems your concern might be mitigated. Your example structure could be translated into relative references, even without the forward declaration:
foo:
  ptr: *../../bar
bar:
  ptr: *../../foo
Look at https://github.com/yaml/yaml-spec/issues/44#issuecomment-932736889 by @ingydotnet.
(Side question: I wonder if it ought to be *../../bar or *../bar. Both are sensible IMO.)
Just for comparison, this is what I imagine as a possible serialization of the above model in the current YAML 1.2 spec:
foo: &1
  ptr: &2
    ptr: *1
bar: *2
I find that counter-intuitive as well.