Preserve order of keys on load and dump

Open shoogle opened this issue 6 years ago • 2 comments

Ordinary YAML does not mandate any particular ordering of keys (https://yaml.org/spec/1.2/spec.html#id2765608). This means that when a user creates a YAML file, such as this:

key1: First value
key2: Second value
key3: Third value

It is perfectly valid for a YAML dumper to spit out this:

key2: Second value
key1: First value
key3: Third value

Not only does this reordering make a large file more difficult to read and understand, it also creates a false diff:

+key2: Second value
 key1: First value
-key2: Second value
 key3: Third value

It is easy to see how this could obscure a genuine change. Also, if the file is under version control then it will increase the size of a code repository for no good reason.
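The false diff can be reproduced with Python's standard `difflib`. This is only an illustrative sketch: the variable names and the `before.yaml` / `after.yaml` labels are made up for the example, and the two lists stand in for the file before and after a dumper reorders it.

```python
import difflib

# The file as the user wrote it.
original = ["key1: First value", "key2: Second value", "key3: Third value"]
# The same mappings after a dumper has reordered them.
reordered = ["key2: Second value", "key1: First value", "key3: Third value"]

diff = list(difflib.unified_diff(original, reordered,
                                 fromfile="before.yaml", tofile="after.yaml",
                                 lineterm=""))
print("\n".join(diff))

# Count real additions/removals, skipping the "+++" / "---" header lines.
added = [line for line in diff
         if line.startswith("+") and not line.startswith("+++")]
removed = [line for line in diff
           if line.startswith("-") and not line.startswith("---")]
```

The two lists hold exactly the same mappings, yet the diff still reports one line added and one line removed.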

Possibly the StrictYAML parser already preserves ordering (I don't know, I haven't tried it yet). However, my argument is that this should be more than just an implementation detail. I believe that this should be an actual feature of the StrictYAML specification, either as a MUST or at least STRONGLY RECOMMENDED, and brought to the attention of implementors with a suitable example and justification.

P.S. Thank you for inventing StrictYAML!

shoogle avatar Jul 20 '19 21:07 shoogle

It may interest you to learn that PyYAML, the de facto standard YAML parser for Python, now provides an option to preserve key order during loading and dumping. The issue was discussed in yaml/pyyaml#110, where I gave the following argument in favour of preservation:

Since the [standard YAML] spec doesn't guarantee an order, that means any order is valid. PyYAML could return dict keys in any arbitrary order [...] and it would still be perfectly consistent with the YAML specification.

In practice, the only ordering that makes any sense is the order in which [the keys] were created, because if they are returned in a different order then the information about which was created first is lost forever. If the user requires any other form of ordering (alphabetical, etc.), then they are able to sort the dict themselves after it has been returned in creation order. However, if the dict is not returned in creation order then the user can never put it back in creation order (except by a lucky guess).

Even the original YAML specification concedes that dumpers have to choose an ordering; it just says that people shouldn't rely on them choosing the same ordering. However, it creates problems if dumpers choose different orderings, so it ends up being better just to mandate one order as being correct, and the only ordering that it makes sense to use is the order that the user has already chosen.
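As a rough sketch of what this looks like in PyYAML: the example below assumes PyYAML 5.1 or later (where `safe_dump` accepts `sort_keys`) and Python 3.7 or later (where a plain `dict` keeps insertion order). The sample text and variable names are made up for illustration.

```python
import yaml  # PyYAML; assumed to be version 5.1 or later

# Keys deliberately written in non-alphabetical order.
text = "key2: Second value\nkey1: First value\nkey3: Third value\n"

# On Python 3.7+ the resulting dict keeps the order found in the file.
data = yaml.safe_load(text)

# By default the dumper sorts keys alphabetically, losing the user's order.
default_dump = yaml.safe_dump(data)

# With sort_keys=False the dumper emits keys in the dict's own order.
ordered_dump = yaml.safe_dump(data, sort_keys=False)

print(default_dump)
print(ordered_dump)
```

Sorting afterwards is always possible (`sorted(data)`), whereas the user's original order cannot be recovered from the sorted output, which is the asymmetry the argument relies on.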

shoogle avatar Jul 20 '19 22:07 shoogle

Hi Peter,

StrictYAML does not yet have an official written spec, but if it gets one, explicit key ordering is going to be mandated, for deterministic roundtripping if for no other reason. Currently that's how the library functions (or should; it's a bug if not).

Thanks for raising this issue. I've experienced these issues before but it's always good to have external validation.

crdoconnor avatar Jul 22 '19 12:07 crdoconnor