Son
Better ordering of keys in objects
The order of keys should be enforced if you want to ensure deterministic output, e.g. sorted in ascending lexicographical order.
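To illustrate why a fixed key order gives deterministic output, here is a minimal Python sketch. It uses the standard `json` module rather than a Son serializer, and `serialize_sorted` is a hypothetical helper name, so treat it as an illustration of the idea only.

```python
import json

def serialize_sorted(obj):
    # sort_keys=True emits object members ordered by key.
    # Python compares str values code point by code point, ascending.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"))

# Same members, different insertion order.
first = {"b": 1, "a": 2}
second = {"a": 2, "b": 1}

# With an enforced key order both serialize to identical text.
print(serialize_sorted(first))
print(serialize_sorted(second))
```

Without `sort_keys=True` the two dicts would serialize in insertion order and produce different text for the same data.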
The specification currently states, "Object members must be sorted by lexicographic order of their keys." It's worth specifying ascending/descending here, though.
Also, on that note --- what are the restrictions on using Unicode in keys?
I went ahead and specified ascending order as that's a clear improvement: https://github.com/seagreen/Son/commit/42e4294ee85af2a7c22450253b36347d981b91fa
Also, on that note --- what are the restrictions on using Unicode in keys?
Keys are Son Strings, so they're defined by: https://github.com/seagreen/Son/blob/master/son.ebnf#L19
I can actually see two ways of ordering strings: by using the actual codepoints that make up a Son string, or by unescaping them and using the unescaped codepoints (e.g. by replacing \n with the newline character and then ordering). Not sure which would be best.
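A small Python sketch of how the two approaches can disagree. The keys are made-up examples, `unescape` is a hypothetical helper, and `json.loads` stands in for Son's own string unescaping.

```python
import json

# Key bodies as they appear in the serialized text, between the quotes.
# "\\n" here is two characters: a backslash (U+005C) and 'n' (U+006E).
escaped_keys = ["\\n", "A"]

def unescape(raw):
    # Decode a serialized string body into the characters it denotes.
    return json.loads('"' + raw + '"')

# Option 1: order by the codepoints of the escaped text.
by_escaped = sorted(escaped_keys)

# Option 2: order by the codepoints after unescaping.
by_unescaped = sorted(escaped_keys, key=unescape)

print(by_escaped)    # 'A' (U+0041) sorts before the backslash (U+005C)
print(by_unescaped)  # newline (U+000A) sorts before 'A' (U+0041)
```

So the same pair of keys ends up in opposite orders depending on which form is compared.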
EDIT: @chmike I missed that you mentioned the ordering needs to specify "ascending" too. Thanks for catching that!
I can actually see two ways of ordering strings
There is also normalization to take into account. The most appropriate ordering (or even equivalence!) for strings varies case by case, so to speak.
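To make the normalization point concrete, here is a short Python sketch using the standard `unicodedata` module. The key values are invented for illustration.

```python
import unicodedata

composed = "\u00e9"     # "é" as a single code point
decomposed = "e\u0301"  # "e" followed by a combining acute accent

# The two render identically but are different code point sequences.
same_raw = composed == decomposed
same_nfc = unicodedata.normalize("NFC", decomposed) == composed

# In plain code point order another key can land between the two spellings.
keys = [composed, "f", decomposed]
ordered = sorted(keys)

print(same_raw, same_nfc)
print([k.encode("unicode_escape").decode() for k in ordered])
```

A code point ordering without normalization treats these as two distinct keys, and they are not even adjacent after sorting.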
ETA: You might also find this expired I-D useful.
@tonyg: Thanks for mentioning JSON Canonical Form. I've opened an issue for finding other subsets of JSON here: https://github.com/seagreen/Son/issues/7
Quoting from hn:
Right now the reference implementation sorts by comparing codepoints one by one. When it reaches a codepoint that's unequal or nonexistent it orders the string with the lesser or nonexistent codepoint first. [...] Two remaining questions:
- What's the most unambiguous way to describe this process?
- Right now the comparison is on unescaped strings. Should it be on escaped strings instead?
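The comparison described in the quote could be sketched like this in Python. This is not the reference implementation, just one reading of the description, and `compare_keys` is a hypothetical name.

```python
from functools import cmp_to_key

def compare_keys(a, b):
    # Walk both strings one code point at a time.
    for x, y in zip(a, b):
        if ord(x) != ord(y):
            # First unequal position decides: lesser code point first.
            return -1 if ord(x) < ord(y) else 1
    # All shared positions are equal, so the string that ran out of
    # code points first (the shorter one) is ordered first.
    return (len(a) > len(b)) - (len(a) < len(b))

keys = ["ab", "b", "", "a"]
ordered = sorted(keys, key=cmp_to_key(compare_keys))
print(ordered)
```

This matches Python's built-in `str` ordering, which also compares code point by code point, so `sorted(keys)` gives the same result.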
@chmike and @etherealvisage: I added a thanks section: https://github.com/seagreen/Son#special-thanks, let me know if you want your links to go somewhere other than your github profiles!
Why does the sort order have to be lexicographic?
If the goal is to ensure that there is only a single valid ordering for a given set of keys, an ordinal sort achieves the same thing but is much faster to compute.
Also, isn't a lexicographic ordering potentially culture/localization specific?
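A rough Python sketch of the difference being asked about. Real culture-aware collation depends on the installed locale, so `str.casefold` is used here only as a crude stand-in for a language-style ordering; the keys are invented.

```python
keys = ["b", "A", "a", "B"]

# Ordinal order: compare raw code points. Every uppercase ASCII letter
# (U+0041..U+005A) sorts before every lowercase one (U+0061..U+007A).
ordinal = sorted(keys)

# Stand-in for a language-style order: ignore case when comparing.
# Ties keep their input order because Python's sort is stable.
caseless = sorted(keys, key=str.casefold)

print(ordinal)
print(caseless)
```

The ordinal result depends only on the code points, so it is the same on every machine, whereas a genuinely locale-based collation could vary with the environment.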