Better ordering of keys in objects

Open chmike opened this issue 7 years ago • 7 comments

The order of keys should be enforced if you want to ensure deterministic output, e.g. sorted in ascending lexicographical order.

chmike avatar Mar 14 '17 18:03 chmike

The specification currently states, "Object members must be sorted by lexicographic order of their keys." It's worth specifying ascending/descending here, though.

Also, on that note --- what are the restrictions on using Unicode in keys?

etherealvisage avatar Mar 14 '17 18:03 etherealvisage

I went ahead and specified ascending order as that's a clear improvement: https://github.com/seagreen/Son/commit/42e4294ee85af2a7c22450253b36347d981b91fa

Also, on that note --- what are the restrictions on using Unicode in keys?

Keys are Son Strings, so they're defined by: https://github.com/seagreen/Son/blob/master/son.ebnf#L19

I can actually see two ways of ordering strings: by using the actual codepoints that make up a Son string as written, or by unescaping them and using the unescaped codepoints (e.g. by replacing \n with the newline character and then ordering). Not sure which would be best.
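To make the difference between the two options concrete, here is a minimal Python sketch. It is not taken from the Son reference implementation; the `unescape` helper is hypothetical and leans on `json.loads`, on the assumption that Son string escapes are a subset of JSON's. The two keys are chosen so that the two orderings disagree.

```python
import json


def unescape(escaped_key: str) -> str:
    """Decode the escapes in a key as written between its quotes.

    Hypothetical helper: assumes Son escapes are a subset of JSON escapes.
    """
    return json.loads('"' + escaped_key + '"')


# Keys as they appear in the serialized text: in the first key the
# backslash and the letter n are two separate characters.
escaped_keys = ["a\\n", "aB"]

# Option 1: compare the characters exactly as written.
by_escaped = sorted(escaped_keys)

# Option 2: compare the characters the escapes stand for.
by_unescaped = sorted(escaped_keys, key=unescape)

print(by_escaped)    # "aB" first, because "B" (U+0042) < "\" (U+005C)
print(by_unescaped)  # "a\n" first, because newline (U+000A) < "B" (U+0042)
```

Either rule is deterministic; the point is only that the spec has to pick one, since they do not always agree.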

EDIT: @chmike I missed that you mentioned the ordering needs to specify "ascending" too. Thanks for catching that!

seagreen avatar Mar 14 '17 19:03 seagreen

I can actually see two ways of ordering strings

There is also normalization to take into account. The most appropriate ordering (or even equivalence!) for strings varies case by case, so to speak.
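As an illustration of the normalization point, the Python sketch below uses the standard `unicodedata` module. The sample keys are made up for the example; nothing here reflects what Son itself specifies.

```python
import unicodedata

composed = "\u00e9"     # "é" as a single code point
decomposed = "e\u0301"  # "e" followed by COMBINING ACUTE ACCENT

# The two strings render identically but are different code point sequences.
same_raw = composed == decomposed

# After NFC normalization they become equal.
same_nfc = (
    unicodedata.normalize("NFC", composed)
    == unicodedata.normalize("NFC", decomposed)
)

# A plain code point sort puts the two spellings on either side of "f".
raw_order = sorted([composed, "f", decomposed])

print(same_raw, same_nfc)
print([k.encode("unicode_escape").decode("ascii") for k in raw_order])
```

So without a normalization rule, two keys that look the same to a reader can coexist in one object and sort far apart.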

ETA: You might also find this expired I-D useful.

tonyg avatar Mar 14 '17 19:03 tonyg

@tonyg: Thanks for mentioning JSON Canonical Form, I've opened an issue for finding other subsets of JSON here: https://github.com/seagreen/Son/issues/7

seagreen avatar Mar 14 '17 21:03 seagreen

Quoting from hn:

Right now the reference implementation sorts by comparing codepoints one by one. When it reaches a codepoint that's unequal or nonexistent it orders the string with the lesser or nonexistent codepoint first. [...] Two remaining questions:

  1. What's the most unambiguous way to describe this process?
  2. Right now the comparison is on unescaped strings. Should it be on escaped strings instead?
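One possible way to pin down the process described above is to write the comparison out explicitly. The following is a Python sketch, not the reference implementation; `compare_keys` is a hypothetical name, and it assumes the keys have already been unescaped.

```python
from functools import cmp_to_key


def compare_keys(a: str, b: str) -> int:
    """Return -1, 0 or 1 by comparing code points position by position."""
    for x, y in zip(a, b):
        if ord(x) != ord(y):
            # First unequal code point decides: the lesser one sorts first.
            return -1 if ord(x) < ord(y) else 1
    # One string ran out of code points: the shorter one sorts first.
    return (len(a) > len(b)) - (len(a) < len(b))


keys = ["b", "ab", "a", "B", "abc"]
ordered = sorted(keys, key=cmp_to_key(compare_keys))
print(ordered)
```

In Python 3 this gives the same result as the built-in `sorted(keys)`, because `str` comparison is already defined over code points; the explicit version just spells out the rule in a form that could be restated in a spec.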

seagreen avatar Mar 14 '17 22:03 seagreen

@chmike and @etherealvisage: I added a thanks section: https://github.com/seagreen/Son#special-thanks, let me know if you want your links to go somewhere other than your github profiles!

seagreen avatar Mar 17 '17 16:03 seagreen

Why does the sort order have to be lexicographic?

If the goal is to ensure that there is only a single valid ordering for a given set of keys, an ordinal sort achieves the same thing but is much faster to compute.

Also, isn't a lexicographic ordering potentially culture/localization specific?
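For context, comparing code points position by position involves no locale at all, so it is already ordinal in that sense. What an "ordinal" rule still has to name is the unit being compared. The Python sketch below uses made-up sample keys and shows that code point order and UTF-8 byte order agree, while UTF-16 code unit order can differ for characters outside the Basic Multilingual Plane.

```python
# U+1F600 lies outside the BMP; U+FF5E lies inside it, above the surrogate range.
keys = ["\U0001F600", "\uFF5E"]

# Compare by Unicode code point (Python's default for str).
by_code_point = sorted(keys)

# Compare by UTF-8 bytes: always agrees with code point order.
by_utf8 = sorted(keys, key=lambda k: k.encode("utf-8"))

# Compare by UTF-16 code units (big-endian bytes preserve unit order).
# U+1F600 becomes the surrogate pair D83D DE00, which sorts below FF5E.
by_utf16 = sorted(keys, key=lambda k: k.encode("utf-16-be"))

print(by_code_point == by_utf8)   # same order
print(by_code_point == by_utf16)  # different order
```

An implementation in a language whose native strings are UTF-16 would therefore need some care to match one that compares code points or UTF-8 bytes.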

mwerezak avatar Dec 10 '17 13:12 mwerezak