Son
Lexical order needs better specification
The specification currently says
Object members must be sorted by ascending lexicographic order of their keys.
but does not say whether that means purely by code point value or whether some collation is applied. Different languages have different lexical orders. Here are some examples (taken from Sorting and Collations):
- English: bailey, boffey, böhm, brown
- German: bailey, boffey, böhm, brown
- German phonebook: bailey, böhm, boffey, brown
- Swedish: bailey, boffey, brown, böhm
Thanks for bringing this up. This is what I get for using a word I don't understand. This part (from your link) is why I used "lexicographic":
The preceding search request would return the documents in this order: BROWN, Boffey, bailey. This is known as lexicographical order as opposed to alphabetical order. Essentially, the bytes used to represent capital letters have a lower value than the bytes used to represent lowercase letters, and so the names are sorted with the lowest bytes first.
The intention is for the sort to be done by Unicode code point value: compare the first code point of each key, and if they're equal, move on to the next code point (repeating as often as necessary). Do you know the word to describe this?
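To make the intended ordering concrete, here is a minimal Python sketch of a code point comparison. The helper names `code_point_key` and `sort_keys` are made up for this illustration and are not part of the specification. It also assumes that a key which is a prefix of another sorts first, and that keys are compared exactly as written, with no Unicode normalization.

```python
def code_point_key(key: str) -> list:
    """Return the key as a list of Unicode code point values (hypothetical helper)."""
    return [ord(ch) for ch in key]


def sort_keys(keys):
    """Sort keys by code point value, comparing position by position.

    Python compares lists element by element, so ties on the first code
    point fall through to the second, and so on. A key that is a prefix
    of another sorts first (an assumption, not stated in the thread).
    """
    return sorted(keys, key=code_point_key)


# "\u00f6" is the precomposed character ö (U+00F6).
print(sort_keys(["brown", "b\u00f6hm", "boffey", "bailey"]))
# ['bailey', 'boffey', 'brown', 'böhm']

print(sort_keys(["bailey", "Boffey", "BROWN"]))
# ['BROWN', 'Boffey', 'bailey']
```

With this rule, ö (U+00F6) sorts after every ASCII letter, so the first result happens to match the Swedish order in the list above, and uppercase letters sort before lowercase ones, as in the quoted passage. No locale or collation table is involved.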
I don't know the exact word for this. Most databases seem to offer something called "binary collation", but I don't know if that's an official term. If the specification says that ordering is done by code point value, then that's good enough for me.