Rust: spec how string keys are interpreted
In particular, keys could be canonicalized in some way before doing any comparisons, and key ranges could be specced in different ways.
I basically think that there should be no canonicalization, and that ordering should be defined as whatever std's default string ordering is, which is hopefully by code point.
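To make the "no canonicalization" choice concrete, here is a minimal sketch of what it implies. The helper name `distinct_keys_without_canonicalization` and the use of a `BTreeMap` as the index are assumptions for illustration, not the project's actual index type. Two strings that render identically but use different code point sequences end up as two separate keys.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: inserts two visually identical keys into an
/// index that does no canonicalization and returns the entry count.
fn distinct_keys_without_canonicalization() -> usize {
    // "café" with a precomposed U+00E9.
    let precomposed = "caf\u{00E9}";
    // "café" as 'e' followed by U+0301 (combining acute accent).
    let decomposed = "cafe\u{0301}";

    // Stand-in for the in-memory index (an assumption for this sketch).
    let mut index: BTreeMap<String, u32> = BTreeMap::new();
    index.insert(precomposed.to_string(), 1);
    index.insert(decomposed.to_string(), 2);

    // The keys differ as code point sequences, so both entries survive.
    index.len()
}

fn main() {
    println!("entries: {}", distinct_keys_without_canonicalization());
}
```

If the spec ever added canonicalization (for example, normalizing to NFC before comparison), these two keys would collapse into one, so this is a user-visible decision either way.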
Some of my rambling from slack:
There is _also_ the question of what the proper string ordering even _is_. We'll need to define that explicitly.
This whole project is based around UTF-8 strings, and there are many ways to encode the same Unicode string and different ways to order them.
We haven't specified whether the DB does any canonicalization of string keys or whether they are just treated internally as a byte vec.
I think for now we can spec it all as simply treating keys as byte vecs, but a later project that gets into unicode would be very practical
Actually, a scan operation that treats strings as byte vecs sounds pretty bogus to me, though maybe it "just works" in some sensible way in UTF-8. I think the simple result we want is that keys are ordered by code point.
Though in practice, since the entire index is in memory, the ordering we are going to get is whatever std's default string ordering is. Now I'm curious how that comparison is implemented.
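On how that comparison is implemented: the std docs for `impl Ord for str` say strings are ordered lexicographically by their byte values, and that this orders code points by their position in the code charts. UTF-8 is designed so that the two orders coincide. A small sketch that checks this on a handful of samples (the helper names `cmp_bytes` and `cmp_code_points` are made up for this example):

```rust
use std::cmp::Ordering;

/// Compare two strings as raw UTF-8 byte sequences.
fn cmp_bytes(a: &str, b: &str) -> Ordering {
    a.as_bytes().cmp(b.as_bytes())
}

/// Compare two strings as sequences of code points (Unicode scalar values).
fn cmp_code_points(a: &str, b: &str) -> Ordering {
    a.chars().cmp(b.chars())
}

fn main() {
    // Mix of ASCII, 2-, 3- and 4-byte encodings, plus a prefix pair.
    let samples = [
        "", "a", "ab", "b", "z", "\u{00E9}", "e\u{0301}",
        "\u{65E5}\u{672C}", "\u{FFFF}", "\u{10000}", "\u{1F600}",
    ];
    for &a in samples.iter() {
        for &b in samples.iter() {
            // std's default ordering agrees with both interpretations.
            assert_eq!(a.cmp(b), cmp_bytes(a, b));
            assert_eq!(a.cmp(b), cmp_code_points(a, b));
        }
    }
    println!("byte order and code point order agree on all samples");
}
```

So speccing keys as byte vecs and speccing them as ordered by code point should describe the same ordering, as long as keys are valid UTF-8 and nothing is canonicalized.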
I'm putting this on the mvp, but it's probably ok to slip.
cc @mapleFU
Here's a string sorting experiment: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=99ec4fd4c0e544282aa7094b718f3ddb
They seem to be sorted lexicographically by code point, with a string sorting before any longer string it is a prefix of.
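For reference, a separate minimal sketch (not a copy of the playground above) of what this ordering means for sorting and for a key-range scan. The `sorted_keys` and `scan` helpers and the `BTreeSet` index are assumptions for illustration, and the half-open `[start, end)` range is just one of the ways key ranges could be specced.

```rust
use std::collections::BTreeSet;

/// Sort keys using std's default `Ord` for `String`.
fn sorted_keys(keys: &[&str]) -> Vec<String> {
    let mut v: Vec<String> = keys.iter().map(|k| k.to_string()).collect();
    v.sort();
    v
}

/// Hypothetical scan: all keys in the half-open range [start, end).
/// Assumes start <= end; `BTreeSet::range` panics otherwise.
fn scan(keys: &[&str], start: &str, end: &str) -> Vec<String> {
    let index: BTreeSet<String> = keys.iter().map(|k| k.to_string()).collect();
    index
        .range(start.to_string()..end.to_string())
        .cloned()
        .collect()
}

fn main() {
    let keys = ["b", "ab", "a", "B", "aa", "\u{00E9}", "z"];
    // Uppercase sorts before lowercase, prefixes sort first,
    // and non-ASCII code points sort after all ASCII.
    println!("{:?}", sorted_keys(&keys));
    // Everything starting with 'a' falls in ["a", "b").
    println!("{:?}", scan(&keys, "a", "b"));
}
```

Note that this is code point order, not alphabetical or locale-aware order: `"B"` sorts before `"a"`, and `"é"` sorts after `"z"`.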