dolt icon indicating copy to clipboard operation
dolt copied to clipboard

JSON ordering is not well-ordered.

Open nicktobey opened this issue 8 months ago • 1 comments

The MySQL spec requires that JSON objects have an ordering, but this ordering is implementation defined.

Dolt has a comparison function for JSON objects, but it's not well ordered.

json> SET @ROCK     = JSON_OBJECT('a', 2e0, 'b', 1e0);
json> SET @PAPER    = JSON_OBJECT('b', 2e0, 'c', 1e0);
json> SET @SCISSORS = JSON_OBJECT('c', 2e0, 'a', 1e0);

json> SELECT CAST(@ROCK as JSON) < CAST(@PAPER as JSON);
+--------------------------------------------+
| CAST(@ROCK as JSON) < CAST(@PAPER as JSON) |
+--------------------------------------------+
| true                                       |
+--------------------------------------------+
1 row in set (0.00 sec)

json> SELECT CAST(@PAPER as JSON) < CAST(@SCISSORS as JSON);
+------------------------------------------------+
| CAST(@PAPER as JSON) < CAST(@SCISSORS as JSON) |
+------------------------------------------------+
| true                                           |
+------------------------------------------------+
1 row in set (0.00 sec)

json> SELECT CAST(@SCISSORS as JSON) < CAST(@ROCK as JSON);
+-----------------------------------------------+
| CAST(@SCISSORS as JSON) < CAST(@ROCK as JSON) |
+-----------------------------------------------+
| true                                          |
+-----------------------------------------------+

(The casts are used here because MySQL doesn't actually permit JSON local variables, so the local variables are strings that need to be re-cast. The values are written as 2e0 to ensure they are parsed as floats, not ints or decimals.)

nicktobey avatar Dec 19 '23 23:12 nicktobey

The current comparison algorithm looks like this:

  1. Find all keys shared by both objects and sort them lexographically.
  2. Compare the values for these keys in sorted order.
  3. If all shared keys are equal, then the object with more total keys is greater.
  4. If both objects have the same number of keys, take the first key (lexographically) in each object that isn't in the other objects and compare the keys (not the values).

The algorithm is going to need to be changed anyway because it currently doesn't handle types that aren't in standard JSON (like ints and decimals) even though MySQL supports all types as JSON values. So I might as well fix this while I'm in there.

nicktobey avatar Dec 20 '23 00:12 nicktobey