graphql-engine
graphql-engine copied to clipboard
JSON key ordering is lost when inserting JSON values
Version Information
Server Version: 2.5.1
Environment
Cloud / OSS
What is the expected behaviour?
Inserting JSON values into JSON or JSONB columns should insert the data as-is, with no changes to the data being inserted.
Keywords
JSON JSONB parsing
What is the current behaviour?
Inserting JSON values into JSON or JSONB columns on postgres will re-order the object keys in unpredictable ways. There is no clear pattern to the changes, and with some payloads JSON and JSONB columns are re-ordered differently.
How to reproduce the issue?
Due to the unpredictable nature of the issue, I am unsure if this will replicate consistently on every machine. If the below does not work, I suggest you try again with increasingly complex JSON payloads.
- Create and track the following table:
CREATE TABLE json_test (
id INT PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
data JSON,
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
- Execute the following mutation:
mutation InsertJsonTest ($data: json!) {
insert_json_test_one (object: {
data: $data
}) {
data
}
}
variables:
{
"data": { "b": "value", "a": "value" }
}
- The mutation returns the following payload:
{
"data": {
"insert_json_test_one": {
"data": {
"a": "value",
"b": "value"
}
}
}
}
You can see the key order has changed. This is also apparent if you check the database directly, the re-ordering happens during the insert.
My best guess is that because JSON payloads are passed as parts of the variables object, they get parsed with the object all the way down, then stringified again, and something goes wrong during this process.
Any possible solutions?
Currently there are no issues with queries, only mutations. If ordering of keys is critical, users would need to resort to performing inserts themselves using a third party postgres client.
Most likely part of the answer here will be, from the JSON spec:
An object is an unordered set of name/value pairs.
Therefore it could be argued that JSON key ordering is inconsequential.
However, hasura parsing the JSON values is an implementation detail of hasura. It could very well instead accept JSON payloads as encoded JSON strings (which IMHO should be the default), but it fails to escape said strings and instead inserts them as-is into the database.
This may be worth opening a separate issue on
One note also is that JSONB in postgres is inherently unordered.
c.f. https://www.postgresql.org/docs/current/datatype-json.html:
Because the json type stores an exact copy of the input text, it will preserve semantically-insignificant white space between tokens, as well as the order of keys within JSON objects. Also, if a JSON object within the value contains the same key more than once, all the key/value pairs are kept. (The processing functions consider the last value as the operative one.) By contrast, jsonb does not preserve white space, does not preserve the order of object keys, and does not keep duplicate object keys. If duplicate keys are specified in the input, only the last value is kept.
@BenoitRanque Do you think the JSON should be parsed? I mean, lots of other values also get modified slightly as they pass the Hasura pipeline. For instance, numbers passed as strings are not inserted as strings but as syntactically numeric values. So I don't think Hasura sets an expectation of passing values through to the DB unchanged.
@BenoitRanque Was this introduced as a bug in 2.5.1? As in did this work correctly before 2.5.1?
@codedmart I'm actually unsure, apologies. This may or may not have been a regression. There's an associated ticket: 2363, but the version is not recorded there, nor is it mentioned that this ever worked.
@abooij I think parsing the JSON is inevitable. By design, a valid hasura query will be JSON and thus include at least a part that must be parsed, and I don't know if we can partially parse a JSON blob.
Given the below query:
{
"query":"mutation InsertJsonTest ($object: json_test_insert_input!) { insert_json_test_one (object: $object) { data } }",
"variables": {
"object": {
"data": { "b": "value", "a": "value" }
}
}
}
The value of the "data" key can be any arbitrary JSON, including invalid JSON that postgres could feasibly still store without issues.
I don't think we can avoid parsing the value of that key? I mean I'm sure we can but doubtful we want to.