UPDATE statement with array/jsonb subscripting behaves incorrectly with more than one field
Steps to Reproduce
Consider the following script:
DROP TABLE IF EXISTS test;
CREATE TABLE test
(
id INTEGER,
data jsonb
);
SELECT create_distributed_table('test', 'id');
INSERT INTO test
VALUES (1, '{
"a": 1
}');
INSERT INTO test
VALUES (2, '{
"a": 2
}');
INSERT INTO test
VALUES (3, '{
"a": 3
}');
INSERT INTO test
VALUES (4, '{
"a": 4
}');
INSERT INTO test
VALUES (5, '{
"a": 5
}');
UPDATE test
SET data['b'] = updated_vals.b::TEXT::jsonb
, data['c'] = updated_vals.c::TEXT::jsonb
, data['d'] = updated_vals.d::TEXT::jsonb
FROM (
SELECT id,
data['a'] AS a,
data['a']::NUMERIC + 1 AS b,
data['a']::NUMERIC + 2 AS c,
data['a']::NUMERIC + 3 AS d
FROM test
) updated_vals
WHERE test.id = updated_vals.id;
SELECT *
FROM test;
Expected result
+--+--------------------------------+
|id|data |
+--+--------------------------------+
|1 |{"a": 1, "b": 2, "c": 3, "d": 4}|
|2 |{"a": 2, "b": 3, "c": 4, "d": 5}|
|3 |{"a": 3, "b": 4, "c": 5, "d": 6}|
|4 |{"a": 4, "b": 5, "c": 6, "d": 7}|
|5 |{"a": 5, "b": 6, "c": 7, "d": 8}|
+--+--------------------------------+
Actual result
+--+----------------+
|id|data |
+--+----------------+
|1 |{"a": 1, "d": 4}|
|5 |{"a": 5, "d": 8}|
|4 |{"a": 4, "d": 7}|
|3 |{"a": 3, "d": 6}|
|2 |{"a": 2, "d": 5}|
+--+----------------+
Environment
Postgresql: 14.1 Citus: 10.2.3
That's bad, we have tests, but only updates a single field hence we couldn't spot: https://github.com/citusdata/citus/pull/5232/files
Interestingly only the last SET is sent to workers:
UPDATE public.test_102868 test
SET DATA['d'::text] = ((updated_vals.d)::text)::JSONB
FROM
(SELECT test_1.id,
test_1.data['a'::text] AS a,
((test_1.data['a'::text])::numeric OPERATOR(pg_catalog.+) (1)::numeric) AS b,
((test_1.data['a'::text])::numeric OPERATOR(pg_catalog.+) (2)::numeric) AS c,
((test_1.data['a'::text])::numeric OPERATOR(pg_catalog.+) (3)::numeric) AS d
FROM public.test_102868 test_1) updated_vals
WHERE (test.id OPERATOR(pg_catalog.=) updated_vals.id)
Even if I force intermediate results, we get the same behavior:
UPDATE test
SET data['b'] = updated_vals.b::TEXT::jsonb
, data['c'] = updated_vals.c::TEXT::jsonb
, data['d'] = updated_vals.d::TEXT::jsonb
FROM (
SELECT id,
data['a'] AS a,
data['a']::NUMERIC + 1 AS b,
data['a']::NUMERIC + 2 AS c,
data['a']::NUMERIC + 3 AS d
FROM test OFFSET 0
) updated_vals
WHERE test.id = updated_vals.id;
Probably a deparser issue because bare deparser produce wrong result:
SELECT deparse_shard_query_test($$
UPDATE test
SET data['b'] = updated_vals.b::TEXT::jsonb
, data['c'] = updated_vals.c::TEXT::jsonb
, data['d'] = updated_vals.d::TEXT::jsonb
FROM (
SELECT id,
data['a'] AS a,
data['a']::NUMERIC + 1 AS b,
data['a']::NUMERIC + 2 AS c,
data['a']::NUMERIC + 3 AS d
FROM test
) updated_vals
WHERE test.id = updated_vals.id;
$$);
INFO: query: UPDATE public.test SET data['d'::text] = ((updated_vals.d)::text)::jsonb FROM (SELECT test_1.id, test_1.data['a'::text] AS a, ((test_1.data['a'::text])::numeric OPERATOR(pg_catalog.+) (1)::numeric) AS b, ((test_1.data['a'::text])::numeric OPERATOR(pg_catalog.+) (2)::numeric) AS c, ((test_1.data['a'::text])::numeric OPERATOR(pg_catalog.+) (3)::numeric) AS d FROM public.test test_1) updated_vals WHERE (test.id OPERATOR(pg_catalog.=) updated_vals.id)
So, the RULE (e.g., what all deparser exists in PG) works fine:
CREATE TABLE test
(
id INTEGER,
data jsonb
);
CREATE TABLE test_2
(
id INTEGER,
data jsonb
);
CREATE RULE r3 AS ON UPDATE TO test
DO INSTEAD
UPDATE test
SET data['b'] = updated_vals.b::TEXT::jsonb
, data['c'] = updated_vals.c::TEXT::jsonb
, data['d'] = updated_vals.d::TEXT::jsonb
FROM (
SELECT id,
data['a'] AS a,
data['a']::NUMERIC + 1 AS b,
data['a']::NUMERIC + 2 AS c,
data['a']::NUMERIC + 3 AS d
FROM test
) updated_vals
WHERE test.id = updated_vals.id;
schemaname | sc2
tablename | test
rulename | r3
definition | CREATE RULE r3 AS +
| ON UPDATE TO sc2.test DO INSTEAD UPDATE sc2.test SET data['b'::text] = ((updated_vals.b)::text)::jsonb, data['c'::text] = ((updated_vals.c)::text)::jsonb, data['d'::text] = ((updated_vals.d)::text)::jsonb+
| FROM ( SELECT test_1.id, +
| test_1.data['a'::text] AS a, +
| ((test_1.data['a'::text])::numeric + (1)::numeric) AS b, +
| ((test_1.data['a'::text])::numeric + (2)::numeric) AS c, +
| ((test_1.data['a'::text])::numeric + (3)::numeric) AS d +
| FROM sc2.test test_1) updated_vals +
| WHERE (test.id = updated_vals.id);
The problem seems that the rule is designed to work as part of DO UPDATE SET:
appendStringInfoString(buf, " DO UPDATE SET ");
/* Deparse targetlist */
get_update_query_targetlist_def(query, confl->onConflictSet,
context, rte);
Whereas regular update :
get_update_query_targetlist_def(query, query->targetList, context, rte);
And, confl->onConflictSet contains all the elements as separate TargetEntry, whereas query->targetList contains only one entry with refassgnexpr as sub-nodes such as:
{TARGETENTRY
:expr
{SUBSCRIPTINGREF
:refcontainertype 3802
:refelemtype 0
:refrestype 3802
:reftypmod -1
:refcollid 0
:refupperindexpr (
{CONST
:consttype 25
:consttypmod -1
:constcollid 100
:constlen -1
:constbyval false
:constisnull false
:location 438
:constvalue 5 [ 20 0 0 0 100 ]
}
)
:reflowerindexpr <>
:refexpr
{SUBSCRIPTINGREF
:refcontainertype 3802
:refelemtype 0
:refrestype 3802
:reftypmod -1
:refcollid 0
:refupperindexpr (
{CONST
:consttype 25
:consttypmod -1
:constcollid 100
:constlen -1
:constbyval false
:constisnull false
:location 295
:constvalue 5 [ 20 0 0 0 99 ]
}
)
:reflowerindexpr <>
:refexpr
{SUBSCRIPTINGREF
:refcontainertype 3802
:refelemtype 0
:refrestype 3802
:reftypmod -1
:refcollid 0
:refupperindexpr (
{CONST
:consttype 25
:consttypmod -1
:constcollid 100
:constlen -1
:constbyval false
:constisnull false
:location 152
:constvalue 5 [ 20 0 0 0 98 ]
}
)
:reflowerindexpr <>
:refexpr
{VAR
:varno 1
:varattno 2
:vartype 3802
:vartypmod -1
:varcollid 0
:varlevelsup 0
:varnosyn 1
:varattnosyn 2
:location 147
}
:refassgnexpr
{COERCEVIAIO
:arg
{COERCEVIAIO
:arg
{VAR
:varno 2
:varattno 3
:vartype 1700
:vartypmod -1
:varcollid 0
:varlevelsup 0
:varnosyn 2
:varattnosyn 3
:location 159
}
:resulttype 25
:resultcollid 100
:coerceformat 1
:location 173
}
:resulttype 3802
:resultcollid 0
:coerceformat 1
:location 179
}
}
:refassgnexpr
{COERCEVIAIO
:arg
{COERCEVIAIO
:arg
{VAR
:varno 2
:varattno 4
:vartype 1700
:vartypmod -1
:varcollid 0
:varlevelsup 0
:varnosyn 2
:varattnosyn 4
:location 302
}
:resulttype 25
:resultcollid 100
:coerceformat 1
:location 316
}
:resulttype 3802
:resultcollid 0
:coerceformat 1
:location 322
}
}
:refassgnexpr
{COERCEVIAIO
:arg
{COERCEVIAIO
:arg
{VAR
:varno 2
:varattno 5
:vartype 1700
:vartypmod -1
:varcollid 0
:varlevelsup 0
:varnosyn 2
:varattnosyn 5
:location 445
}
:resulttype 25
:resultcollid 100
:coerceformat 1
:location 459
}
:resulttype 3802
:resultcollid 0
:coerceformat 1
:location 465
}
}
:resno 2
:resname data
:ressortgroupref 0
:resorigtbl 0
:resorigcol 0
:resjunk false
}
)
(self-note: There might be some functions that PG calls on onConflictSet but not on targetList for these json subscrpits)
Very similar to https://github.com/citusdata/citus/issues/1293 & https://github.com/citusdata/citus/pull/4234, need to split combined tle's to make deparser work.
Hi @ivyazmitinov,
I pushed some changes to a branch (https://github.com/citusdata/citus/tree/fix/subscript-jsonb) to fix this issue, and would appreciate if you have some time to verify that it works for your queries :) .
Hello, @onurctirtir Thanks for your hard work :) Unfortunately, I won't be able to check your fix on my environment, but I've checked your tests, and they seem to cover our usage of JSONB subscripting: we don't rely on it that much, tbh. I've found the original issue almost accidentally :)
Hello, @onurctirtir Thanks for your hard work :) Unfortunately, I won't be able to check your fix on my environment, but I've checked your tests, and they seem to cover our usage of JSONB subscripting: we don't rely on it that much, tbh. I've found the original issue almost accidentally :)
Thank you for the feedback @ivyazmitinov! :)
Fixed by #7675