citus icon indicating copy to clipboard operation
citus copied to clipboard

UPDATE statement with array/jsonb subscripting behaves incorrectly with more than one field

Open ivyazmitinov opened this issue 3 years ago • 7 comments

Steps to Reproduce

Consider the following script:

DROP TABLE IF EXISTS test;
CREATE TABLE test
(
    id   INTEGER,
    data jsonb
);
SELECT create_distributed_table('test', 'id');

INSERT INTO test
VALUES (1, '{
  "a": 1
}');
INSERT INTO test
VALUES (2, '{
  "a": 2
}');
INSERT INTO test
VALUES (3, '{
  "a": 3
}');
INSERT INTO test
VALUES (4, '{
  "a": 4
}');
INSERT INTO test
VALUES (5, '{
  "a": 5
}');

UPDATE test
SET data['b'] = updated_vals.b::TEXT::jsonb
  , data['c'] = updated_vals.c::TEXT::jsonb
  , data['d'] = updated_vals.d::TEXT::jsonb
FROM (
         SELECT id,
                data['a']              AS a,
                data['a']::NUMERIC + 1 AS b,
                data['a']::NUMERIC + 2 AS c,
                data['a']::NUMERIC + 3 AS d
         FROM test
     ) updated_vals
WHERE test.id = updated_vals.id;

SELECT *
FROM test;

Expected result

+--+--------------------------------+
|id|data                            |
+--+--------------------------------+
|1 |{"a": 1, "b": 2, "c": 3, "d": 4}|
|2 |{"a": 2, "b": 3, "c": 4, "d": 5}|
|3 |{"a": 3, "b": 4, "c": 5, "d": 6}|
|4 |{"a": 4, "b": 5, "c": 6, "d": 7}|
|5 |{"a": 5, "b": 6, "c": 7, "d": 8}|
+--+--------------------------------+

Actual result

+--+----------------+
|id|data            |
+--+----------------+
|1 |{"a": 1, "d": 4}|
|5 |{"a": 5, "d": 8}|
|4 |{"a": 4, "d": 7}|
|3 |{"a": 3, "d": 6}|
|2 |{"a": 2, "d": 5}|
+--+----------------+

Environment

Postgresql: 14.1 Citus: 10.2.3

ivyazmitinov avatar Jan 13 '22 13:01 ivyazmitinov

That's bad, we have tests, but only updates a single field hence we couldn't spot: https://github.com/citusdata/citus/pull/5232/files

Interestingly only the last SET is sent to workers:


UPDATE public.test_102868 test
SET DATA['d'::text] = ((updated_vals.d)::text)::JSONB
FROM
  (SELECT test_1.id,
          test_1.data['a'::text] AS a,
          ((test_1.data['a'::text])::numeric OPERATOR(pg_catalog.+) (1)::numeric) AS b,
          ((test_1.data['a'::text])::numeric OPERATOR(pg_catalog.+) (2)::numeric) AS c,
          ((test_1.data['a'::text])::numeric OPERATOR(pg_catalog.+) (3)::numeric) AS d
   FROM public.test_102868 test_1) updated_vals
WHERE (test.id OPERATOR(pg_catalog.=) updated_vals.id)

Even if I force intermediate results, we get the same behavior:


UPDATE test
SET data['b'] = updated_vals.b::TEXT::jsonb
  , data['c'] = updated_vals.c::TEXT::jsonb
  , data['d'] = updated_vals.d::TEXT::jsonb
FROM (
         SELECT id,
                data['a']              AS a,
                data['a']::NUMERIC + 1 AS b,
                data['a']::NUMERIC + 2 AS c,
                data['a']::NUMERIC + 3 AS d
         FROM test OFFSET 0
     ) updated_vals
WHERE test.id = updated_vals.id;

Probably a deparser issue because bare deparser produce wrong result:


SELECT deparse_shard_query_test($$
UPDATE test
SET data['b'] = updated_vals.b::TEXT::jsonb
  , data['c'] = updated_vals.c::TEXT::jsonb
  , data['d'] = updated_vals.d::TEXT::jsonb
FROM (
         SELECT id,
                data['a']              AS a,
                data['a']::NUMERIC + 1 AS b,
                data['a']::NUMERIC + 2 AS c,
                data['a']::NUMERIC + 3 AS d
         FROM test
     ) updated_vals
WHERE test.id = updated_vals.id;
$$);

INFO:  query: UPDATE public.test SET data['d'::text] = ((updated_vals.d)::text)::jsonb FROM (SELECT test_1.id, test_1.data['a'::text] AS a, ((test_1.data['a'::text])::numeric OPERATOR(pg_catalog.+) (1)::numeric) AS b, ((test_1.data['a'::text])::numeric OPERATOR(pg_catalog.+) (2)::numeric) AS c, ((test_1.data['a'::text])::numeric OPERATOR(pg_catalog.+) (3)::numeric) AS d FROM public.test test_1) updated_vals WHERE (test.id OPERATOR(pg_catalog.=) updated_vals.id)

onderkalaci avatar Jan 13 '22 18:01 onderkalaci

So, the RULE (e.g., what all deparser exists in PG) works fine:

CREATE TABLE test
(
    id   INTEGER,
    data jsonb
);
CREATE TABLE test_2
(
    id   INTEGER,
    data jsonb
);


CREATE RULE r3 AS ON UPDATE TO test
	DO INSTEAD
UPDATE test
SET data['b'] = updated_vals.b::TEXT::jsonb
  , data['c'] = updated_vals.c::TEXT::jsonb
  , data['d'] = updated_vals.d::TEXT::jsonb
FROM (
         SELECT id,
                data['a']              AS a,
                data['a']::NUMERIC + 1 AS b,
                data['a']::NUMERIC + 2 AS c,
                data['a']::NUMERIC + 3 AS d
         FROM test
     ) updated_vals
WHERE test.id = updated_vals.id;

schemaname | sc2
tablename  | test
rulename   | r3
definition | CREATE RULE r3 AS                                                                                                                                                                                                +
           |     ON UPDATE TO sc2.test DO INSTEAD  UPDATE sc2.test SET data['b'::text] = ((updated_vals.b)::text)::jsonb, data['c'::text] = ((updated_vals.c)::text)::jsonb, data['d'::text] = ((updated_vals.d)::text)::jsonb+
           |    FROM ( SELECT test_1.id,                                                                                                                                                                                      +
           |             test_1.data['a'::text] AS a,                                                                                                                                                                         +
           |             ((test_1.data['a'::text])::numeric + (1)::numeric) AS b,                                                                                                                                             +
           |             ((test_1.data['a'::text])::numeric + (2)::numeric) AS c,                                                                                                                                             +
           |             ((test_1.data['a'::text])::numeric + (3)::numeric) AS d                                                                                                                                              +
           |            FROM sc2.test test_1) updated_vals                                                                                                                                                                    +
           |   WHERE (test.id = updated_vals.id);

The problem seems that the rule is designed to work as part of DO UPDATE SET:

			appendStringInfoString(buf, " DO UPDATE SET ");
			/* Deparse targetlist */
			get_update_query_targetlist_def(query, confl->onConflictSet,
											context, rte);

Whereas regular update :

	get_update_query_targetlist_def(query, query->targetList, context, rte);

And, confl->onConflictSet contains all the elements as separate TargetEntry, whereas query->targetList contains only one entry with refassgnexpr as sub-nodes such as:

   {TARGETENTRY 
   :expr 
      {SUBSCRIPTINGREF 
      :refcontainertype 3802 
      :refelemtype 0 
      :refrestype 3802 
      :reftypmod -1 
      :refcollid 0 
      :refupperindexpr (
         {CONST 
         :consttype 25 
         :consttypmod -1 
         :constcollid 100 
         :constlen -1 
         :constbyval false 
         :constisnull false 
         :location 438 
         :constvalue 5 [ 20 0 0 0 100 ]
         }
      )
      :reflowerindexpr <> 
      :refexpr 
         {SUBSCRIPTINGREF 
         :refcontainertype 3802 
         :refelemtype 0 
         :refrestype 3802 
         :reftypmod -1 
         :refcollid 0 
         :refupperindexpr (
            {CONST 
            :consttype 25 
            :consttypmod -1 
            :constcollid 100 
            :constlen -1 
            :constbyval false 
            :constisnull false 
            :location 295 
            :constvalue 5 [ 20 0 0 0 99 ]
            }
         )
         :reflowerindexpr <> 
         :refexpr 
            {SUBSCRIPTINGREF 
            :refcontainertype 3802 
            :refelemtype 0 
            :refrestype 3802 
            :reftypmod -1 
            :refcollid 0 
            :refupperindexpr (
               {CONST 
               :consttype 25 
               :consttypmod -1 
               :constcollid 100 
               :constlen -1 
               :constbyval false 
               :constisnull false 
               :location 152 
               :constvalue 5 [ 20 0 0 0 98 ]
               }
            )
            :reflowerindexpr <> 
            :refexpr 
               {VAR 
               :varno 1 
               :varattno 2 
               :vartype 3802 
               :vartypmod -1 
               :varcollid 0 
               :varlevelsup 0 
               :varnosyn 1 
               :varattnosyn 2 
               :location 147
               }
            :refassgnexpr 
               {COERCEVIAIO 
               :arg 
                  {COERCEVIAIO 
                  :arg 
                     {VAR 
                     :varno 2 
                     :varattno 3 
                     :vartype 1700 
                     :vartypmod -1 
                     :varcollid 0 
                     :varlevelsup 0 
                     :varnosyn 2 
                     :varattnosyn 3 
                     :location 159
                     }
                  :resulttype 25 
                  :resultcollid 100 
                  :coerceformat 1 
                  :location 173
                  }
               :resulttype 3802 
               :resultcollid 0 
               :coerceformat 1 
               :location 179
               }
            }
         :refassgnexpr 
            {COERCEVIAIO 
            :arg 
               {COERCEVIAIO 
               :arg 
                  {VAR 
                  :varno 2 
                  :varattno 4 
                  :vartype 1700 
                  :vartypmod -1 
                  :varcollid 0 
                  :varlevelsup 0 
                  :varnosyn 2 
                  :varattnosyn 4 
                  :location 302
                  }
               :resulttype 25 
               :resultcollid 100 
               :coerceformat 1 
               :location 316
               }
            :resulttype 3802 
            :resultcollid 0 
            :coerceformat 1 
            :location 322
            }
         }
      :refassgnexpr 
         {COERCEVIAIO 
         :arg 
            {COERCEVIAIO 
            :arg 
               {VAR 
               :varno 2 
               :varattno 5 
               :vartype 1700 
               :vartypmod -1 
               :varcollid 0 
               :varlevelsup 0 
               :varnosyn 2 
               :varattnosyn 5 
               :location 445
               }
            :resulttype 25 
            :resultcollid 100 
            :coerceformat 1 
            :location 459
            }
         :resulttype 3802 
         :resultcollid 0 
         :coerceformat 1 
         :location 465
         }
      }
   :resno 2 
   :resname data 
   :ressortgroupref 0 
   :resorigtbl 0 
   :resorigcol 0 
   :resjunk false
   }
)

onderkalaci avatar Jan 14 '22 08:01 onderkalaci

(self-note: There might be some functions that PG calls on onConflictSet but not on targetList for these json subscrpits)

onderkalaci avatar Jan 14 '22 11:01 onderkalaci

Very similar to https://github.com/citusdata/citus/issues/1293 & https://github.com/citusdata/citus/pull/4234, need to split combined tle's to make deparser work.

onurctirtir avatar Feb 07 '22 13:02 onurctirtir

Hi @ivyazmitinov,

I pushed some changes to a branch (https://github.com/citusdata/citus/tree/fix/subscript-jsonb) to fix this issue, and would appreciate if you have some time to verify that it works for your queries :) .

onurctirtir avatar Feb 09 '22 11:02 onurctirtir

Hello, @onurctirtir Thanks for your hard work :) Unfortunately, I won't be able to check your fix on my environment, but I've checked your tests, and they seem to cover our usage of JSONB subscripting: we don't rely on it that much, tbh. I've found the original issue almost accidentally :)

ivyazmitinov avatar Feb 09 '22 12:02 ivyazmitinov

Hello, @onurctirtir Thanks for your hard work :) Unfortunately, I won't be able to check your fix on my environment, but I've checked your tests, and they seem to cover our usage of JSONB subscripting: we don't rely on it that much, tbh. I've found the original issue almost accidentally :)

Thank you for the feedback @ivyazmitinov! :)

onurctirtir avatar Feb 09 '22 12:02 onurctirtir

Fixed by #7675

colm-mchugh avatar Jul 22 '25 16:07 colm-mchugh