citus icon indicating copy to clipboard operation
citus copied to clipboard

Incorrect result from SELECT DISTINCT ON a distributed table (version 12.1-1 )

Open serge-cooreman-pfx opened this issue 1 year ago • 6 comments

Query no longer returns the correct result after distributing table:

sandbox=# CREATE TABLE test ( attribute1 varchar(255), attribute2 varchar(255), attribute3 varchar(255) ); CREATE TABLE

sandbox=# INSERT INTO test (attribute1, attribute2, attribute3) VALUES ('Phone', 'John', 'A'), ('Phone', 'Eric', 'A'), ('Tablet','Eric', 'B'); INSERT 0 3

sandbox=# SELECT DISTINCT ON (T.attribute1, T.attribute2) T.attribute1 as attribute1, T.attribute3 as attribute2 FROM test T; attribute1 | attribute2 ------------+------------ Phone | A Phone | A Tablet | B (3 rows)

sandbox=# SELECT create_distributed_table('test', 'attribute1'); NOTICE: Copying data from local table... NOTICE: copying the data has completed DETAIL: The local data in the table is no longer visible, but is still on disk. HINT: To remove the local data, run: SELECT truncate_local_data_after_distributing_table($$"ford-dev".test$$) create_distributed_table

(1 row)

sandbox=# SELECT DISTINCT ON (T.attribute1, T.attribute2) T.attribute1 as attribute1, T.attribute3 as attribute2 FROM test T; attribute1 | attribute2 ------------+------------ Phone | A Tablet | B (2 rows)

serge-cooreman-pfx avatar Sep 06 '24 19:09 serge-cooreman-pfx

Confirmed, the problem really is with this part T.attribute3 as attribute2 (apparently confuse citus).

c2main avatar Sep 12 '24 07:09 c2main

@scooreman if you can patch/compile citus, I think this patch is ok:

diff --git a/src/backend/distributed/deparser/ruleutils_16.c b/src/backend/distributed/deparser/ruleutils_16.c
index cc0da165b..fc76a1257 100644
--- a/src/backend/distributed/deparser/ruleutils_16.c
+++ b/src/backend/distributed/deparser/ruleutils_16.c
@@ -2568,9 +2568,21 @@ get_basic_select_query(Query *query, deparse_context *context,
                        {
                                SortGroupClause *srt = (SortGroupClause *) lfirst(l);
 
-                               appendStringInfoString(buf, sep);
-                               get_rule_sortgroupclause(srt->tleSortGroupRef, query->targetList,
-                                                                                false, context);
+                               /* Ensure the correct target list is used without alias swapping */
+                               TargetEntry *tle = get_sortgroupclause_tle(srt, query->targetList);
+
+                               /* Ensure the attribute names are preserved */
+                               if (tle && tle->resname != NULL)
+                               {
+                                       appendStringInfoString(buf, sep);
+                                       appendStringInfoString(buf, quote_identifier(tle->resname));
+                               }
+                               else
+                               {
+                                       /* Fallback to the default behavior */
+                                       get_rule_sortgroupclause(srt->tleSortGroupRef, query->targetList, false, context);
+                               }
+
                                sep = ", ";
                        }
                        appendStringInfoChar(buf, ')');

I suppose you have a larger SQL code base to test, so if you can give it a try ?

c2main avatar Sep 12 '24 17:09 c2main

@c2main I can confirm that the fix works on our data set. Amazing turnaround. Many thanks!

serge-cooreman-pfx avatar Sep 15 '24 19:09 serge-cooreman-pfx

@scooreman nice to know it works!

@onurctirtir or others , I didn't made a PR for that yet, can you reopen ?

c2main avatar Sep 24 '24 08:09 c2main

@scooreman be careful, the provided patch is probably not the good fix, it's more subtle than that.

Citus is building a query like the following, note the worker_column_3 added, and the absence of table qualified names.

SELECT DISTINCT ON (attribute1, attribute2) 
  attribute1
, attribute3 AS attribute2
, attribute2 AS worker_column_3 

FROM test_pg t WHERE true;
------------+------------+-----------------
 Phone      | A          | John
 Tablet     | B          | Eric

And the results for this query are correct ! See PostgreSQL DISTINCT and ORDER BY documentation in [1] and [2].

The problem is only related to absence of table qualified name in the built query by citus. So my patch is good only by accident...

[1] https://www.postgresql.org/docs/current/sql-select.html#SQL-DISTINCT [2] https://www.postgresql.org/docs/current/sql-select.html#SQL-ORDERBY

c2main avatar Sep 25 '24 09:09 c2main

For reference, there is a patch from August 2024 in PostgreSQL touching this area: https://github.com/postgres/postgres/commit/a7eb633563c

It's really not clear what's the best fix for Citus, and having to manage PostgreSQL 14 and 15 may not be that easy or free. Maybe better to backpatch more of PostgreSQL 17 ruleutils code into ruleutils 14, 15 and 16... @naisila given your work on PostgreSQL 17, is it something you explored already ?

c2main avatar Sep 25 '24 11:09 c2main

@c2main

given your work on PostgreSQL 17, is it something you explored already?

Looking into this right now. It does seem that backpatching that PG17 fix to ruleutils_15 and ruleutils_16 is a good option.

naisila avatar Mar 18 '25 09:03 naisila