Incorrect result from SELECT DISTINCT ON a distributed table (version 12.1-1 )
Query no longer returns the correct result after distributing table:
sandbox=# CREATE TABLE test ( attribute1 varchar(255), attribute2 varchar(255), attribute3 varchar(255) ); CREATE TABLE
sandbox=# INSERT INTO test (attribute1, attribute2, attribute3) VALUES ('Phone', 'John', 'A'), ('Phone', 'Eric', 'A'), ('Tablet','Eric', 'B'); INSERT 0 3
sandbox=# SELECT DISTINCT ON (T.attribute1, T.attribute2) T.attribute1 as attribute1, T.attribute3 as attribute2 FROM test T; attribute1 | attribute2 ------------+------------ Phone | A Phone | A Tablet | B (3 rows)
sandbox=# SELECT create_distributed_table('test', 'attribute1'); NOTICE: Copying data from local table... NOTICE: copying the data has completed DETAIL: The local data in the table is no longer visible, but is still on disk. HINT: To remove the local data, run: SELECT truncate_local_data_after_distributing_table($$"ford-dev".test$$) create_distributed_table
(1 row)
sandbox=# SELECT DISTINCT ON (T.attribute1, T.attribute2) T.attribute1 as attribute1, T.attribute3 as attribute2 FROM test T; attribute1 | attribute2 ------------+------------ Phone | A Tablet | B (2 rows)
Confirmed, the problem really is with this part T.attribute3 as attribute2 (apparently confuse citus).
@scooreman if you can patch/compile citus, I think this patch is ok:
diff --git a/src/backend/distributed/deparser/ruleutils_16.c b/src/backend/distributed/deparser/ruleutils_16.c
index cc0da165b..fc76a1257 100644
--- a/src/backend/distributed/deparser/ruleutils_16.c
+++ b/src/backend/distributed/deparser/ruleutils_16.c
@@ -2568,9 +2568,21 @@ get_basic_select_query(Query *query, deparse_context *context,
{
SortGroupClause *srt = (SortGroupClause *) lfirst(l);
- appendStringInfoString(buf, sep);
- get_rule_sortgroupclause(srt->tleSortGroupRef, query->targetList,
- false, context);
+ /* Ensure the correct target list is used without alias swapping */
+ TargetEntry *tle = get_sortgroupclause_tle(srt, query->targetList);
+
+ /* Ensure the attribute names are preserved */
+ if (tle && tle->resname != NULL)
+ {
+ appendStringInfoString(buf, sep);
+ appendStringInfoString(buf, quote_identifier(tle->resname));
+ }
+ else
+ {
+ /* Fallback to the default behavior */
+ get_rule_sortgroupclause(srt->tleSortGroupRef, query->targetList, false, context);
+ }
+
sep = ", ";
}
appendStringInfoChar(buf, ')');
I suppose you have a larger SQL code base to test, so if you can give it a try ?
@c2main I can confirm that the fix works on our data set. Amazing turnaround. Many thanks!
@scooreman nice to know it works!
@onurctirtir or others , I didn't made a PR for that yet, can you reopen ?
@scooreman be careful, the provided patch is probably not the good fix, it's more subtle than that.
Citus is building a query like the following, note the worker_column_3 added, and the absence of table qualified names.
SELECT DISTINCT ON (attribute1, attribute2)
attribute1
, attribute3 AS attribute2
, attribute2 AS worker_column_3
FROM test_pg t WHERE true;
------------+------------+-----------------
Phone | A | John
Tablet | B | Eric
And the results for this query are correct ! See PostgreSQL DISTINCT and ORDER BY documentation in [1] and [2].
The problem is only related to absence of table qualified name in the built query by citus. So my patch is good only by accident...
[1] https://www.postgresql.org/docs/current/sql-select.html#SQL-DISTINCT [2] https://www.postgresql.org/docs/current/sql-select.html#SQL-ORDERBY
For reference, there is a patch from August 2024 in PostgreSQL touching this area: https://github.com/postgres/postgres/commit/a7eb633563c
It's really not clear what's the best fix for Citus, and having to manage PostgreSQL 14 and 15 may not be that easy or free. Maybe better to backpatch more of PostgreSQL 17 ruleutils code into ruleutils 14, 15 and 16... @naisila given your work on PostgreSQL 17, is it something you explored already ?
@c2main
given your work on PostgreSQL 17, is it something you explored already?
Looking into this right now. It does seem that backpatching that PG17 fix to ruleutils_15 and ruleutils_16 is a good option.