
[Bug] ORCA fallbacks for collate "C"

Open my-ship-it opened this issue 1 year ago • 3 comments

Cloudberry Database version

No response

What happened

Currently, when a table column is declared with collate "C", ORCA falls back to the Postgres query optimizer. We should support this collation in ORCA as well, because ORCA sometimes produces a better plan.

What you think should happen instead

No response

How to reproduce

postgres=# create table tbl(v text);
CREATE TABLE
postgres=# create table tbl_collate_c(v text collate "C");
CREATE TABLE
postgres=# explain select * from tbl order by v;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..431.00 rows=1 width=8)
   Merge Key: v
   ->  Sort  (cost=0.00..431.00 rows=1 width=8)
         Sort Key: v
         ->  Seq Scan on tbl  (cost=0.00..431.00 rows=1 width=8)
 Optimizer: Pivotal Optimizer (GPORCA)
(6 rows)

postgres=# explain select * from tbl_collate_c order by v;
                                      QUERY PLAN
---------------------------------------------------------------------------------------
 Gather Motion 3:1  (slice1; segments: 3)  (cost=1451.09..2199.09 rows=52800 width=32)
   Merge Key: v
   ->  Sort  (cost=1451.09..1495.09 rows=17600 width=32)
         Sort Key: v COLLATE "C"
         ->  Seq Scan on tbl_collate_c  (cost=0.00..210.00 rows=17600 width=32)
 Optimizer: Postgres query optimizer
(6 rows)

Operating System

No specific

Anything else

No response

Are you willing to submit PR?

  • [ ] Yes, I am willing to submit a PR!

Code of Conduct

my-ship-it avatar Nov 19 '24 07:11 my-ship-it

// GPDB_91_MERGE_FIXME: collation
INT non_default_collation = gpdb::CheckCollation((Node *) query);

if (0 < non_default_collation)
{
	GPOS_RAISE(gpdxl::ExmaDXL, gpdxl::ExmiQuery2DXLUnsupportedFeature,
			   GPOS_WSZ_LIT("Non-default collation"));
}

This check is what triggers the fallback. I need to dig into how to solve this.
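To make the effect of that check concrete, here is a toy model of the gate, not the actual gpdb::CheckCollation code. The names ToyNode, HasNonDefaultCollation, ChoosePlanner and MakeOrderByQuery are hypothetical and exist only for illustration. The OID values 100 and 950 are PostgreSQL's DEFAULT_COLLATION_OID and C_COLLATION_OID. The assumption is that any node carrying a valid collation other than the default one is enough to reject the whole query.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

namespace gate_sketch {

using Oid = std::uint32_t;
constexpr Oid kInvalidOid = 0;             // node has no collation at all
constexpr Oid kDefaultCollationOid = 100;  // PostgreSQL DEFAULT_COLLATION_OID
constexpr Oid kCCollationOid = 950;        // PostgreSQL C_COLLATION_OID

// Hypothetical stand-in for a query tree node; real nodes are far richer.
struct ToyNode {
    Oid collation = kInvalidOid;
    std::vector<std::unique_ptr<ToyNode>> children;
};

// Recursively look for any collation that is neither absent nor the default.
inline bool HasNonDefaultCollation(const ToyNode &node) {
    if (node.collation != kInvalidOid &&
        node.collation != kDefaultCollationOid) {
        return true;
    }
    for (const auto &child : node.children) {
        if (HasNonDefaultCollation(*child)) {
            return true;
        }
    }
    return false;
}

enum class Planner { Orca, Postgres };

// Assumed behavior: one non-default collation anywhere forces the fallback.
inline Planner ChoosePlanner(const ToyNode &query) {
    return HasNonDefaultCollation(query) ? Planner::Postgres : Planner::Orca;
}

// Builds "select * from t order by v" reduced to the collation of v.
inline std::unique_ptr<ToyNode> MakeOrderByQuery(Oid sort_column_collation) {
    auto query = std::make_unique<ToyNode>();
    auto sort_column = std::make_unique<ToyNode>();
    sort_column->collation = sort_column_collation;
    query->children.push_back(std::move(sort_column));
    return query;
}

// Mirrors the two explain outputs in the reproduction steps.
inline void CheckExample() {
    assert(ChoosePlanner(*MakeOrderByQuery(kDefaultCollationOid)) ==
           Planner::Orca);
    assert(ChoosePlanner(*MakeOrderByQuery(kCCollationOid)) ==
           Planner::Postgres);
}

}  // namespace gate_sketch
```

In this model the gate is all-or-nothing: it never inspects whether the collation actually affects the plan, which is why even a plain ORDER BY on a collate "C" column ends up on the Postgres query optimizer.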

yjhjstz avatar Dec 10 '24 06:12 yjhjstz

I guess each physical expression (such as sort) needs to derive its collation...

I'm researching this part of the logic.

jiaqizho avatar Dec 16 '24 05:12 jiaqizho

In my research, I found that it is very difficult to support collate "C" in ORCA, for several reasons:

  1. After the SQL passes through the parser, the resulting tree can always contain T_RelabelType or T_CollateExpr nodes (common in subqueries).
  2. ORCA cannot handle T_RelabelType or T_CollateExpr in its current version. This also means that we can't handle the collation only in the DXLToPlStmt stage of ORCA.

Therefore, when ORCA receives a query, we need to carry the collation in the operators and compute it during the exploration and implementation phases. This would be a big change in ORCA.
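A minimal sketch of what "carrying the collation in the operator" could mean, under stated assumptions. None of this is ORCA's API: Expr, SortKey, DeriveSortKey and Satisfies are hypothetical names. The only parts taken from PostgreSQL are that Var, RelabelType and CollateExpr each expose a collation (varcollid, resultcollid and collOid), and that the outermost node determines the collation of the expression. The sketch assumes a sort requirement is only met when both the column and the collation match.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>

namespace collation_sketch {

using Oid = std::uint32_t;
constexpr Oid kDefaultCollationOid = 100;  // PostgreSQL DEFAULT_COLLATION_OID
constexpr Oid kCCollationOid = 950;        // PostgreSQL C_COLLATION_OID

enum class ExprKind { Var, RelabelType, CollateExpr };

// Hypothetical scalar expression: a Var, or a wrapper around one argument.
struct Expr {
    ExprKind kind = ExprKind::Var;
    std::string column;                      // only meaningful for Var
    Oid collation = kDefaultCollationOid;    // varcollid / resultcollid / collOid
    std::unique_ptr<Expr> arg;               // null for Var
};

inline std::unique_ptr<Expr> MakeVar(std::string column, Oid collation) {
    auto e = std::make_unique<Expr>();
    e->kind = ExprKind::Var;
    e->column = std::move(column);
    e->collation = collation;
    return e;
}

inline std::unique_ptr<Expr> Wrap(ExprKind kind, Oid collation,
                                  std::unique_ptr<Expr> arg) {
    auto e = std::make_unique<Expr>();
    e->kind = kind;
    e->collation = collation;
    e->arg = std::move(arg);
    return e;
}

// A sort key that carries its collation instead of assuming the default.
struct SortKey {
    std::string column;
    Oid collation;
    bool operator==(const SortKey &other) const {
        return column == other.column && collation == other.collation;
    }
};

// The outermost node decides the collation; the column comes from the
// innermost Var, reached by stripping RelabelType / CollateExpr wrappers.
inline SortKey DeriveSortKey(const Expr &expr) {
    const Oid collation = expr.collation;
    const Expr *cur = &expr;
    while (cur->kind != ExprKind::Var) {
        assert(cur->arg != nullptr);
        cur = cur->arg.get();
    }
    return SortKey{cur->column, collation};
}

// Assumed rule: an order on the same column under a different collation
// does not satisfy the requirement, so a new Sort would be needed.
inline bool Satisfies(const SortKey &delivered, const SortKey &required) {
    return delivered == required;
}

}  // namespace collation_sketch
```

For example, a key derived from a plain column v with the default collation would not satisfy a key derived from the same column wrapped in a CollateExpr for "C". A comparison like this would have to take place while plans are being explored and implemented, which matches the point above that translation at the DXLToPlStmt stage alone is too late.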

jiaqizho avatar Dec 24 '24 03:12 jiaqizho