[Bug] ORCA falls back for collate "C"
Cloudberry Database version
No response
What happened
Currently, when a table column uses collate "C", ORCA falls back to the Postgres planner. We need to support this in ORCA as well, because ORCA sometimes produces a better plan.
What you think should happen instead
No response
How to reproduce
postgres=# create table tbl(v text);
CREATE TABLE
postgres=# create table tbl_collate_c(v text collate "C");
CREATE TABLE
postgres=# explain select * from tbl order by v;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Gather Motion 3:1 (slice1; segments: 3) (cost=0.00..431.00 rows=1 width=8)
   Merge Key: v
   -> Sort (cost=0.00..431.00 rows=1 width=8)
         Sort Key: v
         -> Seq Scan on tbl (cost=0.00..431.00 rows=1 width=8)
 Optimizer: Pivotal Optimizer (GPORCA)
(6 rows)
postgres=# explain select * from tbl_collate_c order by v;
                                      QUERY PLAN
---------------------------------------------------------------------------------------
 Gather Motion 3:1 (slice1; segments: 3) (cost=1451.09..2199.09 rows=52800 width=32)
   Merge Key: v
   -> Sort (cost=1451.09..1495.09 rows=17600 width=32)
         Sort Key: v COLLATE "C"
         -> Seq Scan on tbl_collate_c (cost=0.00..210.00 rows=17600 width=32)
 Optimizer: Postgres query optimizer
(6 rows)
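The second plan prints Sort Key: v COLLATE "C" because the sort result depends on the collation. A minimal C++ sketch of the difference: collation "C" compares raw bytes, so every uppercase ASCII letter sorts before every lowercase one. The case-insensitive comparator below is only a rough, hypothetical stand-in for a locale-aware default collation (real ones come from libc or ICU, and a database initialized with the C locale would behave like "C" by default). SortBy, LessCollateC and LessCaseInsensitive are illustrative names.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Collation "C": plain byte-wise comparison ('B' = 66 sorts before 'a' = 97).
bool LessCollateC(const std::string &a, const std::string &b) {
    return a < b;
}

// Hypothetical stand-in for a locale-aware collation: compare letters
// case-insensitively. Real locale collations are far more elaborate.
bool LessCaseInsensitive(const std::string &a, const std::string &b) {
    return std::lexicographical_compare(
        a.begin(), a.end(), b.begin(), b.end(),
        [](unsigned char x, unsigned char y) {
            return std::tolower(x) < std::tolower(y);
        });
}

// Sort a copy of the input with the given comparator.
std::vector<std::string> SortBy(std::vector<std::string> values,
                                bool (*less)(const std::string &, const std::string &)) {
    std::stable_sort(values.begin(), values.end(), less);
    return values;
}
```

For example, sorting apple, Banana, cherry with LessCollateC yields Banana, apple, cherry, while LessCaseInsensitive yields apple, Banana, cherry. A plan that sorts under one collation cannot be reused where the other is required, which is why the collation has to travel with the sort key.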
Operating System
Not specific
Anything else
No response
Are you willing to submit PR?
- [ ] Yes, I am willing to submit a PR!
Code of Conduct
- [X] I agree to follow this project's Code of Conduct.
// GPDB_91_MERGE_FIXME: collation
INT non_default_collation = gpdb::CheckCollation((Node *) query);
if (0 < non_default_collation)
{
    GPOS_RAISE(gpdxl::ExmaDXL, gpdxl::ExmiQuery2DXLUnsupportedFeature,
               GPOS_WSZ_LIT("Non-default collation"));
}
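The check above rejects the whole query as soon as any non-default collation appears. As a rough illustration of that condition, here is a hypothetical C++ sketch over a simplified expression tree. The OIDs 100 (default) and 950 ("C") are PostgreSQL's pg_collation entries; ExprNode and HasNonDefaultCollation are invented names, and the real gpdb::CheckCollation walks the entire Query rather than a single expression.

```cpp
#include <cassert>
#include <memory>
#include <vector>

using Oid = unsigned int;

// Collation OIDs as defined in PostgreSQL's pg_collation catalog.
constexpr Oid InvalidOid = 0;  // expression type is not collatable
constexpr Oid DEFAULT_COLLATION_OID = 100;
constexpr Oid C_COLLATION_OID = 950;

// Hypothetical, heavily simplified expression node: the collation the node
// produces, the collation it uses for its own comparisons, and its children.
struct ExprNode {
    Oid result_collation = InvalidOid;
    Oid input_collation = InvalidOid;
    std::vector<std::shared_ptr<ExprNode>> children;
};

bool IsNonDefault(Oid collation) {
    return collation != InvalidOid && collation != DEFAULT_COLLATION_OID;
}

// Returns true if any node in the tree carries a non-default collation,
// i.e. the situation in which the translator raises the fallback.
bool HasNonDefaultCollation(const ExprNode &node) {
    if (IsNonDefault(node.result_collation) || IsNonDefault(node.input_collation))
        return true;
    for (const auto &child : node.children)
        if (child && HasNonDefaultCollation(*child))
            return true;
    return false;
}
```

A column of tbl_collate_c would appear with collation 950 somewhere in the tree, so the walker returns true and ORCA gives up on the query.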
Need to dig into how to solve this.
I guess each physical expression (e.g. Sort) needs to derive the collation...
I'm researching this part of the logic.
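If each physical Sort carried the collation of its keys, ORCA could emit the same sort key as the Postgres planner does in the second plan. A minimal sketch, assuming hypothetical Column and SortKey types that are not part of ORCA: each key inherits the collation of the column it sorts, and the printer roughly mimics how EXPLAIN rendered the two plans above for a text column.

```cpp
#include <cassert>
#include <string>
#include <vector>

using Oid = unsigned int;
constexpr Oid DEFAULT_COLLATION_OID = 100;
constexpr Oid C_COLLATION_OID = 950;

// Hypothetical column descriptor carrying the column's declared collation.
struct Column {
    std::string name;
    Oid collation;
};

// Hypothetical sort key that records the collation it sorts by.
struct SortKey {
    std::string column;
    Oid collation;
};

// Derive one sort key per ORDER BY column; each key inherits the
// collation of the column it sorts.
std::vector<SortKey> DeriveSortKeys(const std::vector<Column> &order_by) {
    std::vector<SortKey> keys;
    keys.reserve(order_by.size());
    for (const auto &col : order_by)
        keys.push_back({col.name, col.collation});
    return keys;
}

// Render a key the way the plans above show it.
std::string DescribeSortKey(const SortKey &key) {
    if (key.collation == DEFAULT_COLLATION_OID)
        return key.column;  // default collation of text is not printed
    if (key.collation == C_COLLATION_OID)
        return key.column + " COLLATE \"C\"";
    // Other collations: print the raw OID, since this sketch has no catalog lookup.
    return key.column + " COLLATE " + std::to_string(key.collation);
}
```

For tbl the derived key prints as v, and for tbl_collate_c as v COLLATE "C", matching the Sort Key lines in the two plans.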
In my research, I found that supporting collate "C" in ORCA is very difficult, for several reasons:
- After the SQL passes through the parser, the query tree may contain
  T_RelabelType or T_CollateExpr nodes (common in subqueries). The current version of ORCA can't handle
  T_RelabelType or T_CollateExpr, which also means we can't simply handle collation in ORCA's DXLToPlStmt stage.
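A hypothetical, simplified model of why these wrapper nodes cannot just be dropped: for ORDER BY v COLLATE "C" on a default-collation column, the parser wraps the column's Var in a CollateExpr, and the collation the Sort must use lives on the wrapper. The comments borrow PostgreSQL's field names (varcollid, resultcollid, collOid), but the Expr struct and both functions are invented for illustration.

```cpp
#include <cassert>
#include <memory>

using Oid = unsigned int;
constexpr Oid DEFAULT_COLLATION_OID = 100;
constexpr Oid C_COLLATION_OID = 950;

// Hypothetical mirror of three parser node kinds (not the real struct layout).
enum class Kind { Var, RelabelType, CollateExpr };

struct Expr {
    Kind kind;
    Oid collation;              // Var.varcollid / RelabelType.resultcollid / CollateExpr.collOid
    std::shared_ptr<Expr> arg;  // wrapped expression, null for a Var
};

// In this model each node already stores its effective collation,
// so the answer is simply the outermost node's field.
Oid ExprCollation(const Expr &e) {
    return e.collation;
}

// What a translator that skips wrapper nodes would compute instead:
// it descends to the underlying Var and reports the column's collation.
Oid CollationIgnoringWrappers(const Expr &e) {
    const Expr *cur = &e;
    while (cur->kind != Kind::Var && cur->arg)
        cur = cur->arg.get();
    return cur->collation;
}
```

For a CollateExpr("C") over a default-collation Var, ExprCollation returns 950 while CollationIgnoringWrappers returns 100, which is the wrong collation for the sort.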
Therefore, when ORCA receives a query, it would need to include the collation in its operators and compute the collation during the exploration and implementation phases. This would be a big change to ORCA.
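One way to picture including collation in the operators: ORCA adds a Sort enforcer when a child's delivered order does not satisfy the required order, so the order requirement itself would have to record the collation. A minimal sketch with a hypothetical OrderKey type (not ORCA's actual order-spec classes), where two keys match only if both the column and the collation match.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Oid = unsigned int;
constexpr Oid DEFAULT_COLLATION_OID = 100;
constexpr Oid C_COLLATION_OID = 950;

// Hypothetical order-spec entry: a column id plus the collation it is sorted by.
struct OrderKey {
    int colid;
    Oid collation;
    bool operator==(const OrderKey &other) const {
        return colid == other.colid && collation == other.collation;
    }
};

// A delivered order satisfies a required order if the required keys are a
// prefix of the delivered keys, collation included. Without the collation
// comparison, input sorted under "C" would wrongly satisfy a
// default-collation requirement and no Sort enforcer would be added.
bool Satisfies(const std::vector<OrderKey> &delivered,
               const std::vector<OrderKey> &required) {
    if (required.size() > delivered.size())
        return false;
    for (std::size_t i = 0; i < required.size(); ++i)
        if (!(delivered[i] == required[i]))
            return false;
    return true;
}
```

Comparing collations here is the simplest choice; a fuller design might also treat some collations as interchangeable (for example "C" and "POSIX"), which this sketch does not attempt.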