gporca Query plan on table with index x2 faster without ORCA and Seq Scan

Hello Greenplum Team! Thank you for great database!

I'm trying to use indexes in Greenplum 6 (single mode with two segments) and I've got the case:

CREATE TABLE tbl (id text, CreatedAt timestamp);

INSERT INTO tbl 10 millions of rows;

CREATE INDEX tbl_idx ON tbl (date("CreatedAt"));

EXPLAIN ANALYZE SELECT date("CreatedAt"), count(*) FROM tbl WHERE date("CreatedAt") > '2019-01-01' GROUP BY 1;
-- Execute time: 10-12 seconds 
-- Seq Scan

SET optimizer = off;
SET enable_seqscan = off;

EXPLAIN ANALYZE SELECT <as above>;
-- Execute time: 4-6 seconds 
-- Bitmap index scan tbl_idx

So as you can see I've got x2 performance boost with disabling optimizer to using index.

I know this is sometimes happened case for PostgreSQL but I just want that you know that and may be you can share your thoughts about it? May be you can suggest some tuning techniques?

Thanks!

Sep 17 '19 12:09 anki-code

I'm facing the same issue. Orca use seq scan instand of index scan. And I have to disable both seqscan and optimizer.

I guess that optimizer_enable_tablescan may be used to disable seq scan of orca but it is not. So how can we disable seq scan and force orca to use index scan ?

Apparently orca (and planer) calculate wrong cost of index scan.

Sep 24 '19 12:09 wangzw

Orca doesn't support index on expressions or functions. https://github.com/greenplum-db/gporca/issues/311

Dec 07 '19 02:12 hsyuan

So this is a known, unsupported feature.

Dec 07 '19 02:12 hsyuan

Orca doesn't support index on expressions or functions. https://github.com/greenplum-db/gporca/issues/311

Are there any plans to give support? @vraghavan78

select * from gp_stat_user_indexes
where schemaname='dw' and relname = 'fact_cpu_sno_parts'
order by schemaname,relname

/*
"1359725"	"1359730"	"dw"	"fact_cpu_sno_parts"	"idx_dw_fact_cpu_sno_parts_scansn"	8028	420	321
"1359725"	"22888997"	"dw"	"fact_cpu_sno_parts"	"idx_dw_fact_cpu_sno_parts_sno"	13621	258138	28369
*/

-- drop index dw.idx_dw_fact_cpu_sno_parts_scansn_upper;

CREATE INDEX idx_dw_fact_cpu_sno_parts_scansn_upper
    ON dw.fact_cpu_sno_parts USING btree
    (upper(scansn))
    TABLESPACE pg_default;

analyse dw.fact_cpu_sno_parts;
/*
"Gather Motion 6:1  (slice2; segments: 6)  (cost=0.00..13054.80 rows=36864791 width=11)"
"  ->  Hash Left Join  (cost=0.00..11853.13 rows=6144132 width=11)"
"        Hash Cond: (((fact_cpu_sno_parts.iecpn)::text = (dim_cpu_parts_cust.pn)::text) AND ((fact_cpu_sn.customer)::text = (dim_cpu_parts_cust.customer)::text))"
"        ->  Hash Join  (cost=0.00..8840.69 rows=5723178 width=25)"
"              Hash Cond: ((fact_cpu_sno_parts.sno)::text = (fact_cpu_sn.sno)::text)"
"              ->  Seq Scan on fact_cpu_sno_parts  (cost=0.00..6351.30 rows=5723178 width=22)"
"                    Filter: (((status)::text = '1'::text) AND ((wc)::text = ANY ('{M40,M61,M46,40,61,46,83}'::text[])) AND (upper((scansn)::text) = '80AD02210723370D26'::text))"
"              ->  Hash  (cost=555.26..555.26 rows=815597 width=14)"
"                    ->  Seq Scan on fact_cpu_sn  (cost=0.00..555.26 rows=815597 width=14)"
"        ->  Hash  (cost=493.65..493.65 rows=380615 width=15)"
"              ->  Broadcast Motion 6:6  (slice1; segments: 6)  (cost=0.00..493.65 rows=380615 width=15)"
"                    ->  Seq Scan on dim_cpu_parts_cust  (cost=0.00..436.93 rows=63436 width=15)"
"Optimizer: Pivotal Optimizer (GPORCA)"
*/

explain --analyse
select * from dw.fact_cpu_sno_parts sp where upper(sp.scansn) in ('80AD02210723370D26');
/*
"Gather Motion 6:1  (slice1; segments: 6)  (cost=0.00..36784.56 rows=67016628 width=149)"
"  ->  Seq Scan on fact_cpu_sno_parts  (cost=0.00..7194.26 rows=11169438 width=149)"
"        Filter: (upper((scansn)::text) = '80AD02210723370D26'::text)"
"Optimizer: Pivotal Optimizer (GPORCA)"
*/
explain --analyse
select * from dw.fact_cpu_sno_parts sp where sp.scansn in ('80AD02210723370D26');
/*
"Gather Motion 6:1  (slice1; segments: 6)  (cost=0.00..7.02 rows=1640 width=149)"
"  ->  Index Scan using idx_dw_fact_cpu_sno_parts_scansn on fact_cpu_sno_parts  (cost=0.00..6.22 rows=274 width=149)"
"        Index Cond: ((scansn)::text = '80AD02210723370D26'::text)"
"Optimizer: Pivotal Optimizer (GPORCA)"
*/

set optimizer = off;

explain --analyse
select * from dw.fact_cpu_sno_parts sp where upper(sp.scansn) in ('80AD02210723370D26');
/*
"Gather Motion 6:1  (slice1; segments: 6)  (cost=212.69..155810.95 rows=1612 width=266)"
"  ->  Bitmap Heap Scan on fact_cpu_sno_parts sp  (cost=212.69..155810.95 rows=269 width=266)"
"        Recheck Cond: (upper((scansn)::text) = '80AD02210723370D26'::text)"
"        ->  Bitmap Index Scan on idx_dw_fact_cpu_sno_parts_scansn_upper  (cost=0.00..212.29 rows=269 width=0)"
"              Index Cond: (upper((scansn)::text) = '80AD02210723370D26'::text)"
"Optimizer: Postgres query optimizer"
*/
explain --analyse
select * from dw.fact_cpu_sno_parts sp where sp.scansn in ('80AD02210723370D26');
/*
"Gather Motion 6:1  (slice1; segments: 6)  (cost=312.90..158562.68 rows=1640 width=267)"
"  ->  Bitmap Heap Scan on fact_cpu_sno_parts sp  (cost=312.90..158562.68 rows=274 width=267)"
"        Recheck Cond: ((scansn)::text = '80AD02210723370D26'::text)"
"        ->  Bitmap Index Scan on idx_dw_fact_cpu_sno_parts_scansn  (cost=0.00..312.49 rows=274 width=0)"
"              Index Cond: ((scansn)::text = '80AD02210723370D26'::text)"
"Optimizer: Postgres query optimizer"
*/

reset optimizer;

Jan 10 '24 04:01 cobolbaby

gporca gporca copied to clipboard

Query plan on table with index x2 faster without ORCA and Seq Scan

gporca
gporca copied to clipboard