gporca
gporca copied to clipboard
Query plan on table with index x2 faster without ORCA and Seq Scan
Hello Greenplum Team! Thank you for great database!
I'm trying to use indexes in Greenplum 6 (single mode with two segments) and I've got the case:
CREATE TABLE tbl (id text, CreatedAt timestamp);
INSERT INTO tbl 10 millions of rows;
CREATE INDEX tbl_idx ON tbl (date("CreatedAt"));
EXPLAIN ANALYZE SELECT date("CreatedAt"), count(*) FROM tbl WHERE date("CreatedAt") > '2019-01-01' GROUP BY 1;
-- Execute time: 10-12 seconds
-- Seq Scan
SET optimizer = off;
SET enable_seqscan = off;
EXPLAIN ANALYZE SELECT <as above>;
-- Execute time: 4-6 seconds
-- Bitmap index scan tbl_idx
So as you can see I've got x2 performance boost with disabling optimizer to using index.
I know this is sometimes happened case for PostgreSQL but I just want that you know that and may be you can share your thoughts about it? May be you can suggest some tuning techniques?
Thanks!
I'm facing the same issue. Orca use seq scan instand of index scan. And I have to disable both seqscan and optimizer.
I guess that optimizer_enable_tablescan
may be used to disable seq scan of orca but it is not. So how can we disable seq scan and force orca to use index scan ?
Apparently orca (and planer) calculate wrong cost of index scan.
Orca doesn't support index on expressions or functions. https://github.com/greenplum-db/gporca/issues/311
So this is a known, unsupported feature.
Orca doesn't support index on expressions or functions. https://github.com/greenplum-db/gporca/issues/311
Are there any plans to give support? @vraghavan78
select * from gp_stat_user_indexes
where schemaname='dw' and relname = 'fact_cpu_sno_parts'
order by schemaname,relname
/*
"1359725" "1359730" "dw" "fact_cpu_sno_parts" "idx_dw_fact_cpu_sno_parts_scansn" 8028 420 321
"1359725" "22888997" "dw" "fact_cpu_sno_parts" "idx_dw_fact_cpu_sno_parts_sno" 13621 258138 28369
*/
-- drop index dw.idx_dw_fact_cpu_sno_parts_scansn_upper;
CREATE INDEX idx_dw_fact_cpu_sno_parts_scansn_upper
ON dw.fact_cpu_sno_parts USING btree
(upper(scansn))
TABLESPACE pg_default;
analyse dw.fact_cpu_sno_parts;
/*
"Gather Motion 6:1 (slice2; segments: 6) (cost=0.00..13054.80 rows=36864791 width=11)"
" -> Hash Left Join (cost=0.00..11853.13 rows=6144132 width=11)"
" Hash Cond: (((fact_cpu_sno_parts.iecpn)::text = (dim_cpu_parts_cust.pn)::text) AND ((fact_cpu_sn.customer)::text = (dim_cpu_parts_cust.customer)::text))"
" -> Hash Join (cost=0.00..8840.69 rows=5723178 width=25)"
" Hash Cond: ((fact_cpu_sno_parts.sno)::text = (fact_cpu_sn.sno)::text)"
" -> Seq Scan on fact_cpu_sno_parts (cost=0.00..6351.30 rows=5723178 width=22)"
" Filter: (((status)::text = '1'::text) AND ((wc)::text = ANY ('{M40,M61,M46,40,61,46,83}'::text[])) AND (upper((scansn)::text) = '80AD02210723370D26'::text))"
" -> Hash (cost=555.26..555.26 rows=815597 width=14)"
" -> Seq Scan on fact_cpu_sn (cost=0.00..555.26 rows=815597 width=14)"
" -> Hash (cost=493.65..493.65 rows=380615 width=15)"
" -> Broadcast Motion 6:6 (slice1; segments: 6) (cost=0.00..493.65 rows=380615 width=15)"
" -> Seq Scan on dim_cpu_parts_cust (cost=0.00..436.93 rows=63436 width=15)"
"Optimizer: Pivotal Optimizer (GPORCA)"
*/
explain --analyse
select * from dw.fact_cpu_sno_parts sp where upper(sp.scansn) in ('80AD02210723370D26');
/*
"Gather Motion 6:1 (slice1; segments: 6) (cost=0.00..36784.56 rows=67016628 width=149)"
" -> Seq Scan on fact_cpu_sno_parts (cost=0.00..7194.26 rows=11169438 width=149)"
" Filter: (upper((scansn)::text) = '80AD02210723370D26'::text)"
"Optimizer: Pivotal Optimizer (GPORCA)"
*/
explain --analyse
select * from dw.fact_cpu_sno_parts sp where sp.scansn in ('80AD02210723370D26');
/*
"Gather Motion 6:1 (slice1; segments: 6) (cost=0.00..7.02 rows=1640 width=149)"
" -> Index Scan using idx_dw_fact_cpu_sno_parts_scansn on fact_cpu_sno_parts (cost=0.00..6.22 rows=274 width=149)"
" Index Cond: ((scansn)::text = '80AD02210723370D26'::text)"
"Optimizer: Pivotal Optimizer (GPORCA)"
*/
set optimizer = off;
explain --analyse
select * from dw.fact_cpu_sno_parts sp where upper(sp.scansn) in ('80AD02210723370D26');
/*
"Gather Motion 6:1 (slice1; segments: 6) (cost=212.69..155810.95 rows=1612 width=266)"
" -> Bitmap Heap Scan on fact_cpu_sno_parts sp (cost=212.69..155810.95 rows=269 width=266)"
" Recheck Cond: (upper((scansn)::text) = '80AD02210723370D26'::text)"
" -> Bitmap Index Scan on idx_dw_fact_cpu_sno_parts_scansn_upper (cost=0.00..212.29 rows=269 width=0)"
" Index Cond: (upper((scansn)::text) = '80AD02210723370D26'::text)"
"Optimizer: Postgres query optimizer"
*/
explain --analyse
select * from dw.fact_cpu_sno_parts sp where sp.scansn in ('80AD02210723370D26');
/*
"Gather Motion 6:1 (slice1; segments: 6) (cost=312.90..158562.68 rows=1640 width=267)"
" -> Bitmap Heap Scan on fact_cpu_sno_parts sp (cost=312.90..158562.68 rows=274 width=267)"
" Recheck Cond: ((scansn)::text = '80AD02210723370D26'::text)"
" -> Bitmap Index Scan on idx_dw_fact_cpu_sno_parts_scansn (cost=0.00..312.49 rows=274 width=0)"
" Index Cond: ((scansn)::text = '80AD02210723370D26'::text)"
"Optimizer: Postgres query optimizer"
*/
reset optimizer;