gpdb

gpload temporary table generated without considering order of distribution keys

Open TimonSP opened this issue 1 year ago • 8 comments

Bug Report

When the gpload.py utility generates the temporary table for loading data, it uses the array of distribution keys returned by the get_table_dist_key method, which lists the distribution keys without regard to their order. As a result, the temporary table and the target table can differ in structure (the order of their distribution keys), and the update is not optimized (for example, it may scan all table partitions even when only a few of them are actually affected).

Expected behavior

The SQL query in the get_table_dist_key method has an ORDER BY clause:

order by position(concat(' ',a.attnum::text,' ') in concat(' ',p.distkey::text,' '));
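To make the intent of that clause concrete, here is a minimal Python sketch of the same ordering logic. It assumes that `p.distkey::text` renders as space-separated column numbers (for example `2 1 3`). The function names are hypothetical and are not part of gpload.py. The surrounding spaces matter: without them, column 1 would also match inside a key such as 11.

```python
def distkey_position(attnum, distkey):
    """Mimic position(concat(' ', attnum, ' ') in concat(' ', distkey, ' ')).

    Returns a 1-based character offset, or 0 when attnum is not part of
    the key, following the SQL position() convention.
    """
    haystack = " " + " ".join(str(n) for n in distkey) + " "
    needle = " " + str(attnum) + " "
    # str.find returns -1 when the needle is absent, so +1 yields 0.
    return haystack.find(needle) + 1


def order_by_distkey(attnums, distkey):
    """Sort column numbers into the order declared in the distribution key."""
    return sorted(attnums, key=lambda n: distkey_position(n, distkey))


if __name__ == "__main__":
    # Columns arrive in attnum order; the key was declared as (2, 1, 3).
    print(order_by_distkey([1, 2, 3], [2, 1, 3]))
```

Sorting by the offset of each padded column number inside the padded key string reproduces the declared key order without needing an array-position function.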

Actual behavior

The query has no ORDER BY clause.

Steps to reproduce the behavior

You can use tables like the ones below to compare the current query with the new one:

create table public.test_table_check_dist_case1
(
col1 int4,
col2 int4,
col3 int4,
col4 text
)
distributed by (col2, col1, col3);

create table public.test_table_check_dist_case2
(
col1 int4 default '0',
col2 int4,
col3 int4,
col4 text
)
distributed by (col1, col2, col3);
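The difference between the two cases can be modeled with a small Python sketch. It assumes that a lookup without ORDER BY happens to return key columns in table column order. That is only one plausible outcome, since row order is unspecified without ORDER BY. The `TABLES` dictionary and the helper names are hypothetical stand-ins for the catalog.

```python
# Hypothetical model of the two test tables: column names in table order,
# plus the distribution key exactly as written in DISTRIBUTED BY.
TABLES = {
    "test_table_check_dist_case1": (
        ["col1", "col2", "col3", "col4"],
        ["col2", "col1", "col3"],
    ),
    "test_table_check_dist_case2": (
        ["col1", "col2", "col3", "col4"],
        ["col1", "col2", "col3"],
    ),
}


def keys_in_column_order(columns, declared_keys):
    """Assumed result of an unordered lookup: keys listed by column position."""
    return [c for c in columns if c in declared_keys]


def matches_declared_order(table):
    """True when the assumed unordered result equals the declared key order."""
    columns, declared = TABLES[table]
    return keys_in_column_order(columns, declared) == declared


if __name__ == "__main__":
    for name in TABLES:
        print(name, matches_declared_order(name))
```

Under this assumption only the first table exposes the problem, because its key order differs from its column order. The second table serves as a control where both orders coincide.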

TimonSP, Jan 11 '24 07:01