COPY ... ON SEGMENT on REPLICATED tables: COPY FROM reports N× the row count in Greenplum but only 1× in Cloudberry
-- Cloudberry cluster with 4 segments
-- Greenplum cluster with 4 segments
create table tab_replicated(a int, b int, c int) distributed replicated;
insert into tab_replicated select i, i, i from generate_series(1, 100) i;
copy tab_replicated to '/home/cbdb/tab_replicated_<SEGID>.txt' on segment;
-- Greenplum: COPY 400 | Cloudberry: COPY 400
create table tab_replicated_new(a int, b int, c int) distributed replicated;
copy tab_replicated_new from '/home/cbdb/tab_replicated_<SEGID>.txt' on segment;
-- Greenplum: COPY 400 | Cloudberry: COPY 100 (unexpected)
This bug also leads to illogical and unpredictable row counts in other scenarios, such as backup/restore. For example, consider a 4-segment cluster performing a COPY ... FROM ... ON SEGMENT into a replicated table:
- If three segments are provided with identical data files, each containing 100 rows,
- And the fourth segment is provided with an empty file,
- The COPY command incorrectly reports that 75 rows were copied.
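This result appears to be calculated as (100 rows * 3 files) / 4 segments = 75. The value matches neither the 300 rows actually read from the files nor the row count of any single segment (100 or 0), so it is arbitrary and nonsensical for a data loading operation.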
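The arithmetic above can be checked with a small model. This is only a sketch of the reported numbers, not the actual Greenplum or Cloudberry implementation: the function names are hypothetical, and it assumes that Greenplum reports the sum of rows processed on all segments while Cloudberry reports that sum divided by the segment count, using integer division.

```python
def greenplum_reported(rows_per_segment):
    """Assumed Greenplum behavior: report the sum of rows over all segments."""
    return sum(rows_per_segment)


def cloudberry_reported(rows_per_segment):
    """Assumed Cloudberry behavior: report the total divided by the
    number of segments (integer division is an assumption)."""
    return sum(rows_per_segment) // len(rows_per_segment)


# Every segment file holds the same 100 rows (the first scenario).
uniform = [100, 100, 100, 100]

# Three segment files hold 100 rows, the fourth is empty (the second scenario).
uneven = [100, 100, 100, 0]

print(greenplum_reported(uniform), cloudberry_reported(uniform))
print(cloudberry_reported(uneven))
```

Under these assumptions the model reproduces the counts quoted in this report: 400 versus 100 for the uniform case, and 75 for the case with one empty file.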