
COPY ... ON SEGMENT on REPLICATED tables: FROM returns N× in Greenplum but only 1× in Cloudberry

Open robertmu opened this issue 4 months ago • 1 comments


 -- Cloudberry cluster with 4 segments
 -- Greenplum cluster with 4 segments

  create table tab_replicated(a int, b int, c int) distributed replicated;
  insert into tab_replicated select i, i, i from generate_series(1, 100) i;

  copy tab_replicated to '/home/cbdb/tab_replicated_<SEGID>.txt' on segment;
  -- Greenplum: COPY 400   |  Cloudberry: COPY 400

  create table tab_replicated_new(a int, b int, c int) distributed replicated;
  copy tab_replicated_new from '/home/cbdb/tab_replicated_<SEGID>.txt' on segment;
  -- Greenplum: COPY 400   |  Cloudberry: COPY 100  (unexpected)

robertmu, Aug 08 '25 03:08

This bug also leads to illogical and unpredictable row counts in other scenarios, such as backup/restore. For example, consider a COPY ... FROM ... ON SEGMENT into a replicated table on a 4-segment cluster:

  • Three segments are each given an identical data file containing 100 rows.
  • The fourth segment is given an empty file.
  • The COPY command then incorrectly reports that 75 rows were copied.

This result appears to be calculated as (100 rows * 3 files) / 4 segments = 75, which matches neither the row count of any single file nor the total number of rows read. This behavior is arbitrary and nonsensical for a data loading operation.
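To make the arithmetic concrete, here is a minimal Python sketch of the two counting rules as they appear from the outside. It is only a model of the observed numbers, not the actual server code: the function names are hypothetical, and the "sum divided by segment count" rule for Cloudberry is an assumption inferred from the 100 and 75 results above.

```python
def summed_count(rows_per_segment):
    """Model of the Greenplum result: add up the rows loaded on every segment."""
    return sum(rows_per_segment)


def averaged_count(rows_per_segment):
    """Assumed model of the Cloudberry result: total rows divided by segment count."""
    return sum(rows_per_segment) // len(rows_per_segment)


# Scenario from the first comment: 4 segments, each file holds 100 rows.
full_files = [100, 100, 100, 100]
print(summed_count(full_files))    # 400, the Greenplum COPY count
print(averaged_count(full_files))  # 100, the Cloudberry COPY count

# Scenario from this comment: three files with 100 rows, one empty file.
one_empty_file = [100, 100, 100, 0]
print(averaged_count(one_empty_file))  # 75, the Cloudberry COPY count
```

If this model is right, the reported count only equals the per-file row count when every segment file happens to contain the same number of rows.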

robertmu, Aug 08 '25 04:08