
Inconsistent `checksum` default for Append-Optimized tables in `pg_class.reloptions`

Status: Open. robertmu opened this issue 5 months ago (0 comments).

Issue Description

There appears to be an inconsistency in how Cloudberry handles the default checksum storage option for append-optimized (AO) tables.

The server configuration parameter gp_default_storage_options lists checksum=true as part of the default storage settings. However, when an AO table is created without an explicit checksum option in its WITH clause, this default is not persisted to the table's metadata in the pg_class.reloptions column.

This behavior differs from Greenplum 7, which persists the checksum=true default to pg_class.reloptions. Because gp_default_storage_options is a GUC that can be changed at any time, the storage options in effect at creation time should be recorded explicitly in the catalog. Otherwise, once the GUC is changed, pg_class.reloptions no longer shows whether the table was created with checksums enabled, and the table's actual storage properties can be misread.
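To make the expected behavior concrete, here is a minimal Python sketch of "snapshot the effective options at CREATE TABLE time". It is hypothetical and not taken from the Cloudberry or Greenplum source: the names parse_options and effective_reloptions are illustrative, explicit WITH options are assumed to take precedence, and the remaining GUC defaults are assumed to be appended after them, an ordering chosen only because it matches the Greenplum output in the session below. appendonly and orientation are left out because neither session shows them in reloptions.

```python
# Hypothetical sketch only: not Cloudberry's or Greenplum's actual implementation.
# Models merging explicit WITH options with gp_default_storage_options at creation time.


def parse_options(text: str) -> dict[str, str]:
    """Parse a 'key=value,key=value' string such as gp_default_storage_options."""
    options: dict[str, str] = {}
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty entries such as a trailing comma
        key, _, value = item.partition("=")
        options[key.strip().lower()] = value.strip()
    return options


def effective_reloptions(explicit: list[str], guc_defaults: str) -> list[str]:
    """Return the options to persist: explicit WITH options first (they win),
    then every GUC default whose key the WITH clause did not mention."""
    merged = list(explicit)
    explicit_keys = {opt.partition("=")[0].strip().lower() for opt in explicit}
    for key, value in parse_options(guc_defaults).items():
        if key not in explicit_keys:
            merged.append(f"{key}={value}")
    return merged


defaults = "blocksize=32768,compresstype=none,checksum=true"
explicit = ["compresstype=zlib", "blocksize=32768", "compresslevel=1"]
stored = effective_reloptions(explicit, defaults)
print(stored)  # ['compresstype=zlib', 'blocksize=32768', 'compresslevel=1', 'checksum=true']
# Changing the GUC afterwards does not recompute `stored`, so the table keeps checksum=true.
```

The point of persisting the merged list, rather than re-reading the GUC later, is that the stored value stays correct even after gp_default_storage_options changes.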

Reproduction and Evidence

The following raw psql session logs demonstrate the issue. The session on Cloudberry shows that checksum=true is the configured default but is not persisted to pg_class.reloptions. The session on Greenplum shows the expected, consistent behavior.

cbdb@robertmu-VirtualBox:~/Projects/cloudberry$ psql
psql (14.4, server 14.4)
Type "help" for help.

cbdb=# select version();
                                                                                                version
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 14.4 (Apache Cloudberry 2.1.0-devel+dev.2019.g1cc76495e18 build dev) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0, 64-bit compiled on Jul 18 2025 11:29:40
(1 row)

cbdb=#
cbdb=# create table tab_ao(a int, b int) with(appendonly=true, orientation=column, compresstype=zlib, blocksize=32768, compresslevel=1);
NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'a' as the Apache Cloudberry data distribution key for this table.
HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
CREATE TABLE
cbdb=#
cbdb=# select oid, relname, reloptions from pg_class where relname = 'tab_ao';
  oid  | relname |                     reloptions
-------+---------+-----------------------------------------------------
 20763 | tab_ao  | {compresstype=zlib,blocksize=32768,compresslevel=1}
(1 row)

cbdb=#
cbdb=# show gp_default_storage_options;
           gp_default_storage_options
-------------------------------------------------
 blocksize=32768,compresstype=none,checksum=true
(1 row)

cbdb=#
cbdb=#

gpdb7@robertmu-VirtualBox:~/Projects/gpdb-archive$ psql
psql (12.12)
Type "help" for help.

gpdb7=# select version();
                                                                                                  version
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 12.12 (Greenplum Database 7.0.0-beta.0+482967c1b4 build dev) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0, 64-bit compiled on Nov  8 2024 23:43:47 Bhuvnesh C.
(1 row)

gpdb7=#
gpdb7=# create table tab_ao(a int, b int) with(appendonly=true, orientation=column, compresstype=zlib, blocksize=32768, compresslevel=1);
NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'a' as the Greenplum Database data distribution key for this table.
HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
CREATE TABLE
gpdb7=#
gpdb7=# select oid, relname, reloptions from pg_class where relname = 'tab_ao';
  oid  | relname |                            reloptions
-------+---------+-------------------------------------------------------------------
 18293 | tab_ao  | {compresstype=zlib,blocksize=32768,compresslevel=1,checksum=true}
(1 row)

gpdb7=#
gpdb7=# show gp_default_storage_options;
           gp_default_storage_options
-------------------------------------------------
 blocksize=32768,compresstype=none,checksum=true
(1 row)

gpdb7=#
gpdb7=#

Expected Behavior

The reloptions column in pg_class for the tab_ao table should contain checksum=true, as this is the effective default set by gp_default_storage_options at the time of creation.

Actual Behavior

The reloptions column in pg_class for the tab_ao table does not contain the checksum=true option, even though it is part of the system default.
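The expected-versus-actual difference can be expressed as a simple check: which gp_default_storage_options keys are absent from a table's reloptions? The Python sketch below uses hypothetical helper names and works on the text values copied from the two sessions above, not on a live connection. It only checks key presence, so an option that was explicitly overridden in the WITH clause still counts as recorded.

```python
# Hypothetical helper: compares psql-rendered reloptions text against the GUC value.


def ordered_keys(text: str) -> list[str]:
    """Keys of a 'k=v,k=v' string, in order; psql's surrounding {} braces are ignored."""
    return [
        item.partition("=")[0].strip().lower()
        for item in text.strip().strip("{}").split(",")
        if item.strip()
    ]


def missing_default_keys(reloptions: str, guc_defaults: str) -> list[str]:
    """Default keys (in GUC order) that were not recorded in reloptions."""
    stored = set(ordered_keys(reloptions))
    return [key for key in ordered_keys(guc_defaults) if key not in stored]


defaults = "blocksize=32768,compresstype=none,checksum=true"
cloudberry = "{compresstype=zlib,blocksize=32768,compresslevel=1}"
greenplum = "{compresstype=zlib,blocksize=32768,compresslevel=1,checksum=true}"
print(missing_default_keys(cloudberry, defaults))  # ['checksum']
print(missing_default_keys(greenplum, defaults))   # []
```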

Environment

  • Cloudberry Version: PostgreSQL 14.4 (Apache Cloudberry 2.1.0-devel+dev.2019.g1cc76495e18 build dev)
  • Greenplum Version: PostgreSQL 12.12 (Greenplum Database 7.0.0-beta.0+482967c1b4 build dev)

robertmu, Jul 23 '25 09:07