partition_data_*() not work when table PK defined with GENERATED ALWAYS
I use pgmq with pg_parten, got error when calling partition_data_time()
cannot insert a non-DEFAULT value into column "msg_id"
DETAIL: Column "msg_id" is an identity column defined as GENERATED ALWAYS.
HINT: Use OVERRIDING SYSTEM VALUE to override.
CONTEXT: SQL statement "WITH partition_data AS (
DELETE FROM partman_temp_data_storage RETURNING *)
INSERT INTO pgmq.q_abc0ba_p20241221_153200 (msg_id,read_ct,enqueued_at,vt,message,headers) SELECT msg_id,read_ct,enqueued_at,vt,message,headers FROM partition_data"
PL/pgSQL function partition_data_time(text,integer,interval,numeric,text,boolean,text,text[]) line 245 at EXECUTE
Cause Column "msg_id" is an identity column defined as GENERATED ALWAYS:
CREATE TABLE "pgmq"."Untitled" (
"msg_id" int8 NOT NULL GENERATED ALWAYS AS IDENTITY , -- <----- here
"read_ct" int4 NOT NULL DEFAULT 0,
"enqueued_at" timestamptz(6) NOT NULL DEFAULT now(),
"vt" timestamptz(6) NOT NULL,
"message" jsonb,
"headers" jsonb
)
PARTITION BY RANGE (
"enqueued_at" "pg_catalog"."timestamptz_ops"
)
;
I committed a PR to handle this situation : https://github.com/pgpartman/pg_partman/pull/723
So I'm not able to re-produced this issue when using the p_ignored_columns option. Reading the description of OVERRIDING SYSTEM VALUE, you should only be running into that error when you're actually trying to insert values into that column. The p_ignored_columns option should completely remove that column from any insert statements so that override shouldn't be necessary.
Or are you trying to intentionally override those values? If so, that's a completely different issue and properly handling that would be different than the PR you provided. Your PR makes it so it always overrides no matter what if there's an identity. My assumption when someone has an identity set to GENERATE ALWAYS for any table (not just partitioned) is that no user should be inserting any values into that table without explicitly making that clear. So I would not want that to be the default in partman.
Below is a similar example to what you've given (I just make it all lower case and only set the NOT NULL columns to be the relevant ones to make an easier test case). It works fine ignoring the column and allows the partitioned table to generate the ID columns automatically.
If you can provide me a working example of where you're running into an error, maybe we can figure out a better solution to allow users to override identities if desired.
keith@github724=# CREATE TABLE "pgmq"."untitled" (
"msg_id" int8 NOT NULL GENERATED ALWAYS AS IDENTITY , -- <----- here
"read_ct" int4 DEFAULT 0,
"enqueued_at" timestamptz(6) NOT NULL DEFAULT now(),
"vt" timestamptz(6),
"message" jsonb,
"headers" jsonb
)
PARTITION BY RANGE (
"enqueued_at" "pg_catalog"."timestamptz_ops"
)
;
CREATE TABLE
Time: 6.201 ms
keith@github724=# create table pgmq.source_untitled (like pgmq.untitled);
CREATE TABLE
Time: 14.238 ms
keith@github724=# alter table pgmq.source_untitled alter msg_id add generated always as identity;
ALTER TABLE
Time: 14.269 ms
keith@github724=# select partman.create_parent('pgmq.untitled', 'enqueued_at', '1 day');
create_parent
---------------
t
(1 row)
Time: 35.883 ms
keith@github724=# \d+ pgmq.untitled
Partitioned table "pgmq.untitled"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
-------------+-----------------------------+-----------+----------+------------------------------+----------+-------------+--------------+-------------
msg_id | bigint | | not null | generated always as identity | plain | | |
read_ct | integer | | | 0 | plain | | |
enqueued_at | timestamp(6) with time zone | | not null | now() | plain | | |
vt | timestamp(6) with time zone | | | | plain | | |
message | jsonb | | | | extended | | |
headers | jsonb | | | | extended | | |
Partition key: RANGE (enqueued_at)
Partitions: pgmq.untitled_p20241226 FOR VALUES FROM ('2024-12-26 00:00:00-05') TO ('2024-12-27 00:00:00-05'),
pgmq.untitled_p20241227 FOR VALUES FROM ('2024-12-27 00:00:00-05') TO ('2024-12-28 00:00:00-05'),
pgmq.untitled_p20241228 FOR VALUES FROM ('2024-12-28 00:00:00-05') TO ('2024-12-29 00:00:00-05'),
pgmq.untitled_p20241229 FOR VALUES FROM ('2024-12-29 00:00:00-05') TO ('2024-12-30 00:00:00-05'),
pgmq.untitled_p20241230 FOR VALUES FROM ('2024-12-30 00:00:00-05') TO ('2024-12-31 00:00:00-05'),
pgmq.untitled_p20241231 FOR VALUES FROM ('2024-12-31 00:00:00-05') TO ('2025-01-01 00:00:00-05'),
pgmq.untitled_p20250101 FOR VALUES FROM ('2025-01-01 00:00:00-05') TO ('2025-01-02 00:00:00-05'),
pgmq.untitled_p20250102 FOR VALUES FROM ('2025-01-02 00:00:00-05') TO ('2025-01-03 00:00:00-05'),
pgmq.untitled_p20250103 FOR VALUES FROM ('2025-01-03 00:00:00-05') TO ('2025-01-04 00:00:00-05'),
pgmq.untitled_default DEFAULT
keith@github724=# insert into pgmq.source_untitled (enqueued_at) values (generate_series('2024-12-26'::timestamptz, '2025-01-02'::timestamptz, '1 hour'::interval));
INSERT 0 169
Time: 18.768 ms
keith@github724=# call partman.partition_data_proc('pgmq.untitled', p_source_table := 'pgmq.source_untitled', p_ignored_columns := ARRAY['msg_id']);
NOTICE: Loop: 1, Rows moved: 24
NOTICE: Loop: 2, Rows moved: 24
NOTICE: Loop: 3, Rows moved: 24
NOTICE: Loop: 4, Rows moved: 24
NOTICE: Loop: 5, Rows moved: 24
NOTICE: Loop: 6, Rows moved: 24
NOTICE: Loop: 7, Rows moved: 24
NOTICE: Loop: 8, Rows moved: 1
NOTICE: Total rows moved: 169
NOTICE: Ensure to VACUUM ANALYZE the parent (and source table if used) after partitioning data
CALL
Time: 8118.288 ms (00:08.118)
keith@github724=# select * from pgmq.untitled;
msg_id | read_ct | enqueued_at | vt | message | headers
--------+---------+------------------------+--------+---------+---------
1 | «NULL» | 2024-12-26 00:00:00-05 | «NULL» | «NULL» | «NULL»
2 | «NULL» | 2024-12-26 01:00:00-05 | «NULL» | «NULL» | «NULL»
3 | «NULL» | 2024-12-26 02:00:00-05 | «NULL» | «NULL» | «NULL»
4 | «NULL» | 2024-12-26 03:00:00-05 | «NULL» | «NULL» | «NULL»
5 | «NULL» | 2024-12-26 04:00:00-05 | «NULL» | «NULL» | «NULL»
6 | «NULL» | 2024-12-26 05:00:00-05 | «NULL» | «NULL» | «NULL»
7 | «NULL» | 2024-12-26 06:00:00-05 | «NULL» | «NULL» | «NULL»
8 | «NULL» | 2024-12-26 07:00:00-05 | «NULL» | «NULL» | «NULL»
9 | «NULL» | 2024-12-26 08:00:00-05 | «NULL» | «NULL» | «NULL»
10 | «NULL» | 2024-12-26 09:00:00-05 | «NULL» | «NULL» | «NULL»
[...]
Created some new unit tests for version 5.2.4 to test the bug I fixed there. I also had those test use GENERATED IDENTITY columns, one even using it for the partition key, and all seems to be working fine as long as I set p_ignored_columns.
https://github.com/pgpartman/pg_partman/pull/727
So just let me know if you're still having issues even when using the ignored columns options
Actually, I think I just found the issue. I was testing this in PG17, but apparently the ability for partitioned tables to properly honor IDENTITY was not added until 17!
https://www.postgresql.org/docs/17/release-17.html
Allow partitioned tables to have identity columns (Ashutosh Bapat)
I got this error when I tested in PG16.
keith@keith=# SELECT partman.partition_data_id('partman_test.id_taptest_table', '20', p_source_table := 'partman_source.id_taptest_table_source', p_ignored_columns := ARRAY['col1']);
ERROR: null value in column "col1" of relation "id_taptest_table_p0" violates not-null constraint
DETAIL: Failing row contains (null, stuff, 2024-12-31 17:49:03.486479-05, stuff3000000001).
CONTEXT: SQL statement "WITH partition_data AS (
DELETE FROM ONLY partman_source.id_taptest_table_source WHERE col1 >= 0 AND col1 < 10 RETURNING *)
INSERT INTO partman_test.id_taptest_table_p0 (col2,col3,col4) SELECT col2,col3,col4 FROM partition_data"
PL/pgSQL function partman.partition_data_id(text,integer,bigint,numeric,text,boolean,text,text[]) line 215 at EXECUTE
Time: 15.045 ms
So, this is actually an issue that has been fixed properly in core as of PG17. I don't think I want to allow an override option to get around this for now. The recommended fix would be to upgrade to PostgreSQL 17.
I have re producted on quay.io/tembo/pg17-pgmq:v1.5.0 ( pg-partman version 5.1.0)
CREATE TABLE "tb1" (
"msg_id" int8 NOT NULL GENERATED ALWAYS AS IDENTITY ,
"read_ct" int4 DEFAULT 0,
"enqueued_at" timestamptz(6) NOT NULL DEFAULT now()
)
PARTITION BY RANGE (
"enqueued_at" "pg_catalog"."timestamptz_ops"
);
select create_parent('public.tb1', 'enqueued_at', '1 day');
postgres=# \d+ public.tb1;
Partitioned table "public.tb1"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
-------------+-----------------------------+-----------+----------+------------------------------+---------+-------------+--------------+-------------
msg_id | bigint | | not null | generated always as identity | plain | | |
read_ct | integer | | | 0 | plain | | |
enqueued_at | timestamp(6) with time zone | | not null | now() | plain | | |
Partition key: RANGE (enqueued_at)
Partitions: tb1_p20250102 FOR VALUES FROM ('2025-01-02 00:00:00+00') TO ('2025-01-03 00:00:00+00'),
tb1_p20250103 FOR VALUES FROM ('2025-01-03 00:00:00+00') TO ('2025-01-04 00:00:00+00'),
tb1_p20250104 FOR VALUES FROM ('2025-01-04 00:00:00+00') TO ('2025-01-05 00:00:00+00'),
tb1_p20250105 FOR VALUES FROM ('2025-01-05 00:00:00+00') TO ('2025-01-06 00:00:00+00'),
tb1_p20250106 FOR VALUES FROM ('2025-01-06 00:00:00+00') TO ('2025-01-07 00:00:00+00'),
tb1_p20250107 FOR VALUES FROM ('2025-01-07 00:00:00+00') TO ('2025-01-08 00:00:00+00'),
tb1_p20250108 FOR VALUES FROM ('2025-01-08 00:00:00+00') TO ('2025-01-09 00:00:00+00'),
tb1_p20250109 FOR VALUES FROM ('2025-01-09 00:00:00+00') TO ('2025-01-10 00:00:00+00'),
tb1_p20250110 FOR VALUES FROM ('2025-01-10 00:00:00+00') TO ('2025-01-11 00:00:00+00'),
tb1_default DEFAULT
Insert a row into default part
insert into public.tb1_default (enqueued_at) values ('2025-01-15'::timestamptz);
postgres=# select * from tb1_default;
msg_id | read_ct | enqueued_at
--------+---------+------------------------
1 | 0 | 2025-01-15 00:00:00+00
(1 row)
Merge data:
postgres=# select partition_data_time('public.tb1');
ERROR: cannot insert a non-DEFAULT value into column "msg_id"
DETAIL: Column "msg_id" is an identity column defined as GENERATED ALWAYS.
HINT: Use OVERRIDING SYSTEM VALUE to override.
CONTEXT: SQL statement "WITH partition_data AS (
DELETE FROM partman_temp_data_storage RETURNING *)
INSERT INTO public.tb1_p20250115 (msg_id,read_ct,enqueued_at) SELECT msg_id,read_ct,enqueued_at FROM partition_data"
PL/pgSQL function partition_data_time(text,integer,interval,numeric,text,boolean,text,text[]) line 245 at EXECUTE
using p_ignored_columns
postgres=# select partition_data_time('public.tb1', 1, null, 0, 'ASC', true, NULL, ARRAY['msg_id']);
ERROR: null value in column "msg_id" of relation "partman_temp_data_storage" violates not-null constraint
DETAIL: Failing row contains (null, 0, 2025-01-15 00:00:00+00).
CONTEXT: SQL statement "WITH partition_data AS (
DELETE FROM public.tb1_default WHERE enqueued_at >= '2025-01-15 00:00:00+00' AND enqueued_at < '2025-01-16 00:00:00+00' RETURNING *)
INSERT INTO partman_temp_data_storage (read_ct,enqueued_at) SELECT read_ct,enqueued_at FROM partition_data"
PL/pgSQL function partition_data_time(text,integer,interval,numeric,text,boolean,text,text[]) line 232 at EXECUTE
Ahh I see the issue. I did not test this scenario when dealing with data in the default. The issue is the temp table that's created to hold the data while moving it out of the default then into the proper child (note the table name partman_temp_data_storage).
I'll try and work out a fix for this. In the mean time if you did need to handle this scenario, you could move the data out of the default yourself into another similar table (temporary or not), adjust the premake value so that it creates the necessary future partition in maintenance, then insert that data back in. If the data is in the past, that's a little trickier. You'd have to feed the proper values to the internal create_partition_time() function to have it make the child table you need.
So I'm going to try and see if I can allow an option for OVERRIDING SYSTEM VALUES. Now that I see where this issue is, I could foresee a scenario where someone may want to preserve the generated value that went into the default or even from an original source table. But others may just want a newly generated value when the data is moved. The latter will be the default action that a user will have to intentionally override.
So, may be a bit before I get this fix in. Thanks again for reporting and helping me narrow down the problem.
So there's now a workaround for this issue in 5.3.0. I've posted a release candidate if you'd like to test it out
https://github.com/pgpartman/pg_partman/releases/tag/v5.3.0RC1
5.3.0 has been released with this fix