postgres_scanner
postgres_scanner copied to clipboard
Unwanted rows modified when using UPDATE on partitioned table : CTID are not unique in partitioned tables
What happens?
When Duckdb updates a Postgres partitioned table, it will also update rows that are not wanted. After a bit of digging, i found that currently, ctid are used by Duckdb to target the rows to update.
https://github.com/duckdb/postgres_scanner/blob/03eaed75f0ec5500609b7a97aa05468493b229d1/src/storage/postgres_update.cpp#L49-L70
For standard table it works perfectly but not for partionned tables. Unfortunately, ctid are not unique for partitioned table. Instead, ctid are unique for each table that compose the partitioned table.
Here is my uneducated guess : This issue could be solved if the partionned keys are added to the final update statement. It might be a bit tricky with multi level partition but doable.
To Reproduce
The postgresql partitioned table :
CREATE SCHEMA IF NOT EXISTS ctid_duplicate;
CREATE TABLE IF NOT EXISTS ctid_duplicate.my_table (
id integer,
category text,
data text,
PRIMARY KEY (id, category)
) PARTITION BY LIST (category);
CREATE TABLE IF NOT EXISTS ctid_duplicate.my_table_category_a
PARTITION OF ctid_duplicate.my_table
FOR VALUES IN ('A');
CREATE TABLE IF NOT EXISTS ctid_duplicate.my_table_category_b
PARTITION OF ctid_duplicate.my_table
FOR VALUES IN ('B');
CREATE TABLE IF NOT EXISTS ctid_duplicate.my_table_category_c
PARTITION OF ctid_duplicate.my_table
FOR VALUES IN ('C');
INSERT INTO ctid_duplicate.my_table (id, category, data)
VALUES
(1, 'A', 'Data for category A'),
(2, 'B', 'Data for category B'),
(3, 'C', 'Data for category C'),
(4, 'A', 'Another data for category A');
A simple simple select let us see that ctid aren't unique
SELECT ctid, id, category, "data"
FROM ctid_duplicate.my_table;
Now to actually test the issue with DuckDB :
ATTACH if not exists 'postgresql://sco:sco@localhost:5432/rece' as postgres_rece (type POSTGRES);
update postgres_rece.ctid_duplicate.my_table
set data = 'CTID are duplicated'
where category = 'A';
Let's check again the table, only 2 rows should have been updated :
select * from postgres_rece.ctid_duplicate.my_table;
The results :
OS:
ubuntu 22.04
PostgreSQL Version:
15.8
DuckDB Version:
1.2.1
DuckDB Client:
Python
Full Name:
Simon Cotte
Affiliation:
AtolCD
Have you tried this on the latest main branch?
- [X] I agree
Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?
- [X] I agree