postgres_scanner icon indicating copy to clipboard operation
postgres_scanner copied to clipboard

Unwanted rows modified when using UPDATE on partitioned table : CTID are not unique in partitioned tables

Open sco-atolcd opened this issue 11 months ago • 1 comments

What happens?

When Duckdb updates a Postgres partitioned table, it will also update rows that are not wanted. After a bit of digging, i found that currently, ctid are used by Duckdb to target the rows to update.

https://github.com/duckdb/postgres_scanner/blob/03eaed75f0ec5500609b7a97aa05468493b229d1/src/storage/postgres_update.cpp#L49-L70

For standard table it works perfectly but not for partionned tables. Unfortunately, ctid are not unique for partitioned table. Instead, ctid are unique for each table that compose the partitioned table.

Here is my uneducated guess : This issue could be solved if the partionned keys are added to the final update statement. It might be a bit tricky with multi level partition but doable.

To Reproduce

The postgresql partitioned table :

CREATE SCHEMA IF NOT EXISTS ctid_duplicate;

CREATE TABLE IF NOT EXISTS ctid_duplicate.my_table (
    id integer,
    category text,
    data text,
    PRIMARY KEY (id, category)
) PARTITION BY LIST (category);

CREATE TABLE IF NOT EXISTS ctid_duplicate.my_table_category_a
    PARTITION OF ctid_duplicate.my_table
    FOR VALUES IN ('A');

CREATE TABLE IF NOT EXISTS ctid_duplicate.my_table_category_b
    PARTITION OF ctid_duplicate.my_table
    FOR VALUES IN ('B');

CREATE TABLE IF NOT EXISTS ctid_duplicate.my_table_category_c
    PARTITION OF ctid_duplicate.my_table
    FOR VALUES IN ('C');

INSERT INTO ctid_duplicate.my_table (id, category, data)
VALUES
    (1, 'A', 'Data for category A'),
    (2, 'B', 'Data for category B'),
    (3, 'C', 'Data for category C'),
    (4, 'A', 'Another data for category A');

A simple simple select let us see that ctid aren't unique

SELECT ctid, id, category, "data"
FROM ctid_duplicate.my_table;

image

Now to actually test the issue with DuckDB :

ATTACH if not exists 'postgresql://sco:sco@localhost:5432/rece' as postgres_rece (type POSTGRES);

update postgres_rece.ctid_duplicate.my_table 
set data = 'CTID are duplicated'
where category = 'A';

Let's check again the table, only 2 rows should have been updated :

select * from postgres_rece.ctid_duplicate.my_table;

The results : image

OS:

ubuntu 22.04

PostgreSQL Version:

15.8

DuckDB Version:

1.2.1

DuckDB Client:

Python

Full Name:

Simon Cotte

Affiliation:

AtolCD

Have you tried this on the latest main branch?

  • [X] I agree

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • [X] I agree

sco-atolcd avatar Nov 12 '24 17:11 sco-atolcd