citus icon indicating copy to clipboard operation
citus copied to clipboard

[BUG]alter_old_partitions_set_access_method deadlock

Open GPF199541 opened this issue 1 year ago • 2 comments

testdb=# CALL alter_old_partitions_set_access_method(
't_test_base_info', '2023-09-01 00:00:00', 'columnar'
); NOTICE: converting t_test_base_info_p2023_08_30 with start time 2023-08-30 00:00:00 and end time 2023-08-31 00:00:00 NOTICE: creating a new table for public.t_test_base_info_p2023_08_30 NOTICE: moving the data of public.t_test_base_info_p2023_08_30 ERROR: canceling the transaction since it was involved in a distributed deadlock CONTEXT: SQL statement "ALTER TABLE IF EXISTS public.t_test_base_info DETACH PARTITION public.t_test_base_info_p2023_08_30;" SQL statement "SELECT pg_catalog.alter_table_set_access_method(r.partition, new_access_method)" PL/pgSQL function alter_old_partitions_set_access_method(regclass,timestamp with time zone,name) line 27 at PERFORM

GPF199541 avatar Sep 04 '23 12:09 GPF199541

https://github.com/citusdata/citus/issues/6609 like this

citus 11.3 postgres 15.2

GPF199541 avatar Sep 04 '23 12:09 GPF199541

I have encountered a very similar problem on a single-node instance of Citus. I attempted to convert a heap storage partition to columnar storage with the alter_table_set_access() command while my application continued to insert data into the same partitioned table concurrently. As far as I can tell, the application was inserting into a different partition than the one I was attempting to convert. Once the operation began, the application was blocked from continuing to insert data into the table. After 10+ minutes of high disk I/O, the result of the operation was the following deadlock error:

ERROR:  deadlock detected
DETAIL:  Process 949404 waits for AccessExclusiveLock on relation 127868 of database 16384; blocked by process 896852.
Process 896582 waits for RowExclusiveLock on relation 151949 of database 16384; blocked by process 949404.
HINT:  See server log for query details.
CONTEXT:  SQL statement "ALTER TABLE IF EXISTS records DETACH PARTITION records_p2023_11_29;"

I shut down the application so that no concurrent insert operations were occurring, and re-attempted the conversion to columnar storage. This time, the operation was successful.

I read the PostgreSQL documentation for the DETACH PARTITION command and it shows that there is a CONCURRENTLY parameter. Could Citus potentially use this parameter to ensure that conversion to columnar storage can happen concurrently with other ongoing database operations?

I am running Citus 12.0.0.

weetster avatar Dec 01 '23 21:12 weetster