Foreign keys between reference tables have locking bugs
I ran the isolation_ref2ref_foreign_keys test 500 times on my dev machine and 7 failures happened. 5 of these failures were because a query that was supposed to be blocked wasn't actually blocked. 2 others were because less or more locks were held than expected when looking at the pg_locks view.
diff -dU10 -w /home/jelte/work/citus/src/test/regress/expected/isolation_ref2ref_foreign_keys.out /home/jelte/work/citus/src/test/regress/results/isolation_ref2ref_foreign_keys.out
--- /home/jelte/work/citus/src/test/regress/expected/isolation_ref2ref_foreign_keys.out.modified 2022-08-18 15:12:26.459528512 +0200
+++ /home/jelte/work/citus/src/test/regress/results/isolation_ref2ref_foreign_keys.out.modified 2022-08-18 15:12:26.489528512 +0200
@@ -199,21 +199,21 @@
step s1-view-locks:
SELECT mode, count(*)
FROM pg_locks
WHERE locktype='advisory'
GROUP BY mode
ORDER BY 1, 2;
mode |count
-------------+-----
-ExclusiveLock| 3
+ExclusiveLock| 1
ShareLock | 1
(2 rows)
step s1-rollback:
ROLLBACK;
step s2-rollback:
ROLLBACK;
step s1-view-locks:
diff -dU10 -w /home/jelte/work/citus/src/test/regress/expected/isolation_ref2ref_foreign_keys.out /home/jelte/work/citus/src/test/regress/results/isolation_ref2ref_foreign_keys.out
--- /home/jelte/work/citus/src/test/regress/expected/isolation_ref2ref_foreign_keys.out.modified 2022-08-18 15:14:24.299528295 +0200
+++ /home/jelte/work/citus/src/test/regress/results/isolation_ref2ref_foreign_keys.out.modified 2022-08-18 15:14:24.309528295 +0200
@@ -796,25 +796,24 @@
step s1-update-table-3:
UPDATE ref_table_3 SET id = 2 WHERE value = 1 RETURNING id;
id
--
2
(1 row)
step s2-insert-table-2:
INSERT INTO ref_table_2 VALUES (7, 5);
- <waiting ...>
+
step s1-commit:
COMMIT;
-step s2-insert-table-2: <... completed>
step s2-commit:
COMMIT;
step s1-select-table-3:
SELECT * FROM ref_table_3 ORDER BY id, value;
id|value
--+-----
2| 1
3| 3
diff -dU10 -w /home/jelte/work/citus/src/test/regress/expected/isolation_ref2ref_foreign_keys.out /home/jelte/work/citus/src/test/regress/results/isolation_ref2ref_foreign_keys.out
--- /home/jelte/work/citus/src/test/regress/expected/isolation_ref2ref_foreign_keys.out.modified 2022-08-18 15:15:26.809528180 +0200
+++ /home/jelte/work/citus/src/test/regress/results/isolation_ref2ref_foreign_keys.out.modified 2022-08-18 15:15:26.829528180 +0200
@@ -461,26 +461,25 @@
BEGIN;
step s2-begin:
BEGIN;
step s2-delete-table-1:
DELETE FROM ref_table_1 WHERE id = 1;
step s1-insert-table-2:
INSERT INTO ref_table_2 VALUES (7, 2);
- <waiting ...>
+
+ERROR: insert or update on table "ref_table_2_114918" violates foreign key constraint "ref_table_2_value_fkey_114918"
step s2-commit:
COMMIT;
-step s1-insert-table-2: <... completed>
-ERROR: insert or update on table "ref_table_2_xxxxxxx" violates foreign key constraint "ref_table_2_value_fkey_xxxxxxx"
step s1-commit:
COMMIT;
step s1-select-table-2:
SELECT * FROM ref_table_2 ORDER BY id, value;
id|value
--+-----
3| 3
5| 5
diff -dU10 -w /home/jelte/work/citus/src/test/regress/expected/isolation_ref2ref_foreign_keys.out /home/jelte/work/citus/src/test/regress/results/isolation_ref2ref_foreign_keys.out
--- /home/jelte/work/citus/src/test/regress/expected/isolation_ref2ref_foreign_keys.out.modified 2022-08-18 15:18:24.609527853 +0200
+++ /home/jelte/work/citus/src/test/regress/results/isolation_ref2ref_foreign_keys.out.modified 2022-08-18 15:18:24.629527853 +0200
@@ -693,25 +693,24 @@
BEGIN;
step s2-begin:
BEGIN;
step s2-insert-table-1:
INSERT INTO ref_table_1 VALUES (7, 7);
step s1-update-table-2:
UPDATE ref_table_2 SET id = 0 WHERE value = 2;
- <waiting ...>
+
step s2-commit:
COMMIT;
-step s1-update-table-2: <... completed>
step s1-commit:
COMMIT;
step s1-select-table-3:
SELECT * FROM ref_table_3 ORDER BY id, value;
id|value
--+-----
1| 1
3| 3
diff -dU10 -w /home/jelte/work/citus/src/test/regress/expected/isolation_ref2ref_foreign_keys.out /home/jelte/work/citus/src/test/regress/results/isolation_ref2ref_foreign_keys.out
--- /home/jelte/work/citus/src/test/regress/expected/isolation_ref2ref_foreign_keys.out.modified 2022-08-18 15:27:26.709526856 +0200
+++ /home/jelte/work/citus/src/test/regress/results/isolation_ref2ref_foreign_keys.out.modified 2022-08-18 15:27:26.739526856 +0200
@@ -368,38 +368,37 @@
BEGIN;
step s2-begin:
BEGIN;
step s2-update-table-1:
UPDATE ref_table_1 SET id = 2 WHERE id = 1;
step s1-insert-table-2:
INSERT INTO ref_table_2 VALUES (7, 2);
- <waiting ...>
+
+ERROR: insert or update on table "ref_table_2_129660" violates foreign key constraint "ref_table_2_value_fkey_129660"
step s2-commit:
COMMIT;
-step s1-insert-table-2: <... completed>
step s1-commit:
COMMIT;
step s1-select-table-2:
SELECT * FROM ref_table_2 ORDER BY id, value;
id|value
--+-----
1| 2
3| 3
5| 5
- 7| 2
-(4 rows)
+(3 rows)
starting permutation: s1-begin s2-begin s2-update-table-1 s1-update-table-2 s2-commit s1-commit s1-select-table-2
step s1-begin:
BEGIN;
step s2-begin:
BEGIN;
step s2-update-table-1:
diff -dU10 -w /home/jelte/work/citus/src/test/regress/expected/isolation_ref2ref_foreign_keys.out /home/jelte/work/citus/src/test/regress/results/isolation_ref2ref_foreign_keys.out
--- /home/jelte/work/citus/src/test/regress/expected/isolation_ref2ref_foreign_keys.out.modified 2022-08-18 15:32:25.139526307 +0200
+++ /home/jelte/work/citus/src/test/regress/results/isolation_ref2ref_foreign_keys.out.modified 2022-08-18 15:32:25.149526307 +0200
@@ -662,25 +662,24 @@
step s1-update-table-3:
UPDATE ref_table_3 SET id = 2 WHERE value = 1 RETURNING id;
id
--
2
(1 row)
step s2-insert-table-1:
INSERT INTO ref_table_1 VALUES (7, 7);
- <waiting ...>
+
step s1-commit:
COMMIT;
-step s2-insert-table-1: <... completed>
step s2-commit:
COMMIT;
step s1-select-table-3:
SELECT * FROM ref_table_3 ORDER BY id, value;
id|value
--+-----
2| 1
3| 3
diff -dU10 -w /home/jelte/work/citus/src/test/regress/expected/isolation_ref2ref_foreign_keys.out /home/jelte/work/citus/src/test/regress/results/isolation_ref2ref_foreign_keys.out
--- /home/jelte/work/citus/src/test/regress/expected/isolation_ref2ref_foreign_keys.out.modified 2022-08-18 15:42:08.689525233 +0200
+++ /home/jelte/work/citus/src/test/regress/results/isolation_ref2ref_foreign_keys.out.modified 2022-08-18 15:42:08.729525233 +0200
@@ -288,21 +288,22 @@
step s1-view-locks:
SELECT mode, count(*)
FROM pg_locks
WHERE locktype='advisory'
GROUP BY mode
ORDER BY 1, 2;
mode |count
------------------------+-----
-(0 rows)
+ShareUpdateExclusiveLock| 1
+(1 row)
starting permutation: s2-begin s2-insert-table-3 s1-view-locks s2-rollback s1-view-locks
step s2-begin:
BEGIN;
step s2-insert-table-3:
INSERT INTO ref_table_3 VALUES (7, 5);
step s1-view-locks:
I found two other missing RowExclusiveLock locks with more information, because of #6197:
diff -dU10 -w /home/jelte/work/citus/src/test/regress/expected/isolation_ref2ref_foreign_keys.out /home/jelte/work/citus/src/test/regress/results/isolation_ref2ref_foreign_keys.out
--- /home/jelte/work/citus/src/test/regress/expected/isolation_ref2ref_foreign_keys.out.modified 2022-08-18 16:37:12.359519155 +0200
+++ /home/jelte/work/citus/src/test/regress/results/isolation_ref2ref_foreign_keys.out.modified 2022-08-18 16:37:12.389519155 +0200
@@ -450,25 +450,23 @@
regexp_replace(query, E'[\\n\\r\\u2028]+', ' ', 'g' ) query
FROM pg_locks l
JOIN pg_stat_activity a
ON l.pid = a.pid
WHERE locktype='advisory'
AND application_name <> 'Citus Maintenance Daemon'
ORDER BY 1, 2, 3, 4;
classid| objid|objsubid|mode |application_name |backend_type |query
-------+-------+--------+----------------+-------------------------------------------+--------------+--------------------------------------------
- 0|8429800| 5|RowExclusiveLock|isolation/isolation_ref2ref_foreign_keys/s2|client backend| INSERT INTO ref_table_3 VALUES (7, 5);
- 0|8429801| 5|RowExclusiveLock|isolation/isolation_ref2ref_foreign_keys/s2|client backend| INSERT INTO ref_table_3 VALUES (7, 5);
0|8429802| 4|ShareLock |isolation/isolation_ref2ref_foreign_keys/s2|client backend| INSERT INTO ref_table_3 VALUES (7, 5);
0|8429802| 5|RowExclusiveLock|isolation/isolation_ref2ref_foreign_keys/s2|client backend| INSERT INTO ref_table_3 VALUES (7, 5);
-(4 rows)
+(2 rows)
step s2-rollback:
ROLLBACK;
step s1-view-locks:
SELECT classid,
objid,
objsubid,
mode,
application_name,
It could be this is the same underlying cause as most of the other locking issues.
@onderkalaci suggested that the reason for this is probably that we don't invalidate the foreign key graph correctly, like described in https://github.com/citusdata/citus/issues/4676
Relevant code: https://github.com/citusdata/citus/blob/3bf3d9c19b8631c6848f0c52865d1dad0e8693c4/src/backend/distributed/metadata/metadata_cache.c#L4120-L4148