Support recursive planning of recurring inner joins when there is a reference to outer query
Currently, Citus can successfully execute queries with recurring left join with a reference to outer query via recursive planning. However, when the same subquery uses an INNER JOIN, recursive planning is not applied, and the query fails during planning.
Q1: Fails
SELECT COUNT(*) FROM dist_1 t1
WHERE NOT EXISTS (
SELECT * FROM ref_1 t3
JOIN dist_1 t2 USING (a)
WHERE t3.a = t1.a
);
ERROR: complex joins are only supported when all distributed tables are co-located and joined on their distribution columns
Q2: Works
SELECT COUNT(*) FROM dist_1 t1
WHERE NOT EXISTS (
SELECT * FROM ref_1 t3
LEFT JOIN dist_1 t2 USING (a)
WHERE t3.a = t1.a
);
Investigated the issue with @onurctirtir.
After recursive planning, the second query (Q2 with left join) becomes to Q2' where dist_1 is the only distributed table. In this case, Citus determines that it is safe to push down the resulting query.
Q2': After t2 is recursively planned
SELECT COUNT(*) FROM dist_1 t1
WHERE NOT EXISTS (
SELECT * FROM ref_1 t3
LEFT JOIN intermediate_result_dist1 t2 USING (a)
WHERE t3.a = t1.a
);
In contrast, the first query (Q1) involves two distributed tables (t1 and t2), and there is no explicit equivalence between them. This causes Citus to recursively plan the inner query, which then fails due to its reference to the outer query (t1).
Also note that Q3 does not result in an error due to the equality constraint between distributed tables.
Q3:
SELECT COUNT(*) FROM dist_1 t1
WHERE NOT EXISTS (
SELECT * FROM ref_1 t3
JOIN dist1 t2 USING (a)
WHERE t2.a = t1.a
);
For related discussion see https://github.com/citusdata/citus/pull/6650 and https://github.com/citusdata/citus/issues/6653