firebird icon indicating copy to clipboard operation
firebird copied to clipboard

Convert IN subquery into UNIQUE INNER JOIN

Open dyemanov opened this issue 8 months ago • 3 comments

Currently we are able to convert IN into a semi-join:

select * from t1 where t1.fld1 in (select fld2 from t2);

is executed like:

select * from t1 semi join t2 on t1.fld1 = t2.fld2;

The problem here is that if table t1 is large and the IN predicate is selective (often it contains PK values), then the whole table t1 is read. Instead, it makes more sense to use the subquery as a pre-filter with a plan like this:

select * from t2 lateral join (select * from t1 where t1.fld1 = t2.fld2);

or simply INNER JOIN with the join order to be chosen by the optimizer.

If the subquery returns a non-unique field, a DISTINCT should be applied implicitly, but it's still likely to be cheaper (for a small t2) than a t1 fullscan.

The major problem here is to estimate the subquery cardinality before joining, currently we have no such ability. Without that it's impossible to decide whether DISTINCT + INNER JOIN is cheaper than SEMI JOIN.

dyemanov avatar Apr 22 '25 12:04 dyemanov

Hi

it is rather exception that in query:

select * from t1 where t1.fld1 in (select fld2 from t2);

t2 will be bigger then t1. So join order and cardinality is "deterministic" ;-)

livius2 avatar Apr 22 '25 22:04 livius2

Hi

it is rather exception that in query:

select * from t1 where t1.fld1 in (select fld2 from t2); t2 will be bigger then t1. So join order and cardinality is "deterministic" ;-)

Consider the problem more broadly. t1 and t2 are not necessarily tables. They could be views or table expressions.

sim1984 avatar Apr 23 '25 06:04 sim1984

They could be views or table expressions.

This does not change the assumption that t2 is smaller than t1 ;-)

livius2 avatar Apr 23 '25 13:04 livius2