Convert IN subquery into UNIQUE INNER JOIN
Currently we are able to convert IN into a semi-join:
select * from t1 where t1.fld1 in (select fld2 from t2);
is executed like:
select * from t1 semi join t2 on t1.fld1 = t2.fld2;
The problem here is that if table t1 is large and the IN predicate is selective (often it contains PK values), then the whole table t1 is read. Instead, it makes more sense to use the subquery as a pre-filter with a plan like this:
select * from t2 lateral join (select * from t1 where t1.fld1 = t2.fld2);
or simply INNER JOIN with the join order to be chosen by the optimizer.
If the subquery returns a non-unique field, a DISTINCT should be applied implicitly, but it's still likely to be cheaper (for a small t2) than a t1 fullscan.
The major problem here is to estimate the subquery cardinality before joining, currently we have no such ability. Without that it's impossible to decide whether DISTINCT + INNER JOIN is cheaper than SEMI JOIN.
Hi
it is rather exception that in query:
select * from t1 where t1.fld1 in (select fld2 from t2);
t2 will be bigger then t1. So join order and cardinality is "deterministic" ;-)
Hi
it is rather exception that in query:
select * from t1 where t1.fld1 in (select fld2 from t2); t2 will be bigger then t1. So join order and cardinality is "deterministic" ;-)
Consider the problem more broadly. t1 and t2 are not necessarily tables. They could be views or table expressions.
They could be views or table expressions.
This does not change the assumption that t2 is smaller than t1 ;-)