prql
prql copied to clipboard
Query optimization: join prunning and "uniqeness"
Under certain conditions, join transforms could be removed from the query without changing its result. This would have significantly improve query performance.
Example:
from e = employees
join side:left addresses (e.id == that.employee_id)
select {e.name}
... could be simplified into:
from e = employees
select {e.name}
Conditions:
- when there is a join of left relation with right relation,
- and it is a left join (so no left rows are dropped),
- and right relation is unique on the join condition (so no left rows are duplicated),
- and none of the columns from the right relation are used in the output.
At the moment, PRQL does contain knowledge to detect the conditions 1.-3., but not 4., so this is not possible to implement.
It would be possible to have this "uniqueness" information either:
- declared at relation definition site (in SQL terms, this would be a UNIQUE constraint / PRIMARY KEY),
- inferred from the query (if there is a
group x (take 1)
, we can infer uniqueness onx
). Then, it would be possible to detect condition 4.