prql icon indicating copy to clipboard operation
prql copied to clipboard

Query optimization: join prunning and "uniqeness"

Open aljazerzen opened this issue 9 months ago • 3 comments

Under certain conditions, join transforms could be removed from the query without changing its result. This would have significantly improve query performance.

Example:

from e = employees
join side:left addresses (e.id == that.employee_id)
select {e.name}

... could be simplified into:

from e = employees
select {e.name}

Conditions:

  1. when there is a join of left relation with right relation,
  2. and it is a left join (so no left rows are dropped),
  3. and right relation is unique on the join condition (so no left rows are duplicated),
  4. and none of the columns from the right relation are used in the output.

At the moment, PRQL does contain knowledge to detect the conditions 1.-3., but not 4., so this is not possible to implement.

It would be possible to have this "uniqueness" information either:

  • declared at relation definition site (in SQL terms, this would be a UNIQUE constraint / PRIMARY KEY),
  • inferred from the query (if there is a group x (take 1), we can infer uniqueness on x). Then, it would be possible to detect condition 4.

aljazerzen avatar May 20 '24 14:05 aljazerzen