Code

import prql_python as prql

prql_query = """
    from employees
    join salaries [emp_id]
    group [emp_id, gender] (
      aggregate [
        avg_salary = average salary
      ]
    )
    join departments [dept_id]
"""

sql = prql.to_sql(prql_query)

Error

In [7]: prql.to_sql(prql_query)
Traceback (most recent call last):

  File "/Users/ajfriend/work/env/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3444, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)

  File "<ipython-input-7-288067862fa2>", line 1, in <module>
    prql.to_sql(prql_query)

  File "<string>", line unknown
SyntaxError: Error { span: Some(Span { start: 166, end: 173 }), reason: Simple("join using a column name must belong to both tables"), help: None }

However, JSON seems to work

On the other hand, things seem to work properly when I run prql.to_json(prql_query)

Jul 09 '22 21:07 ajfriend

Thanks @ajfriend .

That does look like a bug; I'm not sure whether it's a regression or it never worked.

I'm going to remove the final line so we don't have an incorrect example, and change this into a compiler bug

Jul 09 '22 21:07 max-sixty

Python example removed by https://github.com/prql/prql/pull/787

Jul 09 '22 21:07 max-sixty

I don't think this error that's thrown in this case (on the final line of the PRQL) is related to joining at all – it's that any columns other than emp_id, gender and avg_salary get dropped with the grouping + aggregation.

Using dept_id in any way after that operation is an invalid operation.

But, what's buggy here is that the error doesn't go away once we add dept_id to the group clause like this:

-group [emp_id, gender] (
+group [emp_id, dept_id, gender] (

but, re-defining dept_id using a derive prior to the first join does work:

from employees
+derive dept_id = dept_id
join salaries [emp_id]
+group [emp_id, dept_id, gender] (
-group [emp_id, gender] (
  aggregate [
    avg_salary = average salary
  ]
)
join departments [dept_id]

as far as PRQL is concerned, adding dept_id to group isn't actually required, BUT without that, the query would actually fail because the grouping swallows it:

Query 1 ERROR: ERROR:  column "dept_id" specified in USING clause does not exist in left table

Jul 18 '22 22:07 mklopets

(just to note, the example that @mklopets gave above is still an error after #908 )

Aug 06 '22 07:08 max-sixty

Here's a minimal example:

from employees
join salaries [emp_id]  # Works without this line
group [dept_id] (
  aggregate [avg_salary = average salary]
)
select [dept_id]

Error: 
   ╭─[:6:9]
   │
 6 │ select [dept_id]
   ·         ───┬───  
   ·            ╰───── Unknown variable `dept_id`
───╯

Oct 13 '22 23:10 max-sixty

This now works:

from employees
join salaries [~emp_id]  # Works without this line
group [dept_id] (
  aggregate [avg_salary = average salary]
)
select [dept_id]

SELECT
  dept_id
FROM
  employees
  JOIN salaries ON employees.emp_id = salaries.emp_id
GROUP BY
  dept_id

Will be released with 0.3

Nov 28 '22 10:11 aljazerzen

prql
prql copied to clipboard

Bug in how we handle joins after aggregates

Code

Error

However, JSON seems to work

prql prql copied to clipboard

Bug in how we handle joins after aggregates

Code

Error

However, JSON seems to work

prql
prql copied to clipboard