Compiler outputs `SELECT DISTINCT` even when not grouping by all columns
As described in the PRQL Language Book, the expected behavior is for
from employees
select department
group department (
take 1
)
to compile to
SELECT
DISTINCT department
FROM
employees
This functions as expected. However,
from employees
group department (
take 1
)
currently also produces output that uses `SELECT DISTINCT`, i.e.
SELECT
DISTINCT employees.*
FROM
employees
The expected output is instead something like the following (shown here with a generic table `my_table` and grouping column `x`):
WITH table_0 AS (
SELECT my_table.*, ROW_NUMBER() OVER (PARTITION BY x) AS _rn_81 FROM my_table)
SELECT table_0.* FROM table_0 WHERE _rn_81 <= 1
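More generally, pipelines that include `group x (take 1)` seem to produce output with `SELECT DISTINCT` even when `x` is not the only selected column. This is incorrect: `DISTINCT` over several columns only removes rows that are identical in every column, so it does not keep just one row per value of `x`.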
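To make the difference concrete, here is a small reproduction sketch using Python's built-in `sqlite3` module. The table contents and the `name` column are made up for illustration, and it assumes the bundled SQLite is 3.25 or newer so that window functions are available. It runs the `SELECT DISTINCT employees.*` query from above next to a `ROW_NUMBER()` query adapted to the `employees` table.

```python
import sqlite3

# Hypothetical sample data: two different employees in each department.
rows = [
    ("Sales", "Alice"),
    ("Sales", "Bob"),
    ("Engineering", "Carol"),
    ("Engineering", "Dave"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (department TEXT, name TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?)", rows)

# The SQL currently emitted for `from employees | group department (take 1)`.
distinct_rows = conn.execute(
    "SELECT DISTINCT employees.* FROM employees"
).fetchall()

# The window-function form, adapted from the expected output in this issue.
windowed_sql = """
WITH table_0 AS (
  SELECT employees.*,
         ROW_NUMBER() OVER (PARTITION BY department) AS _rn
  FROM employees
)
SELECT department, name FROM table_0 WHERE _rn <= 1
"""
windowed_rows = conn.execute(windowed_sql).fetchall()

print(len(distinct_rows))  # 4: no two rows are identical, so nothing is removed
print(len(windowed_rows))  # 2: one row per department
```

Which employee is kept per department is not defined here, since the window has no `ORDER BY`; only the row counts are meaningful.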
The source of the issue was identified by @aljazerzen as being located here: https://github.com/prql/prql/blob/b754c0a65bb8ab619a9001d00b9b451dbaa3d02d/prql-compiler/src/sql/distinct.rs#L36
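For discussion, the condition under which `DISTINCT` seems safe could be sketched as below. This is not the code in `distinct.rs`; the function name `can_use_distinct` and the representation of columns as plain strings are hypothetical, and the rule itself (the group columns must be exactly the selected columns) is my reading of the behavior described above.

```python
def can_use_distinct(group_columns, selected_columns):
    """Hypothetical check: is `group ... (take 1)` expressible as SELECT DISTINCT?

    Assumption: DISTINCT is only equivalent when the selected columns are
    exactly the columns being grouped by.
    """
    # A wildcard such as "employees.*" may expand to columns outside the
    # group key, so it is conservatively treated as not covered.
    if any(col.endswith("*") for col in selected_columns):
        return False
    # With nothing selected there is nothing to deduplicate on.
    if not selected_columns:
        return False
    return set(selected_columns) == set(group_columns)


# `select department | group department (take 1)` can use DISTINCT.
print(can_use_distinct(["department"], ["department"]))  # True
# `group department (take 1)` over all columns would need the ROW_NUMBER() form.
print(can_use_distinct(["department"], ["employees.*"]))  # False
```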