Compiler outputs `SELECT DISTINCT` even when not grouping by all columns

Open vishnumenon opened this issue 2 years ago • 0 comments

As described in the PRQL Language Book, the expected behavior is for

from employees
select department
group department (
  take 1
)

to compile to

SELECT
  DISTINCT department
FROM
  employees

This functions as expected. However,

from employees
group department (
  take 1
)

currently also produces output that uses SELECT DISTINCT, i.e.

SELECT
  DISTINCT employees.*
FROM
  employees

The expected output is instead something like:

WITH table_0 AS (
  SELECT
    employees.*,
    ROW_NUMBER() OVER (PARTITION BY department) AS _rn_81
  FROM
    employees
)
SELECT
  table_0.*
FROM
  table_0
WHERE
  _rn_81 <= 1

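More generally, pipelines that include group x (take 1) seem to produce output with SELECT DISTINCT even when x is not the only selected column. This is incorrect: DISTINCT over several columns only removes rows that are identical in every selected column, so it can return more than one row per value of x.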
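
To illustrate the difference, here is a minimal sketch using Python's built-in sqlite3 module. The table contents and the name column are made up for this example (the issue does not specify a schema), and it assumes the bundled SQLite is 3.25 or newer so that window functions are available.

```python
import sqlite3

# In-memory database with a hypothetical two-column employees table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("alice", "eng"), ("bob", "eng"), ("carol", "sales")],
)

# What the compiler currently emits: DISTINCT over every column.
# No two rows are fully identical here, so nothing is removed.
distinct_rows = conn.execute(
    "SELECT DISTINCT employees.* FROM employees"
).fetchall()

# What `group department (take 1)` should mean: one row per department.
# Without an ORDER BY in the window, which row is kept is unspecified.
first_per_group = conn.execute(
    """
    WITH table_0 AS (
      SELECT
        employees.*,
        ROW_NUMBER() OVER (PARTITION BY department) AS _rn
      FROM employees
    )
    SELECT name, department FROM table_0 WHERE _rn <= 1
    """
).fetchall()

print(len(distinct_rows))    # all three rows survive DISTINCT
print(len(first_per_group))  # one row for each of the two departments
```

With this data the DISTINCT query returns every row, while the ROW_NUMBER query returns exactly one row per department.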

The source of the issue was identified by @aljazerzen as being located here: https://github.com/prql/prql/blob/b754c0a65bb8ab619a9001d00b9b451dbaa3d02d/prql-compiler/src/sql/distinct.rs#L36
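
The condition implied by this report could be sketched as follows. This is not the code in distinct.rs; the function name, its parameters, and the use of "*" to stand for a wildcard selection are all hypothetical, chosen only to spell out when rewriting to SELECT DISTINCT would be safe.

```python
def can_use_distinct(selected_columns, group_columns, take_count):
    """Hypothetical check: is `group ... (take N)` expressible as SELECT DISTINCT?

    Assumes columns are plain name strings and "*" marks a wildcard selection.
    """
    # DISTINCT keeps exactly one row per distinct tuple, so only take 1 qualifies.
    if take_count != 1:
        return False
    # A wildcard may expand to columns that are not group keys.
    if "*" in selected_columns:
        return False
    # Safe only when the selected columns are exactly the group keys.
    return set(selected_columns) == set(group_columns)


# First pipeline in this issue: select department, then group department.
print(can_use_distinct(["department"], ["department"], 1))
# Second pipeline: no select, so every column (a wildcard) is returned.
print(can_use_distinct(["*"], ["department"], 1))
```

Under these assumptions the first pipeline qualifies for DISTINCT and the second falls back to the ROW_NUMBER form.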

vishnumenon · Aug 31 '22 04:08