prql icon indicating copy to clipboard operation
prql copied to clipboard

Ordering guarantee can cause duplicate columns

Open syko opened this issue 8 months ago • 0 comments

What happened?

Sorting within a variable definition, then selecting the sorted columns will cause the columns to be selected twice.

I guess this is caused by the solution to #3129.

The example below is the simple case but what makes things more complicated is that in my real-world scenario the sorting was for row deduplication group {id} (sort {-created_by} take 1). PRQL includes both id and created_by columns in the final sorting but this was followed by a another CTE that was selecting / deriving a new value for the id column so it would be false to assume that the final sort order is guaranteed if we just ensure the columns are selected once and we do a final sort by columns that match the initially sorted columns by name.

My current workaround is to reset the sort via sort {} after the group.

PRQL input

prql target:sql.postgres

let a = (
  from sometable
  sort {foo}
  select {
    foo
  }
)
let b = (
  from a
)
from b

SQL output

WITH a AS (
  SELECT
    foo
  FROM
    sometable
),
b AS (
  SELECT
    foo,
    foo
  FROM
    a
)
SELECT
  foo -- SQL ERROR: column reference "foo" is ambiguous
FROM
  b
ORDER BY
  foo

-- Generated by PRQL compiler version:0.10.1 (https://prql-lang.org)

Expected SQL output

WITH a AS (
  SELECT
    foo
  FROM
    sometable
),
b AS (
  SELECT
    foo
  FROM
    a
)
SELECT
  foo
FROM
  b
ORDER BY
  foo

-- Generated by PRQL compiler version:0.10.1 (https://prql-lang.org)

MVCE confirmation

  • [X] Minimal example
  • [X] New issue

Anything else?

No response

syko avatar Nov 27 '23 14:11 syko