prql icon indicating copy to clipboard operation
prql copied to clipboard

Attempts to sort by column that do not exist after join

Open eitsupi opened this issue 2 years ago • 7 comments

What happened?

Sorry the example is not small enough, but the following example seems to fail to execute the query because it tries to sort using column a._expr_0, which does not actually exist.

PRQL input

from a
join side:left b (==col)
sort a.col
select !{a.col}
take 5

SQL output

WITH table_0 AS (
  SELECT
    a.*,
    b.*,
    a.col AS _expr_0
  FROM
    a
    LEFT JOIN b ON a.col = b.col
  ORDER BY
    a._expr_0
  LIMIT
    5
)
SELECT
  *
FROM
  table_0
ORDER BY
  _expr_0

-- Generated by PRQL compiler version:0.9.2 (https://prql-lang.org)

Expected SQL output

WITH table_0 AS (
  SELECT
    a.*,
    b.*,
    a.col AS _expr_0
  FROM
    a
    LEFT JOIN b ON a.col = b.col
  ORDER BY
    a.col
  LIMIT
    5
)
SELECT
  *
FROM
  table_0
ORDER BY
  _expr_0

-- Generated by PRQL compiler version:0.9.2 (https://prql-lang.org/)

MVCE confirmation

  • [X] Minimal example
  • [X] New issue

Anything else?

No response

eitsupi avatar Jul 27 '23 09:07 eitsupi

Nice catch, the example is small enough!

aljazerzen avatar Jul 27 '23 09:07 aljazerzen

+1 I just ran into this myself

It seems the issue is with the ordering guarantee characteristic of sort:

PRQL pushes the sort down the pipeline, compiling the ORDER BY to the end of the query.

What's more, I would've thought wrapping it in a variable would work but unfortunately no.

Here's another example runnable in the playground:

let a = (
  from invoices
  sort {-invoice_date}
)
let b = (
  from a
  select {customer_id}
)
from b
WITH a AS (
  SELECT
    *
  FROM
    invoices
),
b AS (
  SELECT
    customer_id
  FROM
    a
)
SELECT
  customer_id
FROM
  b
ORDER BY
  invoice_date DESC

-- Generated by PRQL compiler version:0.9.4 (https://prql-lang.org)

syko avatar Sep 14 '23 12:09 syko

A smaller example, if it helps:

let x = (
  from table
  sort index
  select {fieldA}
)
from x
WITH table_0 AS (
  SELECT
    "fieldA",
    "index"
  FROM
    "table"
),
x AS (
  SELECT
    "fieldA"
  FROM
    table_0
)
SELECT
  "fieldA"
FROM
  x
ORDER BY
  "index"

-- Generated by PRQL compiler version:0.10.0 (https://prql-lang.org)

fredericp avatar Nov 08 '23 10:11 fredericp

I don't know when, but this seems to be happening again.

eitsupi avatar Jan 28 '24 04:01 eitsupi

I've just verified it works in the playground:

let x = (
  from artists
  sort artist_id
  select {name}
)
from x
WITH table_0 AS (
  SELECT
    name,
    artist_id
  FROM
    artists
),
x AS (
  SELECT
    name,
    artist_id
  FROM
    table_0
)
SELECT
  name
FROM
  x
ORDER BY
  artist_id

-- Generated by PRQL compiler version:0.11.1 (https://prql-lang.org)

What prqlc version were you using?

aljazerzen avatar Jan 29 '24 09:01 aljazerzen

I've just verified it works in the playground:

I saw:

from a
join side:left b (==col)
sort a.col
select !{a.col}
take 5

to

WITH table_0 AS (
  SELECT
    a.*,
    b.*,
    a.col AS _expr_0
  FROM
    a
    LEFT JOIN b ON a.col = b.col
  ORDER BY
    a._expr_0
  LIMIT
    5
)
SELECT
  *
FROM
  table_0
ORDER BY
  _expr_0

-- Generated by PRQL compiler version:0.11.1 (https://prql-lang.org)

It is the same of the original query of this issue.

eitsupi avatar Jan 29 '24 09:01 eitsupi

Oh, I see, it's the naming. Damn, this ORDER-BY-pushdown feature is producing a lot of weird bugs. Maybe I should re-think it from the beginning.

aljazerzen avatar Jan 29 '24 10:01 aljazerzen