prql icon indicating copy to clipboard operation
prql copied to clipboard

`select` clause after `append` clause

Open YuxiLing opened this issue 2 years ago • 3 comments

What happened?

Dear developer,

Hope this would be helpful.

prql version: 0.8.1

comments: The select clause affects the translation of append clause. It might be caused by internal optimization in prql compiler.

In the following example, we assume tb1 and tb2 have the same columns. We append tb2 to tb1, and select two new columns. However, inside generated sql query, it selects new columns first. Then the union operation fails.

error message from the database: "each UNION query must have the same number of columns"

PRQL input

from tb1
append tb2
select [new_col1=c1*2,new_col2=c2/c3]

SQL output

WITH table_1 AS (
  SELECT
    c2,
    c3,
    c1
  FROM
    tb1
  UNION
  ALL
  SELECT
    *
  FROM
    tb2
)
SELECT
  c1 * 2 AS new_col1,
  c2 / c3 AS new_col2
FROM
  table_1 AS table_0

Expected SQL output

WITH table_1 AS (
  SELECT
    *
  FROM
    tb1
  UNION
  ALL
  SELECT
    *
  FROM
    tb2
)
SELECT
  c1 * 2 AS new_col1,
  c2 / c3 AS new_col2
FROM
  table_1 AS table_0

MVCE confirmation

  • [X] Minimal example
  • [X] New issue

Anything else?

No response

YuxiLing avatar Jun 01 '23 11:06 YuxiLing

Yup, that's a bug.

When the compiler encounters c1, c2 and c3 it infers that they are from tb1, while is should find both tb1 and tb2.

aljazerzen avatar Jun 01 '23 15:06 aljazerzen

I would expect SQL to be:

WITH table_1 AS (
  SELECT
    c1, c2, c3
  FROM
    tb1
  UNION
  ALL
  SELECT
    c1, c2, c3
  FROM
    tb2
)
SELECT
  c1 * 2 AS new_col1,
  c2 / c3 AS new_col2
FROM
  table_1 AS table_0

aljazerzen avatar Jun 01 '23 15:06 aljazerzen

Rewriting the original input to the new syntax:

from tb1
append tb2
select {new_col1=c1*2,new_col2=c2/c3}

Still produces the wrong output as of 0.13.3:

WITH table_0 AS (
  SELECT
    c2,
    c3,
    c1
  FROM
    tb1
  UNION
  ALL
  SELECT
    *
  FROM
    tb2
)
SELECT
  c1 * 2 AS new_col1,
  c2 / c3 AS new_col2
FROM
  table_0

-- Generated by PRQL compiler version:0.13.3-39-ge393ab4d (https://prql-lang.org)

Importantly, an attempted workaround to explicitly select columns from tb1 and tb2 causes another problem:

from tb1
select {c1, c2, c3}
append (from tb2 | select {c1, c2, c3})
select {new_col1=c1*2,new_col2=c2/c3}
WITH table_0 AS (
  SELECT
    c1,
    c2,
    c3
  FROM
    tb2
),
table_1 AS (
  SELECT
    c2,
    c3,
    c1
  FROM
    tb1
  UNION
  ALL
  SELECT
    *
  FROM
    table_0
)
SELECT
  c1 * 2 AS new_col1,
  c2 / c3 AS new_col2
FROM
  table_1

-- Generated by PRQL compiler version:0.13.3-39-ge393ab4d (https://prql-lang.org)

While the number of columns from tb1 and tb2 are now the same, the SELECT order between the two tables is misaligned, which would cause a surprising mismatch, see #4724 .

kgutwin avatar Feb 27 '25 20:02 kgutwin