`select` clause after `append` clause
What happened?
Dear developer,
Hope this would be helpful.
prql version: 0.8.1
comments: The select clause affects the translation of append clause. It might be caused by internal optimization in prql compiler.
In the following example, we assume tb1 and tb2 have the same columns. We append tb2 to tb1, and select two new columns. However, inside generated sql query, it selects new columns first. Then the union operation fails.
error message from the database: "each UNION query must have the same number of columns"
PRQL input
from tb1
append tb2
select [new_col1=c1*2,new_col2=c2/c3]
SQL output
WITH table_1 AS (
SELECT
c2,
c3,
c1
FROM
tb1
UNION
ALL
SELECT
*
FROM
tb2
)
SELECT
c1 * 2 AS new_col1,
c2 / c3 AS new_col2
FROM
table_1 AS table_0
Expected SQL output
WITH table_1 AS (
SELECT
*
FROM
tb1
UNION
ALL
SELECT
*
FROM
tb2
)
SELECT
c1 * 2 AS new_col1,
c2 / c3 AS new_col2
FROM
table_1 AS table_0
MVCE confirmation
- [X] Minimal example
- [X] New issue
Anything else?
No response
Yup, that's a bug.
When the compiler encounters c1, c2 and c3 it infers that they are from tb1, while is should find both tb1 and tb2.
I would expect SQL to be:
WITH table_1 AS (
SELECT
c1, c2, c3
FROM
tb1
UNION
ALL
SELECT
c1, c2, c3
FROM
tb2
)
SELECT
c1 * 2 AS new_col1,
c2 / c3 AS new_col2
FROM
table_1 AS table_0
Rewriting the original input to the new syntax:
from tb1
append tb2
select {new_col1=c1*2,new_col2=c2/c3}
Still produces the wrong output as of 0.13.3:
WITH table_0 AS (
SELECT
c2,
c3,
c1
FROM
tb1
UNION
ALL
SELECT
*
FROM
tb2
)
SELECT
c1 * 2 AS new_col1,
c2 / c3 AS new_col2
FROM
table_0
-- Generated by PRQL compiler version:0.13.3-39-ge393ab4d (https://prql-lang.org)
Importantly, an attempted workaround to explicitly select columns from tb1 and tb2 causes another problem:
from tb1
select {c1, c2, c3}
append (from tb2 | select {c1, c2, c3})
select {new_col1=c1*2,new_col2=c2/c3}
WITH table_0 AS (
SELECT
c1,
c2,
c3
FROM
tb2
),
table_1 AS (
SELECT
c2,
c3,
c1
FROM
tb1
UNION
ALL
SELECT
*
FROM
table_0
)
SELECT
c1 * 2 AS new_col1,
c2 / c3 AS new_col2
FROM
table_1
-- Generated by PRQL compiler version:0.13.3-39-ge393ab4d (https://prql-lang.org)
While the number of columns from tb1 and tb2 are now the same, the SELECT order between the two tables is misaligned, which would cause a surprising mismatch, see #4724 .