prql icon indicating copy to clipboard operation
prql copied to clipboard

append doesn't check column, column order matters

Open cottrell opened this issue 1 year ago • 3 comments

What happened?

I'm not sure if this is intentional or not.

PRQL input

from invoices
group billing_country (
aggregate {
  a = count_distinct billing_city
  }
)
into A

from invoices
aggregate {
a = count_distinct billing_city
}
derive {
billing_country = '*'
}
select {a, billing_country}
into B

from A
append B

SQL output

WITH "A" AS (
  SELECT
    billing_country,
    COUNT(DISTINCT billing_city) AS a
  FROM
    invoices
  GROUP BY
    billing_country
),
table_0 AS (
  SELECT
    COUNT(DISTINCT billing_city) AS a
  FROM
    invoices
),
"B" AS (
  SELECT
    a,
    '*' AS billing_country
  FROM
    table_0
)
SELECT
  billing_country,
  a
FROM
  "A"
UNION
ALL
SELECT
  *
FROM
  "B"

-- Generated by PRQL compiler version:0.12.2 (https://prql-lang.org)

Expected SQL output

No response

MVCE confirmation

  • [ ] Minimal example
  • [ ] New issue

Anything else?

It works if you manually align the columns with a select. Might be intended not sure.

billing_country	a
Germany	3
Norway	1
Belgium	1
Canada	8
USA	12
France	4
Ireland	1
United Kingdom	2
Australia	1
Chile	1
India	2
Brazil	4
Portugal	2
Netherlands	1
Spain	1
Sweden	1
Czech Republic	1
Finland	1
Denmark	1
Italy	1
Poland	1
Austria	1
Hungary	1
Argentina	1
53	*

cottrell avatar Jul 13 '24 10:07 cottrell

Sorry for being slow, but could you explain in a bit more detail what the issue is?

(An expected SQL output would be great)

max-sixty avatar Jul 13 '24 22:07 max-sixty

Sorry for being slow, but could you explain in a bit more detail what the issue is?

(An expected SQL output would be great)

If you look at the last row it is "53 " and it shoudl be " 53". I should have chosen a better text than "*" I guess as it doesn't jump out.

If it's possible, prql should probably inspect the columns and align them but if the column names are no available (select * clause) then it is not possible.

I think the simplest solution would be to simply warn in the docs for append that the user must manually align the tables.

WITH "A" AS (
  SELECT
    billing_country,
    COUNT(DISTINCT billing_city) AS a
  FROM
    invoices
  GROUP BY
    billing_country
),
table_0 AS (
  SELECT
    COUNT(DISTINCT billing_city) AS a
  FROM
    invoices
),
"B" AS (
  SELECT
    '*' AS billing_country, -- Ensure this column matches the first query
    a
  FROM
    table_0
)
SELECT
  billing_country,
  a
FROM
  "A"
UNION ALL
SELECT
  billing_country,
  a
FROM
  "B";

cottrell avatar Jul 15 '24 08:07 cottrell

(I know this issue is old...) What database engine were you using? Assuming this was in the playground, DuckDB's docs on UNION specify that:

Traditional set operations unify queries by column position, and require the to-be-combined queries to have the same number of input columns. [...] DuckDB also supports UNION [ALL] BY NAME, which joins columns by name instead of by position. UNION BY NAME does not require the inputs to have the same number of columns. NULL values will be added in case of missing columns.

I wonder if PRQL could support append B by:name?

kgutwin avatar Feb 27 '25 19:02 kgutwin