SQL: UNION DISTINCT
tl;dr
We've reached consensus that the initial GA release of SuperDB will include the UNION ALL support added in #5735, while UNION DISTINCT will come later.
Details
At the time this issue is being filed, super is at commit a6eae50.
The gap is illustrated below using an abbreviated version of an example from the w3schools SQL tutorial.
Given input data files:
$ cat customers.csv
City,Country
Berlin,Germany
México D.F.,Mexico
México D.F.,Mexico
$ cat suppliers.csv
City,Country
Londona,UK
New Orleans,USA
Here's UNION ALL behaving as expected.
$ super -version
Version: v1.18.0-355-ga6eae509
$ super -c "
SELECT City FROM customers.csv
UNION ALL
SELECT City FROM suppliers.csv
ORDER BY City;"
{City:"Berlin"}
{City:"México D.F."}
{City:"México D.F."}
{City:"Londona"}
{City:"New Orleans"}
However, plain UNION (aka the more verbose UNION DISTINCT) is not yet supported.
$ super -c "
SELECT City FROM customers.csv
UNION
SELECT City FROM suppliers.csv
ORDER BY City;"
UNION DISTINCT not currently supported at line 2, column 1:
SELECT City FROM customers.csv
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Whereas a system like DuckDB outputs:
$ duckdb -c "
SELECT City FROM customers.csv
UNION
SELECT City FROM suppliers.csv
ORDER BY City;"
┌─────────────┐
│    City     │
│   varchar   │
├─────────────┤
│ Berlin      │
│ Londona     │
│ México D.F. │
│ New Orleans │
└─────────────┘
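For reference, the semantic difference at stake can be reproduced without either tool. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in engine; it is not SuperDB or DuckDB, and the in-memory tables `customers` and `suppliers` are assumptions that simply mirror the CSV files above. SQLite spells the deduplicating form as plain UNION.

```python
import sqlite3

# In-memory stand-in tables mirroring customers.csv and suppliers.csv (assumed names).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (City TEXT, Country TEXT)")
conn.execute("CREATE TABLE suppliers (City TEXT, Country TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("Berlin", "Germany"), ("México D.F.", "Mexico"), ("México D.F.", "Mexico")],
)
conn.executemany(
    "INSERT INTO suppliers VALUES (?, ?)",
    [("Londona", "UK"), ("New Orleans", "USA")],
)


def cities(set_op):
    """Run the issue's query with the given set operator and return the City values."""
    query = (
        "SELECT City FROM customers "
        f"{set_op} "
        "SELECT City FROM suppliers "
        "ORDER BY City"
    )
    return [row[0] for row in conn.execute(query)]


union_all = cities("UNION ALL")   # keeps the duplicate "México D.F." row
union_distinct = cities("UNION")  # plain UNION removes duplicate rows

print(union_all)
print(union_distinct)
```

In this sketch UNION ALL yields five rows and plain UNION yields four, the same four cities shown in the DuckDB output above.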