zed icon indicating copy to clipboard operation
zed copied to clipboard

SQL: NULL values absent from JOIN output

Open philrz opened this issue 8 months ago • 3 comments

When Postgres runs this query, it has NULL values for k and l in the bottom output record.

$ super -f parquet -o integers.parquet -c "
values
  {i:1::int32,j:2::int32},
  {i:2::int32,j:3::int32},
  {i:3::int32,j:4::int32}
" &&
super -f parquet -o integers2.parquet -c "
values
  {k:1::int32,l:10::int32},
  {k:2::int32,l:20::int32}
" &&
super -c super -c "SELECT * FROM integers.parquet LEFT OUTER JOIN integers2.parquet ON integers.i=integers2.k ORDER BY i;"

{i:1::int32,j:2::int32,k:1::int32,l:10::int32}
{i:2::int32,j:3::int32,k:2::int32,l:20::int32}
{i:3::int32,j:4::int32}

Details

Repro is with super commit a90585f. This was found via a query from a sqllogictest.

Here's the query running in Postgres with a client setting to make it easier to see NULL values.

$ psql --version
psql (PostgreSQL) 17.6 (Homebrew)

$ psql postgres -P null=NULL -c "
DROP TABLE IF EXISTS integers;
DROP TABLE IF EXISTS integers2;
CREATE TABLE integers(i INTEGER, j INTEGER);
INSERT INTO integers VALUES (1, 2), (2, 3), (3, 4);
CREATE TABLE integers2(k INTEGER, l INTEGER);
INSERT INTO integers2 VALUES (1, 10), (2, 20);
SELECT * FROM integers LEFT OUTER JOIN integers2 ON integers.i=integers2.k ORDER BY i;"

DROP TABLE
DROP TABLE
CREATE TABLE
INSERT 0 3
CREATE TABLE
INSERT 0 2

 i | j |  k   |  l   
---+---+------+------
 1 | 2 |    1 |   10
 2 | 3 |    2 |   20
 3 | 4 | NULL | NULL
(3 rows)

Whereas in SuperDB, the bottom output record lacked values for the fields k and l.

Now, as a user, I know that if I want the NULL values, I can just tack on a piped fuse.

$ super -version
Version: a90585f6d

$ super -c "
SELECT * FROM integers.parquet LEFT OUTER JOIN integers2.parquet ON integers.i=integers2.k ORDER BY i
| fuse"

{i:1::int32,j:2::int32,k:1::int32,l:10::int32}
{i:2::int32,j:3::int32,k:2::int32,l:20::int32}
{i:3::int32,j:4::int32,k:null::int32,l:null::int32}

I'm not sure if we should be doing something similar automatically for a pure SQL query to match the Postgres behavior.

philrz avatar Jun 26 '25 22:06 philrz

Here's a variation in another sqllogictest. This uses the same input data as in the first repro above.

Due to the join condition 1=2 always evaluating to FALSE, in Postgres we end up with columns k and l that are entirely made up of NULL values.

$ psql postgres -P null=NULL -c "
SELECT * FROM integers LEFT OUTER JOIN integers2 ON 1=2 ORDER BY i;"

 i | j |  k   |  l   
---+---+------+------
 1 | 2 | NULL | NULL
 2 | 3 | NULL | NULL
 3 | 4 | NULL | NULL
(3 rows)

And due to the same effect we saw previously, here SuperDB leaves out those k and l columns entirely from the output.

$ super -c "
SELECT * FROM integers.parquet LEFT OUTER JOIN integers2.parquet ON 1=2 ORDER BY i;"

{i:1,j:2}
{i:2,j:3}
{i:3,j:4}

This is noteworthy because, as a user, I can't use the fuse trick I did previously to match the SQL output.

philrz avatar Sep 03 '25 19:09 philrz

Update: The changes in #6264 have brought the query results here closer to matching SQL, but not all the way. Whereas before the query results from super had empty fields or missing columns compared to SQL filling those in with NULL, now super includes those fields/columns but fills them in with error("missing") values.

Here's the two repro queries shown above being re-run at super commit 79bb8e3 that's right after the merge of #6264.

$ super -version
Version: 79bb8e346

$ super -f parquet -o integers.parquet -c "
values
  {i:1::int32,j:2::int32},
  {i:2::int32,j:3::int32},
  {i:3::int32,j:4::int32}
" && super -f parquet -o integers2.parquet -c "
values
  {k:1::int32,l:10::int32},
  {k:2::int32,l:20::int32}
" && super -c super -c "SELECT * FROM integers.parquet LEFT OUTER JOIN integers2.parquet ON integers.i=integers2.k ORDER BY i;"

{i:1::int32,j:2::int32,k:1::int32,l:10::int32}
{i:2::int32,j:3::int32,k:2::int32,l:20::int32}
{i:3::int32,j:4::int32,k:error("missing"),l:error("missing")}
$ super -c "
SELECT * FROM integers.parquet LEFT OUTER JOIN integers2.parquet ON 1=2 ORDER BY i;"

{i:1::int32,j:2::int32,k:error("missing"),l:error("missing")}
{i:2::int32,j:3::int32,k:error("missing"),l:error("missing")}
{i:3::int32,j:4::int32,k:error("missing"),l:error("missing")}

philrz avatar Sep 24 '25 00:09 philrz

In a recent community Slack thread a user reported bumping into this issue.

philrz avatar Oct 20 '25 22:10 philrz