zed icon indicating copy to clipboard operation
zed copied to clipboard

User config knob for NULL presentation

Open philrz opened this issue 8 months ago • 0 comments

SQL clients often have a user config knob that can be used to specify how NULL values are presented. For formats with a well-defined spec like JSON or SUP it probably makes sense to always stick with the standard, but for looser formats like CSV, TSV, text, and table users would likely appreciate a similar knob with super.

Details

Repro is with super commit 9fcbd22.

We'll run our examples with the following sample data.

$ super -version
Version: 9fcbd222b

$ echo '{a:1} {b:2}' | super -f parquet -o data.parquet -c "fuse" -

$ super -s data.parquet
{a:1,b:null::int64}
{a:null::int64,b:2}

For reasons I think have to do with the roots of super, with text and table formats, right now null values are rendered as -.

$ super -f text data.parquet 
1	-
-	2

$ super -f table data.parquet 
a b
1 -
- 2

As touched on in #5868, with CSV and TSV, they're rendered as empty values.

$ super -f csv data.parquet 
a,b
1,
,2

$ super -f tsv data.parquet 
a	b
1	
	2

By default DuckDB renders them as NULL, but offers the -nullvalue flag to vary this, such as of a user wanted to match what SuperDB currently does.

$ duckdb --version
v1.3.0 71c5c07cdd

$ duckdb --csv -c "SELECT * FROM 'data.parquet';"
a,b
1,NULL
NULL,2

$ duckdb --csv --nullvalue '-' -c "SELECT * FROM 'data.parquet';"
a,b
1,-
-,2

As also touched on in #5868, ClickHouse renders with \N by default, but they have several ways to vary this, including per-format knobs, e.g.,

$ clickhouse --version
ClickHouse local version 25.5.2.47 (official build).

$ clickhouse --format=csv --query "SELECT * FROM 'data.parquet';"
1,\N
\N,2

$ clickhouse --format=csv --format_csv_null_representation='-' --query "SELECT * FROM 'data.parquet';"
1,-
-,2

Postgres psql renders by default as an empty string, but a pset knob can vary this. Using the pg_parquet extension to load data from our Parquet data file:

pg_parquet=# CREATE TABLE data (
  a BIGINT,
  b BIGINT
);
CREATE TABLE

pg_parquet=# COPY data FROM '/Users/phil/work/super-5961/data.parquet';
COPY 2

pg_parquet=# SELECT * FROM data;
 a | b 
---+---
 1 |  
   | 2
(2 rows)

pg_parquet=# \pset null 'NULL'
Null display is "NULL".

pg_parquet=# SELECT * FROM data;
  a   |  b   
------+------
    1 | NULL
 NULL |    2
(2 rows)

Context

I happened to confront this topic because I was working with a sqllogictest that outputs NULL values. Up to that point I was having pretty good luck using super -f text to get the simple outputs that could be diff'ed against the expected outputs in sqllogictest files. super's static use of - creates a challenge for me here personally, but as evidenced by the presence of the knobs in other SQL clients I expect other users may see it as a problem as well.

philrz avatar Jun 17 '25 21:06 philrz