duckdb-web
duckdb-web copied to clipboard
Issue found on page 'CSV Import', section "Parameters": `force_not_null` could benefit from clearer wording and crosslink to `nullstr`-parameter
I found the documentation for the force_not_null
parameter to read_csv
hard to understand.
The parameter name suggests a BOOLEAN value, but it's actually a list of VARCHARs.
The description reads:
Do not match the specified columns' values against the NULL string. In the default case where the NULL string is empty, this means that empty values will be read as zero-length strings rather than NULLs.
From description it is not immediately clear to what kind of thing is specified through this parameter. The construct "the specified columns' values" is ambiguous - it could either mean "columns" or "values". Because the rest of the description also focuses on values and value matching, I thought it was about values.
Now I believe that the force_not_null
parameter allows one to specify a list of column names, and that its effect is that for those specified columns, the reader will not attempt to match the column's values to those specified by the nullstr
parameter (which would extract those literal values as SQL NULL values)
I suggest a rewording of the description that explicitly says the force_not_null
parameter takes a list of column names as value, and to also literally name the nullstr
parameter to which it corresponds. Something like this:
By default, the CSV reader compares each column's raw values against the
NULL
-strings to substitute them with SQLNULL
-values. Theforce_not_null
parameter can be used to specify a list of column names for which the reader should not apply anyNULL
-string substitution. The set ofNULL
-strings can be specified by thenullstr
-parameter. By default, thenullstr
-parameter is set to the empty string, thus substituting empty raw values with SQLNULL
-values. Columns that need to retain the literal empty-string values should be passed in theforce_not_null
parameter.
And for completeness, the nullstr
-parameter should also get some text to establish the relationship with the force_not_null
parameter. I think this could be done by adding this to its description:
The substitution of
nullstr
-parameter values with SQLNULL
-values can be suprressed by passing the names of those columns in theforce_not_null
-parameter. If thenullstr
-parameter includes the empty string (default), then the columns passed to theforce_not_null
-parameter will retain any empty string values instead of substituting them with SQLNULL
-values.
Page URL: https://duckdb.org/docs/data/csv/overview#parameters