Invalid inserts into a MergeTree table when omitting values for non-null columns
Describe the bug
I have encountered some really strange behavior when executing inserts with the clickhouse-jdbc driver. We have a ReplicatedSummingMergeTree table with the following schema:
CREATE TABLE IF NOT EXISTS db1.table1 ON CLUSTER cluster1 (
`date` DateTime,
`key1` UInt32,
`created` UInt32,
`updated` UInt32,
`removed` UInt32
)
ENGINE = ReplicatedSummingMergeTree('/clickhouse/tables/{cluster}-{shard}/db1.table1', '{replica}')
PARTITION BY toYYYYMM(date)
ORDER BY (toStartOfHour(date), key1)
PRIMARY KEY (toStartOfHour(date), key1);
CREATE TABLE IF NOT EXISTS db1.Table1 ON CLUSTER cluster1 AS db1.table1
ENGINE = Distributed(cluster1, db1, table1, rand());
ClickHouse columns are non-nullable by default, and when no value is provided for a column on insert, the column is populated with the default value of its type. However, we wanted to change this behavior so that ClickHouse would throw an exception when no value is provided, so we set the following setting:
insert_null_as_default=0
However, let me describe what actually happened. We inserted 5 records as a batch using the clickhouse-jdbc driver, where every record was missing a value for one of the columns, created. What we observed was that the first row wasn't inserted at all, with no errors reported on either the client or the server side. The other 4 records were inserted, and their values for the created column were all 0 (which I assume is the default value for UInt32); a sketch of the equivalent insert is included after the questions below. Now, a few questions:
- Is it really a bug, or is it expected behavior and we are under a wrong assumption about the config/ClickHouse behavior for handling non-null columns?
- Why is insert_null_as_default not respected?
- Why is only the first row not inserted? Our assumption was that none of the rows would be inserted, followed by a proper exception handling strategy, e.g., a client-side exception.
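For reference, the batch was logically equivalent to the statement below. This is only a sketch: the driver actually sends the rows as RowBinary rather than a VALUES clause, and the literal values here are placeholders rather than our real data.
-- created is intentionally omitted from the column list
INSERT INTO db1.Table1 (date, key1, updated, removed) VALUES
    ('2023-05-09 00:00:00', 1, 1, 1),
    ('2023-05-09 00:00:00', 2, 1, 0),
    ('2023-05-09 00:00:00', 3, 0, 1),
    ('2023-05-09 00:00:00', 4, 1, 1),
    ('2023-05-09 00:00:00', 5, 0, 0);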
Configuration
Environment
- Client version: 0.4.0
- Language version: Java 17
- OS: N/A
ClickHouse server
- ClickHouse Server version: 21.8.14
- ClickHouse Server non-default settings, if any: insert_null_as_default=0
Hi @malikas05, I'm no expert on ClickHouse, but let me try to answer your questions here.
- Is it really a bug, or is it expected behavior and we are under a wrong assumption about the config/ClickHouse behavior for handling non-null columns?
IMO, the default behavior is not intuitive, but it is probably very useful for most users. As far as I know, there are 3 settings related to null handling in insertion (along with a few more for specific data formats):
- insert_null_as_default for INSERT SELECT
- input_format_null_as_default for NULL values in the input data
- input_format_defaults_for_omitted_fields for omitted fields, but it only supports a few data formats
So I'd say it's by design instead of a bug.
- Why is insert_null_as_default not respected?
insert_null_as_default is for INSERT SELECT statements. I think you should use input_format_null_as_default and/or input_format_defaults_for_omitted_fields instead. Since the latter does not support RowBinary, which is used by the JDBC driver, I'd suggest avoiding omitted fields in inserts and using input_format_null_as_default to control the behavior. Alternatively, if you don't want to specify all the fields, you may use the other option together with the JSONEachRow format (example).
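To make the first suggestion concrete, here is a rough sketch against the table you posted (untested, and the values are just placeholders): list every column explicitly and pass NULL for any value you don't have, so that with input_format_null_as_default=0 the server rejects the NULL instead of silently writing the type default. The SETTINGS clause below is only there to make the setting explicit; it could equally come from your profile.
-- all columns listed; NULL passed explicitly instead of omitting created
INSERT INTO db1.table1 (date, key1, created, updated, removed)
SETTINGS input_format_null_as_default = 0
VALUES ('2023-05-09 00:00:00', 1, NULL, 1, 1);
-- expected to fail with something like:
-- Code: 53. DB::Exception: Cannot insert NULL value into a column of type 'UInt32' ...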
- Why is only the first row not inserted? Our assumption was that none of the rows would be inserted, followed by a proper exception handling strategy, e.g., a client-side exception.
I'm not sure about this one. Do you have reproducible queries or code? Could it be related to the first row sharing the same primary key as another row?
Lastly, please consider upgrading the driver to v0.4.6, or you may run into serialization issues with Nested data types.
Hi @zhicwu. Thanks so much for your answer. As you advised in your second point, I have disabled all of those settings:
SELECT value
FROM system.settings
WHERE name IN ('insert_null_as_default', 'input_format_null_as_default', 'input_format_defaults_for_omitted_fields')
Query id: 7b42af18-aef5-4eac-a4d0-a1e714851020
┌─value─┐
│ 0     │
│ 0     │
│ 0     │
└───────┘
3 rows in set. Elapsed: 0.053 sec.
Then, rather than writing into the table with the JDBC driver, I tested with the console clickhouse-client:
INSERT INTO db1.Table1 (date, key1, updated, removed) VALUES ('2023-05-09 00:00:00', 1, 1, 1);
Then I executed a SELECT query:
clickhouse-01 :) SELECT * FROM Table1;
SELECT *
FROM Table1
Query id: d0fd5883-2c8a-4b84-aaa0-6e64bf2eca5a
┌────────────────date─┬─key1─┬─created─┬─updated─┬─removed─┐
│ 2023-05-09 00:00:00 │    1 │       0 │       1 │       1 │
└─────────────────────┴──────┴─────────┴─────────┴─────────┘
As you can see, the new row was added to the table and the omitted column was populated with 0. I wonder why the settings are not respected; I have set them in the users.xml config file.
So input_format_defaults_for_omitted_fields is not working as expected. I'd still suggest avoiding omitted fields in insert statements, although it's really inconvenient. You may search for the issue in the ClickHouse repo and check out comments like this.
drop table if exists table2;
create table table2 (
id Int32,
value UInt32
) engine=MergeTree() order by id;
insert into table2(id) settings input_format_defaults_for_omitted_fields=0 format JSONEachRow
{ "id" : "1" }
-- returns 23.3.1.2823 1 0
select version(), * from table2
@zhicwu, I suppose the other two settings don't work either? Do I need to escalate this further?
I suppose the other two settings don't work either? Do I need to escalate this further?
Please feel free to open a feature request in the ClickHouse repo. Actually, the other two settings are working as expected, but unfortunately they are not designed to enforce a null check for omitted fields.
A few more examples to help you understand:
-- value will be set to zero
insert into table2(id) select 1 settings insert_null_as_default=0
-- Code: 70. DB::Exception: Cannot convert NULL to a non-nullable type: while converting source column `NULL` to destination column value.
-- same for insert into table2(id, value) select 1, null settings insert_null_as_default=0
insert into table2 select 1, null settings insert_null_as_default=0
-- value will be set to zero
insert into table2(id) settings input_format_null_as_default=0 values(1)
-- Code: 53. DB::Exception: Cannot insert NULL value into a column of type 'UInt32' at: null): While executing ValuesBlockInputFormat.
-- same for insert into table2(id, value) settings input_format_null_as_default=0 values(1, null)
insert into table2 settings input_format_null_as_default=0 values(1, null)
Hi @den-crane, are you familiar with any methods to enforce a null check for excluded fields in an INSERT query? How can we configure ClickHouse to raise an exception when executing an INSERT statement, such as INSERT INTO table2(id) VALUES(1), when a non-nullable column value is missing?
@malikas05 I've added a PR to ClickHouse to fix that annoying problem :)