trino icon indicating copy to clipboard operation
trino copied to clipboard

Add test for `MERGE` statement on `NOT NULL` column to `BaseConnectorTest`

Open ebyhr opened this issue 2 years ago • 6 comments

ebyhr avatar Aug 06 '22 03:08 ebyhr

cc @djsstarburst @homar

findepi avatar Aug 06 '22 09:08 findepi

This test shows that MERGE erroneously allows non-null columns to be set to null:

    @Test
    public void testMergeNonNullableColumns()
    {
        String targetTable = "merge_non_nullable_target_" + randomTableSuffix();

        assertUpdate(createTableForWrites(format("CREATE TABLE %s (nation_name VARCHAR NOT NULL, region_name VARCHAR)", targetTable)));

        assertUpdate(format("INSERT INTO %s (nation_name, region_name) VALUES ('FRANCE', 'EUROPE'), ('ALGERIA', 'AFRICA'), ('GERMANY', 'EUROPE')", targetTable), 3);

        assertUpdate(format("MERGE INTO %s t\n", targetTable) +
                " USING (VALUES ('ALGERIA', 'AFRICA')) s(nation_name, region_name)\n" +
                " ON (t.nation_name = s.nation_name)\n" +
                " WHEN MATCHED THEN UPDATE SET nation_name = NULL",
                1);

        assertQuery("SELECT * FROM " + targetTable, "VALUES ('FRANCE', 'EUROPE'), (NULL, 'AFRICA'), ('GERMANY', 'EUROPE')");

        assertUpdate("DROP TABLE " + targetTable);
    }

djsstarburst avatar Aug 07 '22 15:08 djsstarburst

I submitted PR #13535 which changes the two implementations of MergeRowChangeProcessor to throw an exception if a null value is assigned to a non-null column as a result of an SQL MERGE operation. An alternative approach would have been to change the query plan to insert such checks, but the chosen approach is simpler.

The PR also adds a BaseConnectorTest showing that SQL MERGE prevents assignment of NULL to non-null target table columns. Prior to these changes, that test allowed setting non-null columns to null.

djsstarburst avatar Aug 07 '22 20:08 djsstarburst

An alternative approach would have been to change the query plan to insert such checks

We should to this. This would allow the optimizer to eliminate these checks when it's able to prove the values being written are non-NULL.

I'm fine doing this on top of https://github.com/trinodb/trino/pull/13535

findepi avatar Aug 08 '22 08:08 findepi

We should to this. This would allow the optimizer to eliminate these checks when it's able to prove the values being written are non-NULL.

Interesting. My guess was the opposite - - checking for nulls at the query planner level would result in more complex, slower plans. My guess was that for non-null columns produced by MERGE, Block.mayHaveNull() would likely return false, and we would not need to iterate over the values.

djsstarburst avatar Aug 08 '22 20:08 djsstarburst

I'm fine doing this on top of https://github.com/trinodb/trino/pull/13535

It makes sense to me to merge #13535 and if we decide the checks should be moved to the query plan, if would be great if @homar could take on that task.

However, it would be good to demonstrate based on measurements that putting the checks in the query plan actually does improve performance.

djsstarburst avatar Aug 09 '22 13:08 djsstarburst