
[Fix](multi-catalog) Fix string dictionary filtering when using null related functions in parquet and orc reader by disabling dictionary filtering when predicates contain functions.

Open kaka11chen opened this issue 9 months ago • 7 comments

Proposed changes

Issue

The following SQL statements return incorrect results when a NULL-related function is applied to a dictionary-encoded string column:

select * from ( select IF(o_orderpriority IS NULL, 'null', o_orderpriority) AS o_orderpriority from test_string_dict_filter_orc ) as A where o_orderpriority = 'null';
select * from ( select IFNULL(o_orderpriority, 'null') AS o_orderpriority from test_string_dict_filter_parquet ) as A where o_orderpriority = 'null';
select * from ( select COALESCE(o_orderpriority, 'null') AS o_orderpriority from test_string_dict_filter_parquet ) as A where o_orderpriority = 'null';

Root cause

The current dictionary filtering implementation does not account for NULL values, because the dictionary itself does not encode NULL. As a result, many NULL-related functions and expressions, such as IS NULL, IS NOT NULL, and COALESCE, produce incorrect results when dictionary filtering is applied.
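
The failure mode can be illustrated with a small Python model. This is a simplified sketch, not the Doris reader code: the function names (`dict_filter_keeps_row_group`, `ifnull_equals_null`) and the sample data are hypothetical, and the model assumes that dictionary filtering evaluates the predicate once per dictionary entry and skips the row group when no entry matches.

```python
from typing import Callable, List, Optional


def dict_filter_keeps_row_group(
    dictionary: List[str],
    predicate: Callable[[Optional[str]], bool],
) -> bool:
    """Hypothetical model: keep the row group only if some dictionary entry passes."""
    return any(predicate(entry) for entry in dictionary)


def ifnull_equals_null(value: Optional[str]) -> bool:
    # Models the predicate: IFNULL(o_orderpriority, 'null') = 'null'
    replaced = "null" if value is None else value
    return replaced == "null"


# Column values of one row group; None stands for SQL NULL.
rows = ["1-URGENT", None, "2-HIGH", None]

# The dictionary holds only the distinct non-NULL values.
dictionary = sorted({v for v in rows if v is not None})

# Correct answer: evaluate the predicate on every row.
expected = [v for v in rows if ifnull_equals_null(v)]

# Dictionary-filtered answer: no dictionary entry matches, so the
# whole row group is skipped and the NULL rows are lost.
kept = dict_filter_keeps_row_group(dictionary, ifnull_equals_null)
actual = [v for v in rows if ifnull_equals_null(v)] if kept else []

print(expected)  # [None, None]
print(actual)    # []
```

In this model the two NULL rows satisfy the predicate, but they never appear in the dictionary, so the filter wrongly concludes that the row group contains no matches.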

Solution

As a first step, this change disables dictionary filtering when the predicate contains functions. Dictionary filtering with NULL value support will be implemented later.
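
The check described above can be sketched as a walk over the predicate's expression tree. This is only an illustration under stated assumptions: the `Expr` class, its `kind` values, and the helpers `contains_function` and `can_use_dict_filter` are hypothetical and do not correspond to the actual classes in the Parquet or ORC readers.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Expr:
    """Hypothetical expression node; kind is 'column', 'literal', 'compare', or 'function'."""
    kind: str
    name: str = ""
    children: List["Expr"] = field(default_factory=list)


def contains_function(expr: Expr) -> bool:
    """Return True if the node or any descendant is a function call."""
    if expr.kind == "function":
        return True
    return any(contains_function(child) for child in expr.children)


def can_use_dict_filter(predicate: Expr) -> bool:
    """Allow dictionary filtering only for predicates without function calls."""
    return not contains_function(predicate)


col = Expr("column", "o_orderpriority")
lit = Expr("literal", "'null'")

# o_orderpriority = 'null'
plain = Expr("compare", "=", [col, lit])

# IFNULL(o_orderpriority, 'null') = 'null'
wrapped = Expr("compare", "=", [Expr("function", "ifnull", [col, lit]), lit])

print(can_use_dict_filter(plain))    # True
print(can_use_dict_filter(wrapped))  # False
```

The trade-off in this approach is that every predicate containing a function falls back to row-level evaluation, including functions that would be safe on a dictionary, in exchange for correct results on NULL-related ones.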


kaka11chen avatar May 24 '24 04:05 kaka11chen