doris
[Fix](multi-catalog) Fix string dictionary filtering with NULL-related functions in the Parquet and ORC readers by disabling dictionary filtering when predicates contain functions.
Proposed changes
Issue
The following SQL queries return incorrect results when the filter on a dictionary-encoded column involves NULL-related functions:
select * from ( select IF(o_orderpriority IS NULL, 'null', o_orderpriority) AS o_orderpriority from test_string_dict_filter_orc ) as A where o_orderpriority = 'null';
select * from ( select IFNULL(o_orderpriority, 'null') AS o_orderpriority from test_string_dict_filter_parquet ) as A where o_orderpriority = 'null';
select * from ( select COALESCE(o_orderpriority, 'null') AS o_orderpriority from test_string_dict_filter_parquet ) as A where o_orderpriority = 'null';
Root cause:
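The current implementation of dictionary filtering does not handle NULL values, because the dictionary itself has no encoding for NULL. As a result, many NULL-related functions and expressions, such as is null, is not null, and coalesce, do not work correctly. For example, IFNULL(o_orderpriority, 'null') = 'null' should match the rows where o_orderpriority is NULL, but those rows have no dictionary entry, so evaluating the predicate against the dictionary can never select them.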
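The failure mode can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the Doris reader code: the column is split into a list of distinct strings plus per-row codes, NULL rows carry no code, and the helper names (`dict_encode`, `row_filter`, `dict_filter`) are hypothetical. The predicate is evaluated once per dictionary entry and the matching codes select rows, so the predicate is never evaluated for NULL rows.

```python
# Toy model of dictionary filtering on a string column (illustrative, not Doris code).
from typing import Callable, Dict, List, Optional, Tuple

Column = List[Optional[str]]  # None models a SQL NULL
Predicate = Callable[[Optional[str]], bool]


def dict_encode(values: Column) -> Tuple[List[str], List[Optional[int]]]:
    """Split a column into a dictionary and per-row codes.
    NULL rows get code None: the dictionary has no entry for NULL."""
    dictionary: List[str] = []
    index: Dict[str, int] = {}
    codes: List[Optional[int]] = []
    for v in values:
        if v is None:
            codes.append(None)
            continue
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index[v])
    return dictionary, codes


def row_filter(values: Column, pred: Predicate) -> List[int]:
    """Reference semantics: evaluate the predicate on every row, NULLs included."""
    return [i for i, v in enumerate(values) if pred(v)]


def dict_filter(values: Column, pred: Predicate) -> List[int]:
    """Evaluate the predicate once per dictionary entry, then keep rows whose
    code matched. NULL rows have no code, so they can never be selected."""
    dictionary, codes = dict_encode(values)
    matching = {code for code, s in enumerate(dictionary) if pred(s)}
    return [i for i, c in enumerate(codes) if c is not None and c in matching]


def eq_pred(v: Optional[str]) -> bool:
    """Models o_orderpriority = '2-HIGH' (a comparison with NULL is never true)."""
    return v is not None and v == "2-HIGH"


def ifnull_pred(v: Optional[str]) -> bool:
    """Models IFNULL(o_orderpriority, 'null') = 'null'."""
    return (v if v is not None else "null") == "null"


column: Column = ["1-URGENT", None, "2-HIGH", None]
print(row_filter(column, eq_pred), dict_filter(column, eq_pred))          # [2] [2]
print(row_filter(column, ifnull_pred), dict_filter(column, ifnull_pred))  # [1, 3] []
```

A plain comparison gives the same rows on both paths, since dropping NULL rows is already the correct SQL result. The IFNULL predicate, which is effectively what the outer filter in the example queries becomes once it is pushed down to the scan, is true only for NULL rows, and the dictionary path drops exactly those.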
Solution
As a first step, this PR disables dictionary filtering when a predicate contains functions. Support for NULL values in dictionary filtering will be implemented in a follow-up.
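Disabling dictionary filtering for such predicates amounts to walking the predicate's expression tree and falling back to row-level filtering if any function-call node appears. The sketch below assumes a hypothetical, simplified expression model (`SlotRef`, `Literal`, `FunctionCall`, `BinaryPredicate`); the actual Doris expression classes and the exact check in the Parquet and ORC readers may differ.

```python
# Hypothetical, simplified expression model (not the actual Doris classes).
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class SlotRef:
    name: str  # a column reference, e.g. o_orderpriority


@dataclass(frozen=True)
class Literal:
    value: object


@dataclass(frozen=True)
class FunctionCall:
    name: str  # e.g. "ifnull", "coalesce", "if"
    args: Tuple["Expr", ...]


@dataclass(frozen=True)
class BinaryPredicate:
    op: str  # e.g. "="
    left: "Expr"
    right: "Expr"


Expr = Union[SlotRef, Literal, FunctionCall, BinaryPredicate]


def contains_function(expr: Expr) -> bool:
    """Return True if any node in the expression tree is a function call."""
    if isinstance(expr, FunctionCall):
        return True
    if isinstance(expr, BinaryPredicate):
        return contains_function(expr.left) or contains_function(expr.right)
    return False


def can_use_dict_filter(expr: Expr) -> bool:
    """Conservative guard: only predicates without function calls are evaluated
    against the dictionary; the rest fall back to row-level filtering."""
    return not contains_function(expr)


plain = BinaryPredicate("=", SlotRef("o_orderpriority"), Literal("2-HIGH"))
with_fn = BinaryPredicate(
    "=",
    FunctionCall("ifnull", (SlotRef("o_orderpriority"), Literal("null"))),
    Literal("null"),
)
print(can_use_dict_filter(plain), can_use_dict_filter(with_fn))  # True False
```

The guard is deliberately coarse: it also rejects functions that would be safe to evaluate on the dictionary, trading some pruning for correctness until NULL-aware dictionary filtering is in place.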