
[core] Enable file index for map type with map-keys.

Status: Open · leaves12138 opened this issue 1 year ago · 0 comments

  • API
To create a file index on selected map keys, specify the following table options:
CREATE TABLE <PAIMON_TABLE> (<COLUMN> <COLUMN_TYPE>, ...) WITH
(
'file-index.bloom-filter.columns' = 'map_data',
'file-index.bloom-filter.map_data.items' = '200', -- optional
'file-index.bloom-filter.map_data.fpp' = '0.1', -- optional
'file-index.bloom-filter.map_data.map-keys' = 'key1,key2,key3,key4', -- required
'file-index.bloom-filter.map_data##key1.items' = '10000', -- optional
'file-index.in-manifest-threshold' = '500 B' -- optional
)
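The items and fpp options control how large each bloom filter is. As a rough illustration, assuming the textbook sizing formulas (m = -n·ln(p) / (ln 2)² bits and k = (m/n)·ln 2 hash functions), the example values translate as shown below. bloom_filter_size is a hypothetical helper, not a Paimon API, and Paimon's implementation may size or round its filters differently.

```python
import math


def bloom_filter_size(items: int, fpp: float) -> tuple[int, int]:
    """Return (num_bits, num_hashes) for a bloom filter.

    Hypothetical helper using the textbook formulas; Paimon's own
    implementation may size or round its filters differently.
    """
    if items <= 0 or not 0.0 < fpp < 1.0:
        raise ValueError("items must be positive and fpp must be in (0, 1)")
    # m = -n * ln(p) / (ln 2)^2, rounded up to a whole bit
    num_bits = math.ceil(-items * math.log(fpp) / (math.log(2) ** 2))
    # k = (m / n) * ln 2, rounded to the nearest positive integer
    num_hashes = max(1, round(num_bits / items * math.log(2)))
    return num_bits, num_hashes


# Column-level settings from the example: items=200, fpp=0.1
print(bloom_filter_size(200, 0.1))  # → (959, 3)
# key1 with items=10000, assuming it inherits the column-level fpp=0.1
print(bloom_filter_size(10000, 0.1))  # → (47926, 3)
```

Under these formulas, raising items for a heavily populated key such as key1 buys a proportionally larger filter at the same false-positive rate.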

To set an option for one specific map key, add an extra option of the form file-index.bloom-filter.<map_column>##<map_key>.<op_key> = '<op_value>' (for example, map_data##key1.items above).
ClickHouse reference: https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/mergetree#table_engine-mergetree-data_skipping-indexes
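The option naming scheme can be sketched in Python. map_keys and resolve_option are hypothetical helpers, and the fallback from a per-key option (map_data##key1.items) to the column-level option (map_data.items) is an assumption; the issue only defines the per-key form.

```python
from typing import Dict, List, Optional

PREFIX = "file-index.bloom-filter."


def map_keys(options: Dict[str, str], column: str) -> List[str]:
    """Split the comma-separated map-keys option of a map column."""
    raw = options.get(f"{PREFIX}{column}.map-keys", "")
    return [key.strip() for key in raw.split(",") if key.strip()]


def resolve_option(
    options: Dict[str, str], column: str, map_key: str, op_key: str
) -> Optional[str]:
    """Hypothetical lookup: per-key option first, then the column-level one.

    Returns None if neither is set, leaving the choice to the index default.
    """
    per_key = f"{PREFIX}{column}##{map_key}.{op_key}"
    if per_key in options:
        return options[per_key]
    return options.get(f"{PREFIX}{column}.{op_key}")


# Table options from the CREATE TABLE example above
options = {
    "file-index.bloom-filter.columns": "map_data",
    "file-index.bloom-filter.map_data.items": "200",
    "file-index.bloom-filter.map_data.fpp": "0.1",
    "file-index.bloom-filter.map_data.map-keys": "key1,key2,key3,key4",
    "file-index.bloom-filter.map_data##key1.items": "10000",
}

for key in map_keys(options, "map_data"):
    print(key, resolve_option(options, "map_data", key, "items"))
# key1 10000
# key2 200
# key3 200
# key4 200
```

Under that assumption, key1 would be sized for 10000 items while key2 to key4 would share the column-level 200.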

  • NOTE
  1. In Flink SQL, the predicate in a read such as SELECT * FROM <table> WHERE map_data['key1'] = 'value1' will not be pushed down.
  2. Every key in map-keys generates its own standalone bloom filter, so a predicate on map_data['key1'] only needs the map_data['key1'] bloom filter rather than all of map_data's bloom filters.
  3. map-keys follows ClickHouse's INDEX map_key_index mapKeys(map_column) TYPE bloom_filter GRANULARITY 1.
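Note 2 can be illustrated with a minimal Python sketch, assuming each data file keeps one filter per configured map key. BloomFilter, build_map_key_filters, and file_may_match are hypothetical helpers, and SHA-256 double hashing stands in for whatever hash Paimon actually uses. A lookup on map_data['key1'] consults only key1's filter, and a key without a filter cannot be used to skip the file.

```python
import hashlib
from typing import Dict, Iterable, Iterator, List


class BloomFilter:
    """Minimal bloom filter; SHA-256 double hashing is an illustrative choice."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, value: str) -> Iterator[int]:
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1  # odd step size
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, value: str) -> None:
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, value: str) -> bool:
        # False means "definitely absent"; True may be a false positive.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(value))


def build_map_key_filters(
    rows: Iterable[Dict[str, str]], keys: List[str], num_bits: int = 1024, num_hashes: int = 3
) -> Dict[str, BloomFilter]:
    """Build one standalone filter per configured map key for one data file."""
    filters = {key: BloomFilter(num_bits, num_hashes) for key in keys}
    for row in rows:  # each row is the map_data value of one record
        for key, bloom in filters.items():
            if key in row:
                bloom.add(row[key])
    return filters


def file_may_match(filters: Dict[str, BloomFilter], map_key: str, value: str) -> bool:
    """Evaluate map_data[map_key] = value using only that key's filter."""
    bloom = filters.get(map_key)
    if bloom is None:
        return True  # no index for this key, so the file must be read
    return bloom.might_contain(value)


rows = [{"key1": "value1", "key2": "a"}, {"key1": "value2", "key5": "x"}]
filters = build_map_key_filters(rows, ["key1", "key2", "key3", "key4"])
print(file_may_match(filters, "key1", "value1"))  # True
print(file_may_match(filters, "key5", "x"))  # True: key5 has no filter
print(file_may_match(filters, "key3", "v"))  # False: no row had key3, file can be skipped
```

Keeping filters separate per key means a selective key like key1 can get its own items and fpp without inflating the filters of the other keys.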

leaves12138, Apr 16 '24 10:04