paimon
[core] Enable file index for map type with map-keys.
- API: To create a file index on specific keys of a map column, specify the following table options:
CREATE TABLE <PAIMON_TABLE> (<COLUMN> <COLUMN_TYPE>, ...) WITH
(
  'file-index.bloom-filter.columns' = 'map_data',
  'file-index.bloom-filter.map_data.items' = '200',                     -- optional
  'file-index.bloom-filter.map_data.fpp' = '0.1',                       -- optional
  'file-index.bloom-filter.map_data.map-keys' = 'key1,key2,key3,key4',  -- required
  'file-index.bloom-filter.map_data##key1.items' = '10000',             -- optional
  'file-index.in-manifest-threshold' = '500 B'                          -- optional
)
To set options for a specific map key, add an extra option of the form:
file-index.bloom-filter.<map_column>##<map_key>.<op_key>='<op_value>'
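The naming pattern for these options can be sketched with a small helper. This is only an illustration of how the option keys are composed: `map_key_index_options` and its parameters are hypothetical names, not part of Paimon, and the check that per-key overrides refer to a listed key is an assumption of this sketch.

```python
def map_key_index_options(column, map_keys, column_opts=None, key_opts=None):
    """Assemble table options for a bloom-filter file index on selected map keys.

    Hypothetical helper: it only builds option strings that follow the
    naming pattern described in this PR.
    """
    prefix = "file-index.bloom-filter"
    options = {
        f"{prefix}.columns": column,
        f"{prefix}.{column}.map-keys": ",".join(map_keys),
    }
    # Column-level settings such as items / fpp.
    for op_key, op_value in (column_opts or {}).items():
        options[f"{prefix}.{column}.{op_key}"] = str(op_value)
    # Per-key overrides use the <map_column>##<map_key>.<op_key> form.
    for map_key, opts in (key_opts or {}).items():
        if map_key not in map_keys:
            # Assumption of this sketch: overrides only make sense for indexed keys.
            raise ValueError(f"{map_key!r} is not listed in map-keys")
        for op_key, op_value in opts.items():
            options[f"{prefix}.{column}##{map_key}.{op_key}"] = str(op_value)
    return options


options = map_key_index_options(
    "map_data",
    ["key1", "key2", "key3", "key4"],
    column_opts={"items": 200, "fpp": 0.1},
    key_opts={"key1": {"items": 10000}},
)
for name, value in options.items():
    print(f"'{name}' = '{value}'")
```

The printed lines can be pasted into the WITH clause of a CREATE TABLE statement; `file-index.in-manifest-threshold` is left out here because it is not specific to the map column.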
Reference (ClickHouse data skipping indexes): https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/mergetree#table_engine-mergetree-data_skipping-indexes
- NOTE
- Flink SQL read: select * from <table> where map_data['key1'] = 'value1' will not be pushed down.
- Every key in map-keys generates a standalone bloom filter, so a predicate on map_data['key1'] uses only the bloom filter for map_data['key1'], not the bloom filters of all map_data keys.
- map-keys is modeled on ClickHouse's "INDEX map_key_index mapKeys(map_column) TYPE bloom_filter GRANULARITY 1".
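The idea of one standalone bloom filter per map key can be illustrated with a toy model. The sketch below is not Paimon's implementation: `BloomFilter`, `build_key_filters`, and `file_may_match` are hypothetical names, the hashing scheme is chosen only for simplicity, and the behaviour for keys that are not indexed is an assumption.

```python
import hashlib


class BloomFilter:
    """Tiny bloom filter for illustration only; not Paimon's implementation."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit set

    def _positions(self, value):
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


def build_key_filters(rows, map_keys):
    """Build one standalone filter per indexed map key for a single data file."""
    filters = {key: BloomFilter() for key in map_keys}
    for row in rows:
        for key in map_keys:
            if key in row:
                filters[key].add(row[key])
    return filters


def file_may_match(filters, key, value):
    """Evaluate map_col[key] = value by consulting only that key's filter."""
    key_filter = filters.get(key)
    if key_filter is None:
        return True  # assumption: a key without a filter cannot rule the file out
    return key_filter.might_contain(value)


# Each row stands for the map_data value of one record in the file.
rows = [{"key1": "a", "key2": "x"}, {"key1": "b", "key3": "y"}]
filters = build_key_filters(rows, ["key1", "key2"])
print(file_may_match(filters, "key1", "a"))  # True: "a" was added under key1
print(file_may_match(filters, "key3", "y"))  # True: key3 is not indexed
```

A lookup on key1 touches only the key1 filter, so values stored under other keys neither affect the answer nor need to be examined.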