databend
read_bloom_filter_index() takes 2 min for ontime
Can we accelerate it?
cc @dantengsky
Yes:
- Enable the table cache: the bloom filter index is cached (per column) when the table cache is enabled, but apparently this only helps after the cache has warmed up.
- Allow more concurrency in pruning:
https://github.com/datafuselabs/databend/blob/869344f106c69552ff61835c69da471f4de40f95/src/query/storages/fuse/src/pruning/pruning_executor.rs#L103-L106
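To illustrate the first suggestion above (why a per-column cache only helps after warming up), here is a minimal sketch. It is not Databend's cache: `FilterCache`, `get_or_load`, and the `(location, column)` key are hypothetical names chosen for illustration. The first lookup for a column pays the full load cost; only later lookups are served from memory.

```rust
use std::collections::HashMap;

/// Hypothetical per-column filter cache (not Databend's actual type).
/// Keyed by (index file location, column name).
struct FilterCache {
    entries: HashMap<(String, String), Vec<u8>>,
    /// How many times the slow path (a storage read) was taken.
    loads: usize,
}

impl FilterCache {
    fn new() -> Self {
        FilterCache {
            entries: HashMap::new(),
            loads: 0,
        }
    }

    /// Returns the filter bytes for one column, calling `load` only on a miss.
    fn get_or_load<F>(&mut self, location: &str, column: &str, load: F) -> Vec<u8>
    where
        F: FnOnce() -> Vec<u8>,
    {
        let key = (location.to_string(), column.to_string());
        let loads = &mut self.loads;
        self.entries
            .entry(key)
            .or_insert_with(|| {
                // Cold cache: this is the expensive read from storage.
                *loads += 1;
                load()
            })
            .clone()
    }
}
```

With a cold cache every block's filter goes through the slow path once, which would be consistent with the first query being slow and repeated queries being faster.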
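The second suggestion, more concurrency in pruning, can be sketched as follows. This is an assumption-laden illustration using plain threads from the standard library; the linked pruning executor is not reproduced here and may work differently. `prune_segments`, `max_concurrency`, and the `keep` predicate are hypothetical names.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

/// Returns the indices of the segments for which `keep` is true, checking at
/// most `max_concurrency` segments at the same time.
/// Hypothetical helper, not part of Databend.
fn prune_segments<F>(segment_count: usize, max_concurrency: usize, keep: F) -> Vec<usize>
where
    F: Fn(usize) -> bool + Sync,
{
    let next = AtomicUsize::new(0); // next segment index to hand out
    let kept = Mutex::new(Vec::new());
    // At least one worker, and never more workers than segments.
    let workers = max_concurrency.max(1).min(segment_count.max(1));

    thread::scope(|scope| {
        for _ in 0..workers {
            scope.spawn(|| loop {
                // Each worker claims the next unprocessed segment.
                let idx = next.fetch_add(1, Ordering::Relaxed);
                if idx >= segment_count {
                    break;
                }
                if keep(idx) {
                    kept.lock().unwrap().push(idx);
                }
            });
        }
    });

    // Workers finish in arbitrary order, so sort for a stable result.
    let mut kept = kept.into_inner().unwrap();
    kept.sort_unstable();
    kept
}
```

Raising `max_concurrency` lets more index reads overlap, which matters when each check is dominated by storage latency rather than CPU.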
btw, how many parts and rows for the test set?
202687555 rows.
How can I show the number of parts?
SELECT * FROM fuse_snapshot('<database_name>', '<table_name>');
or use EXPLAIN with the latest main branch.
Got it, thanks!
Would it be better to create a bloom-filter index for STRING types only? cc @dantengsky
@drmingdrmer is working on this to make it faster.
Oh, man. Thanks for reminding me.
I'm trying to dig deeper to grasp what's going on inside the fuse table with the bloom filter.
I will give a refactoring proposal ASAP.
@youngsofun Can you try with the main branch? I think we have some improvements with this patch: https://github.com/datafuselabs/databend/pull/7870
The problem originally showed up for Q10 on ontime, on a test cluster.
I found that even a simple
SELECT count(*) FROM default.ontime WHERE depdel15=1;
is abnormal: it is slow, with many log entries about the bloom filter.
I cannot open test.datafusecloud.com right now.
I think the time cost is small now, so let's close this. If you find it still takes a long time, please re-open.