databend: read_bloom_filter_index() takes 2 min for ontime


youngsofun opened this issue 2 years ago · 8 comments

Can we speed it up?

cc @dantengsky

Aug 30 '22 03:08 youngsofun

Yes.

Enable the table cache: the bloom filter index will be cached (per column) if the table cache is enabled. But apparently, this only helps after warming up.
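To illustrate why a per-column cache only pays off after warm-up, here is a minimal sketch assuming a plain in-memory map keyed by (block location, column name). BloomIndexCache, ColumnFilter and get_or_load are hypothetical names for illustration, not Databend's actual cache API.

```rust
use std::collections::HashMap;

/// Hypothetical cache key: (block location, column name).
/// A sketch only; not Databend's real cache types.
type CacheKey = (String, String);

/// Hypothetical stand-in for one column's bloom filter (raw bytes here).
#[derive(Clone, Debug, PartialEq)]
pub struct ColumnFilter(pub Vec<u8>);

pub struct BloomIndexCache {
    entries: HashMap<CacheKey, ColumnFilter>,
    /// How many times we had to fall back to storage (cache misses).
    pub storage_reads: usize,
}

impl BloomIndexCache {
    pub fn new() -> Self {
        Self { entries: HashMap::new(), storage_reads: 0 }
    }

    /// Return the filter for one column of one block. On a miss, call
    /// `load` (standing in for a storage read) and cache the result, so
    /// only the columns a query actually touches are ever cached.
    pub fn get_or_load<F>(&mut self, block: &str, column: &str, load: F) -> ColumnFilter
    where
        F: FnOnce() -> ColumnFilter,
    {
        let key = (block.to_string(), column.to_string());
        if let Some(hit) = self.entries.get(&key) {
            return hit.clone();
        }
        self.storage_reads += 1;
        let filter = load();
        self.entries.insert(key, filter.clone());
        filter
    }
}
```

On a cold cache every (block, column) pair is a miss, so the first query still pays the full read cost; only repeated queries over the same columns benefit.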
Allow more concurrency in pruning:

https://github.com/datafuselabs/databend/blob/869344f106c69552ff61835c69da471f4de40f95/src/query/storages/fuse/src/pruning/pruning_executor.rs#L103-L106
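A minimal sketch of bounded concurrency in pruning, assuming a synchronous may_match callback that stands in for "read this block's bloom filter index and test the predicate". prune_concurrently and max_concurrency are hypothetical names; the real pruner linked above is async, so std::thread::scope is used here only to show the idea of a configurable worker limit.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

/// Hypothetical pruner: evaluate `may_match` on every block using at most
/// `max_concurrency` worker threads, returning indices of blocks to keep.
pub fn prune_concurrently<B, F>(blocks: &[B], max_concurrency: usize, may_match: F) -> Vec<usize>
where
    B: Sync,
    F: Fn(&B) -> bool + Sync,
{
    // Shared cursor: each worker claims the next unprocessed block.
    let next = AtomicUsize::new(0);
    let kept = Mutex::new(Vec::new());
    // At least one worker, and never more workers than blocks.
    let workers = max_concurrency.max(1).min(blocks.len().max(1));

    thread::scope(|s| {
        for _ in 0..workers {
            s.spawn(|| loop {
                let i = next.fetch_add(1, Ordering::Relaxed);
                if i >= blocks.len() {
                    break;
                }
                // Stand-in for the expensive bloom filter index read + check.
                if may_match(&blocks[i]) {
                    kept.lock().unwrap().push(i);
                }
            });
        }
    });

    // Sort so the result does not depend on thread scheduling.
    let mut kept = kept.into_inner().unwrap();
    kept.sort_unstable();
    kept
}
```

Raising the limit lets more index reads overlap, which matters most when each read is a round trip to object storage.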

btw, how many parts and rows for the test set?

Aug 30 '22 05:08 dantengsky

btw, how many parts and rows for the test set?

202,687,555 rows.

How can I show the number of parts?

Aug 31 '22 01:08 youngsofun

Run SELECT * FROM fuse_snapshot('<database_name>', '<table_name>'); or use EXPLAIN with the latest main branch.

Aug 31 '22 01:08 BohuTANG

Aug 31 '22 02:08 youngsofun

Got it, thanks!

Aug 31 '22 02:08 dantengsky

Would it be better to create a bloom filter index for STRING types only? cc @dantengsky
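The STRING-only idea amounts to filtering the schema when choosing which columns get a bloom filter. A minimal sketch, assuming a simplified ColumnType enum; bloom_index_columns is a hypothetical helper, not part of Databend, and the column types used with it are illustrative.

```rust
/// Hypothetical, simplified column types; not Databend's real type system.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ColumnType {
    Int32,
    Int64,
    Float64,
    Date,
    String,
}

/// Return the names of columns that would get a bloom filter index
/// under a "STRING columns only" policy.
pub fn bloom_index_columns<'a>(schema: &[(&'a str, ColumnType)]) -> Vec<&'a str> {
    schema
        .iter()
        .filter(|(_, ty)| *ty == ColumnType::String)
        .map(|(name, _)| *name)
        .collect()
}
```

The tradeoff: fewer filters to build and read during pruning, but equality predicates on numeric columns would no longer be pruned by a bloom filter.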

Sep 14 '22 01:09 BohuTANG

@drmingdrmer is working on this to make it faster 🤗

Sep 20 '22 03:09 BohuTANG

Oh, man. Thanks for reminding me.

I'm trying to dig deeper to grasp what's going on inside the fuse table with the bloom filter.

I will put together a refactoring proposal ASAP.

Sep 20 '22 05:09 drmingdrmer

@youngsofun Can you try with the main branch? I think this patch brings some improvements: https://github.com/datafuselabs/databend/pull/7870

Sep 27 '22 03:09 BohuTANG

The problem originally came from Q10 of ontime on a test cluster, and I found that even a simple SELECT count(*) FROM default.ontime WHERE depdel15=1; is abnormally slow, with many log lines about the bloom filter.

I can't open test.datafusecloud.com now.

Sep 27 '22 05:09 youngsofun

I think the time cost is small now, so let's close this. If you find it is still slow, please reopen it.

Sep 27 '22 06:09 BohuTANG
