
read_bloom_filter_index() takes 2 min for ontime

Open youngsofun opened this issue 2 years ago • 8 comments

can we accelerate it?

cc @dantengsky

youngsofun avatar Aug 30 '22 03:08 youngsofun

yes

  • enable table cache: the bloom filter index will be cached (by column) if table cache is enabled

    but apparently, it only helps after the cache has warmed up

  • allow more concurrency in pruning

https://github.com/datafuselabs/databend/blob/869344f106c69552ff61835c69da471f4de40f95/src/query/storages/fuse/src/pruning/pruning_executor.rs#L103-L106
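The two suggestions above (a per-column filter cache and more concurrency in pruning) can be sketched roughly as follows. This is only an illustration in Python, not Databend's code: `ColumnFilterCache`, `load_filter`, `prune`, and the `DATA` layout are hypothetical names, and a plain `frozenset` stands in for a real bloom filter (which could also return false positives).

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ColumnFilterCache:
    """Hypothetical cache keyed by (block location, column name)."""

    def __init__(self, loader):
        self._loader = loader
        self._cache = {}
        self._lock = threading.Lock()
        self.loads = 0  # number of slow-path loads, for illustration

    def get(self, location, column):
        key = (location, column)
        with self._lock:
            if key in self._cache:
                return self._cache[key]
        # Slow path: in a real system this would be a remote read.
        value = self._loader(location, column)
        with self._lock:
            self.loads += 1
            return self._cache.setdefault(key, value)


def prune(blocks, column, value, cache, max_workers=4):
    """Keep only the blocks whose filter says the value may be present."""

    def may_contain(location):
        return value in cache.get(location, column)

    # Filters for different blocks are checked concurrently;
    # pool.map returns results in the same order as `blocks`.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        keep = list(pool.map(may_contain, blocks))
    return [b for b, k in zip(blocks, keep) if k]


# Made-up data: 8 blocks, each holding a single distinct value of depdel15.
DATA = {f"block_{i}": {"depdel15": {i % 2}} for i in range(8)}


def load_filter(location, column):
    # Stand-in for reading a per-column filter from storage.
    return frozenset(DATA[location][column])


cache = ColumnFilterCache(load_filter)
blocks = sorted(DATA)
first = prune(blocks, "depdel15", 1, cache)   # cold: every filter is loaded
second = prune(blocks, "depdel15", 1, cache)  # warm: served from the cache
```

The first call pays for every filter load, which matches the remark that caching only helps after warming up; the second call touches only the in-memory cache. The `max_workers` parameter plays the role of the concurrency limit in pruning.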

btw, how many parts and rows for the test set?

dantengsky avatar Aug 30 '22 05:08 dantengsky

btw, how many parts and rows for the test set?

202687555 rows.

how to show num of parts?

youngsofun avatar Aug 31 '22 01:08 youngsofun

SELECT * FROM fuse_snapshot('<database_name>', '<table_name>'); or use EXPLAIN with the latest main branch.

BohuTANG avatar Aug 31 '22 01:08 BohuTANG

image

youngsofun avatar Aug 31 '22 02:08 youngsofun

got it, thanks!

dantengsky avatar Aug 31 '22 02:08 dantengsky

Would it be better to create a bloom-filter index for STRING types only? cc @dantengsky

BohuTANG avatar Sep 14 '22 01:09 BohuTANG

@drmingdrmer is working on this to make it faster 🤗

BohuTANG avatar Sep 20 '22 03:09 BohuTANG

Oh, man. Thanks for reminding me.

I'm trying to dig deeper to grasp what's going on inside the fuse table with the bloom filter.

I will give a refactoring proposal ASAP.

drmingdrmer avatar Sep 20 '22 05:09 drmingdrmer

@youngsofun Can you try with the main branch? I think we have some improvements with this patch: https://github.com/datafuselabs/databend/pull/7870

BohuTANG avatar Sep 27 '22 03:09 BohuTANG

The problem was originally with Q10 on ontime, on a test cluster. I found that simply running SELECT count(*) FROM default.ontime WHERE depdel15=1; is abnormal: it is slow, with many log lines about the bloom filter.


I cannot open test.datafusecloud.com right now.

youngsofun avatar Sep 27 '22 05:09 youngsofun

I think the time cost is small now, so let's close this. If you find it still takes a long time, please re-open.

BohuTANG avatar Sep 27 '22 06:09 BohuTANG