feat: make `table.read_partitions` distributed

Open BohuTANG opened this issue 3 years ago • 2 comments

Summary

`table.read_partitions` may perform many IO operations, such as filtering with the min-max index or the bloom filter index. If a table has many partitions, `read_partitions` will be very slow.

To make it distributed, we can:

Have `read_partitions` return segments instead of partitions if there are more than 1000 segments
Distribute the partitions across the cluster
In `read2`, check whether a file is a segment file or a partition file
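
The three steps above could be sketched roughly as follows. This is only an illustration, not Databend's actual API: `ReadUnit`, `plan_read_units`, `distribute`, `resolve_partitions`, and the `prune` callback are hypothetical names, round-robin is an assumed distribution strategy, and `prune` stands in for the min-max / bloom filter index pruning.

```rust
/// Threshold taken from the proposal: above this many segments,
/// pruning is deferred to the cluster nodes.
const SEGMENT_THRESHOLD: usize = 1000;

/// A unit of work handed to a node (hypothetical type, for illustration only).
#[derive(Debug, Clone, PartialEq)]
enum ReadUnit {
    /// A segment file that still needs index-based pruning on the executing node.
    Segment(String),
    /// A partition file that has already been pruned and is ready to read.
    Partition(String),
}

/// Step 1: return raw segments when there are too many of them,
/// otherwise prune locally and return partitions.
/// `prune` is an assumed callback mapping a segment to its surviving partitions.
fn plan_read_units<F>(segments: &[String], prune: F) -> Vec<ReadUnit>
where
    F: Fn(&str) -> Vec<String>,
{
    if segments.len() > SEGMENT_THRESHOLD {
        segments.iter().cloned().map(ReadUnit::Segment).collect()
    } else {
        segments
            .iter()
            .flat_map(|s| prune(s.as_str()))
            .map(ReadUnit::Partition)
            .collect()
    }
}

/// Step 2: spread the units over `nodes` cluster nodes (round-robin, an assumption).
fn distribute(units: Vec<ReadUnit>, nodes: usize) -> Vec<Vec<ReadUnit>> {
    assert!(nodes > 0, "cluster must have at least one node");
    let mut buckets = vec![Vec::new(); nodes];
    for (i, unit) in units.into_iter().enumerate() {
        buckets[i % nodes].push(unit);
    }
    buckets
}

/// Step 3: on the reading side, check whether the unit is a segment or a
/// partition; segments are pruned here, partitions pass through unchanged.
fn resolve_partitions<F>(unit: ReadUnit, prune: F) -> Vec<String>
where
    F: Fn(&str) -> Vec<String>,
{
    match unit {
        ReadUnit::Segment(path) => prune(path.as_str()),
        ReadUnit::Partition(path) => vec![path],
    }
}
```

With this shape, the expensive index IO for large tables happens inside `resolve_partitions` on each node rather than on the node that plans the query.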

Sep 22 '22 08:09 BohuTANG

cc @dantengsky

Sep 22 '22 08:09 BohuTANG

I expect to decouple ReadDataSourcePlan from the Table API in https://github.com/datafuselabs/databend/issues/7816.

Please let me know if there is anything I can help with. @zhang2014

Sep 23 '22 09:09 Xuanwo

Implemented in https://github.com/datafuselabs/databend/pull/7867 cc @zhang2014

Oct 08 '22 05:10 BohuTANG