
feat: make `table.read_partitions` distributed

Open BohuTANG opened this issue 3 years ago • 2 comments

Summary

`table.read_partitions` may perform many IO operations, such as filtering with the min-max index or the bloom filter index. If a table has many partitions, `read_partitions` will be very slow.

To make it distributed, we can:

  1. Have read_partitions return segments instead of partitions if there are more than 1000 segments
  2. Distribute the partitions across the cluster
  3. In read2, check whether each file is a segment or a partition file
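
The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Databend's actual code: `ReadUnit`, `plan_read_units`, `distribute`, `expand_on_node`, and the `prune` callback are all hypothetical names, and round-robin is just one possible way to spread work across nodes.

```rust
/// Hypothetical unit of work handed to a cluster node (not a Databend type).
#[derive(Debug, Clone, PartialEq)]
enum ReadUnit {
    /// A segment whose blocks still have to be pruned on the receiving node.
    Segment(String),
    /// An already pruned partition that is ready to be read.
    Partition(String),
}

/// Threshold taken from the proposal: above this, segments are returned as-is.
const SEGMENT_THRESHOLD: usize = 1000;

/// Step 1: return raw segments when there are too many to prune locally,
/// otherwise prune them here and return partitions.
/// `prune` stands in for the index-based filtering (an assumption).
fn plan_read_units(segments: &[String], prune: impl Fn(&str) -> Vec<String>) -> Vec<ReadUnit> {
    if segments.len() > SEGMENT_THRESHOLD {
        segments.iter().cloned().map(ReadUnit::Segment).collect()
    } else {
        segments
            .iter()
            .flat_map(|s| prune(s.as_str()))
            .map(ReadUnit::Partition)
            .collect()
    }
}

/// Step 2: spread the units over `nodes` cluster nodes in round-robin order.
fn distribute(units: Vec<ReadUnit>, nodes: usize) -> Vec<Vec<ReadUnit>> {
    assert!(nodes > 0, "at least one node is required");
    let mut shards = vec![Vec::new(); nodes];
    for (i, unit) in units.into_iter().enumerate() {
        shards[i % nodes].push(unit);
    }
    shards
}

/// Step 3: on each node, check whether a unit is a segment or a partition.
/// Segments are pruned locally; partitions are passed through unchanged.
fn expand_on_node(units: Vec<ReadUnit>, prune: impl Fn(&str) -> Vec<String>) -> Vec<String> {
    let mut partitions = Vec::new();
    for unit in units {
        match unit {
            ReadUnit::Segment(segment) => partitions.extend(prune(&segment)),
            ReadUnit::Partition(partition) => partitions.push(partition),
        }
    }
    partitions
}
```

With this shape, the expensive pruning IO moves to the nodes only when the segment count exceeds the threshold; small tables keep the current single-node path.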

BohuTANG avatar Sep 22 '22 08:09 BohuTANG

cc @dantengsky

BohuTANG avatar Sep 22 '22 08:09 BohuTANG

I expect to decouple ReadDataSourcePlan from the Table API in https://github.com/datafuselabs/databend/issues/7816.

Please let me know if there is anything I can help with. @zhang2014

Xuanwo avatar Sep 23 '22 09:09 Xuanwo

Implemented in https://github.com/datafuselabs/databend/pull/7867 cc @zhang2014

BohuTANG avatar Oct 08 '22 05:10 BohuTANG