databend
feat: make `table.read_partitions` distributed
Summary
`table.read_partitions` may perform many I/O operations, for example when applying the min-max index filter or the bloom filter index filter.
If a table has many partitions, `read_partitions` becomes very slow.
To make it distributed, we can:
- Have `read_partitions` return segments instead of partitions if the number of segments is > 1000
- Distribute the `Partitions` to the cluster
- In `read2`, check whether the file is a `segment` file or a `partition` file
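
A minimal sketch of the three steps in this proposal, to make the intended flow concrete. It is not taken from the databend codebase: `PartInfo`, `distribute`, the `prune` callback, and the signatures of `read_partitions` and `read2` are all hypothetical stand-ins, and round-robin assignment is only one possible way to spread the work across nodes.

```rust
// Hypothetical sketch: none of these types or signatures come from databend.

/// Threshold from the proposal: above this many segments, skip local pruning.
const SEGMENT_THRESHOLD: usize = 1000;

/// A unit of work sent to a node.
#[derive(Debug, Clone, PartialEq)]
enum PartInfo {
    /// Location of a segment file; index filtering still has to be done.
    Segment(String),
    /// Location of an already-pruned partition file; ready to read.
    Partition(String),
}

/// Step 1: return raw segments when there are too many to prune on one node,
/// otherwise prune locally and return partitions as before.
fn read_partitions(segments: &[String], prune: impl Fn(&str) -> Vec<String>) -> Vec<PartInfo> {
    if segments.len() > SEGMENT_THRESHOLD {
        segments.iter().cloned().map(PartInfo::Segment).collect()
    } else {
        segments
            .iter()
            .flat_map(|s| prune(s.as_str()))
            .map(PartInfo::Partition)
            .collect()
    }
}

/// Step 2: spread the parts over the cluster (round-robin is an assumption).
fn distribute(parts: Vec<PartInfo>, nodes: usize) -> Vec<Vec<PartInfo>> {
    assert!(nodes > 0, "cluster must have at least one node");
    let mut buckets = vec![Vec::new(); nodes];
    for (i, part) in parts.into_iter().enumerate() {
        buckets[i % nodes].push(part);
    }
    buckets
}

/// Step 3: on each node, check the kind of part. A segment is pruned here
/// (the deferred I/O); a partition is passed through unchanged.
fn read2(part: &PartInfo, prune: impl Fn(&str) -> Vec<String>) -> Vec<String> {
    match part {
        PartInfo::Segment(loc) => prune(loc.as_str()),
        PartInfo::Partition(loc) => vec![loc.clone()],
    }
}
```

With this shape, the expensive index filtering for large tables runs inside `read2` on each node instead of on the coordinator, while small tables keep the existing behavior.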
cc @dantengsky
I expect to decouple ReadDataSourcePlan from the Table API in https://github.com/datafuselabs/databend/issues/7816.
Please let me know if there is anything I can help with. @zhang2014
Impl in https://github.com/datafuselabs/databend/pull/7867 cc @zhang2014