feat(storage): optimize table recluster

Open zhyass opened this issue 3 years ago • 3 comments

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

Summary

RFC: https://databend.rs/doc/contributing/rfcs/recluster

Syntax:

optimize table tbl_name [final] recluster

Fixes #6857

zhyass avatar Jul 27 '22 15:07 zhyass

Added a link to RFC: https://databend.rs/doc/contributing/rfcs/recluster

Xuanwo avatar Jul 30 '22 06:07 Xuanwo

Maybe optimize table tbl_name recluster [final] is better.

sundy-li avatar Aug 04 '22 05:08 sundy-li

> Maybe optimize table tbl_name recluster [final] is better.

~~Maybe use optimize table tbl_name recluster directly, final is redundant. Optimization is performed until the table is well clustered enough.~~

zhyass avatar Aug 15 '22 08:08 zhyass

LGTM.

Once we support the alter table syntax, we can move optimize table ... [final] recluster to alter table t1 [final] recluster.

A more powerful recluster would then be:

alter table t2 recluster where create_date between ('2016-01-01') and ('2016-01-07');

Snowflake reference: https://docs.snowflake.com/en/user-guide/tables-clustering-manual.html

BohuTANG avatar Aug 17 '22 01:08 BohuTANG

After merging with main, the PR no longer compiles:

error[E0308]: mismatched types
   --> src/common/storages/fuse/src/operations/recluster.rs:53:53
    |
53  |         let snapshot_opt = self.read_table_snapshot(ctx.as_ref()).await?;
    |                                 ------------------- ^^^^^^^^^^^^ expected struct `std::sync::Arc`, found reference
    |                                 |
    |                                 arguments to this function are incorrect
    |
    = note: expected struct `std::sync::Arc<(dyn common_catalog::table_context::TableContext + 'static)>`
            found reference `&dyn common_catalog::table_context::TableContext`
note: associated function defined here
   --> src/common/storages/fuse/src/fuse_table.rs:117:25
    |
117 |     pub(crate) async fn read_table_snapshot(
    |                         ^^^^^^^^^^^^^^^^^^^
118 |         &self,
    |         -----
119 |         ctx: Arc<dyn TableContext>,
    |         --------------------------
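The error says `read_table_snapshot` now takes an owned `Arc<dyn TableContext>`, while the call site still passes a borrowed `&dyn TableContext` via `ctx.as_ref()`. One likely fix, assuming the caller itself holds an `Arc<dyn TableContext>`, is to pass a clone of the `Arc`. The sketch below reproduces the shape of the problem with stand-in types: the trait body, `DummyContext`, and the synchronous `String`-returning signatures are assumptions made only to keep the example self-contained, and they are not the real Databend definitions.

```rust
use std::sync::Arc;

// Hypothetical stand-in for common_catalog::table_context::TableContext.
trait TableContext {
    fn name(&self) -> String;
}

// Hypothetical context implementation used only for this sketch.
struct DummyContext;

impl TableContext for DummyContext {
    fn name(&self) -> String {
        "dummy".to_string()
    }
}

// Mirrors the parameter type from the error message: the context is taken
// as an owned Arc. It is synchronous here (the real function is async).
fn read_table_snapshot(ctx: Arc<dyn TableContext>) -> String {
    format!("snapshot read with ctx {}", ctx.name())
}

// Stand-in for the call site in recluster.rs.
fn recluster(ctx: Arc<dyn TableContext>) -> String {
    // `read_table_snapshot(ctx.as_ref())` would fail with E0308, because
    // `as_ref()` yields `&dyn TableContext` rather than an Arc.
    // Cloning the Arc only bumps the reference count and hands over an
    // owned handle, so `ctx` stays usable afterwards.
    let snapshot = read_table_snapshot(ctx.clone());
    format!("{} / {}", snapshot, ctx.name())
}
```

Cloning an `Arc` is cheap (an atomic increment), which is why it is the usual way to satisfy an owned `Arc` parameter when the caller still needs the context later.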

BohuTANG avatar Aug 18 '22 01:08 BohuTANG

cc @soyeric128 for documentation: optimize table tbl_name recluster [final]
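For the documentation, the agreed statement shape `optimize table tbl_name recluster [final]` can be illustrated with a tiny recognizer. This is only a whitespace-token sketch under stated assumptions: `ReclusterStmt` and `parse_recluster` are hypothetical names, quoting and qualified table names are ignored, and it is not how Databend's parser is implemented.

```rust
// Hypothetical parsed form of `optimize table <name> recluster [final]`.
#[derive(Debug, PartialEq)]
struct ReclusterStmt {
    table: String,
    is_final: bool,
}

// Recognizes the statement by splitting on whitespace. Keywords are matched
// case-insensitively and a trailing semicolon is allowed.
fn parse_recluster(sql: &str) -> Option<ReclusterStmt> {
    let trimmed = sql.trim().trim_end_matches(';');
    let tokens: Vec<&str> = trimmed.split_whitespace().collect();

    // True when token `i` exists and equals `expected`, ignoring ASCII case.
    let kw = |i: usize, expected: &str| {
        tokens
            .get(i)
            .map_or(false, |t| t.eq_ignore_ascii_case(expected))
    };

    if !(kw(0, "optimize") && kw(1, "table") && kw(3, "recluster")) {
        return None;
    }
    let table = tokens.get(2)?.to_string();

    match tokens.len() {
        // No trailing keyword: plain recluster.
        4 => Some(ReclusterStmt { table, is_final: false }),
        // Optional `final` must come after `recluster`.
        5 if kw(4, "final") => Some(ReclusterStmt { table, is_final: true }),
        _ => None,
    }
}
```

With this shape, `optimize table t1 recluster` and `optimize table t1 recluster final` are accepted, while the earlier ordering `optimize table t1 final recluster` is rejected.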

BohuTANG avatar Aug 18 '22 05:08 BohuTANG