databend icon indicating copy to clipboard operation
databend copied to clipboard

Migrate `Expression` and `ExpressionExecutor` in ClusterStatsGenerator

Open Xuanwo opened this issue 3 years ago • 3 comments

Discussed in https://github.com/datafuselabs/databend/discussions/7570

Originally posted by Xuanwo September 11, 2022 Hi, I'm working on removing old planners. More specifically, I'm try to removing sqlparser.

However, there are some code in storage still depends on sqlparser's Expression:

#[derive(Clone, Default)]
pub struct ClusterStatsGenerator {
    cluster_key_id: u32,
    cluster_key_index: Vec<usize>,
    expression_executor: Option<ExpressionExecutor>,
    level: i32,
    block_compactor: BlockCompactor,
}

// This can be used in deletion, for an existing block.
pub fn gen_with_origin_stats(
    &self,
    data_block: &DataBlock,
    origin_stats: Option<ClusterStatistics>,
) -> Result<Option<ClusterStatistics>> {
    if origin_stats.is_none() {
        return Ok(None);
    }

    let origin_stats = origin_stats.unwrap();
    if origin_stats.cluster_key_id != self.cluster_key_id {
        return Ok(None);
    }

    let block = if let Some(executor) = &self.expression_executor {
        // For a clustered table, data_block has been sorted, but may not contain cluster key.
        // So only need to get the first and the last row for execute.
        let indices = vec![0u32, data_block.num_rows() as u32 - 1];
        let input = DataBlock::block_take_by_indices(data_block, &indices)?;
        executor.execute(&input)?
    } else {
        data_block.clone()
    };

    self.clusters_statistics(&block, origin_stats.level)
}

The cluster_key here is table's column name parsed by:

async fn analyze_cluster_keys(
    &mut self,
    cluster_by: &[Expr<'a>],
    schema: DataSchemaRef,
) -> Result<Vec<String>> { .. }

ClusterStatsGenerator will use expression_executor to remove unused columns.


How can I remove the Expression here?

Xuanwo avatar Sep 13 '22 04:09 Xuanwo

Cc @leiysky

Xuanwo avatar Sep 13 '22 04:09 Xuanwo

To address #7593, I plan to:

  • [x] https://github.com/datafuselabs/databend/pull/7597
  • [ ] https://github.com/datafuselabs/databend/pull/7603
  • [ ] Move sql/planners to planners so other crates can depend on it.
  • [x] https://github.com/datafuselabs/databend/pull/7600
  • [ ] Delay storage cluster key's parse until we have a TableContext.

Xuanwo avatar Sep 14 '22 00:09 Xuanwo

Current progress: https://github.com/datafuselabs/databend/pull/7806

Xuanwo avatar Sep 22 '22 08:09 Xuanwo

Already implemented

Xuanwo avatar Nov 14 '22 12:11 Xuanwo