Tracking issue for new expression framework

Open andylokandy opened this issue 1 year ago • 4 comments

This is a tracking issue for the design and implementation of the new scalar expression framework.

Summary

The new expression framework contains these improvements:

  1. Compile-time type checking (for SQL, that is, at planning time), which catches all type errors up front and produces readable error reports.
  2. Untyped expression evaluation, which means function evaluation no longer has to deal with data typing.
  3. Auto vectorization, generics, auto downcasting, and other ergonomic improvements for writing SQL functions.
  4. Designed with distributed evaluation in mind from day one.
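
To make the first two points concrete, here is a minimal sketch of the split between planning-time type checking and evaluation that trusts the checker. All names (`DataType`, `Column`, `Expr`, `type_check`, `eval`) are hypothetical and heavily simplified; this is not the API of the new framework, only an illustration of the idea under those assumptions.

```rust
// Hypothetical, simplified types: NOT the actual common-expression API.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum DataType {
    Int64,
    Boolean,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Column {
    Int64(Vec<i64>),
    Boolean(Vec<bool>),
}

#[derive(Debug, Clone)]
pub enum Expr {
    ColumnRef(usize),
    Add(Box<Expr>, Box<Expr>),
    Gt(Box<Expr>, Box<Expr>),
}

/// Planning-time check: returns the result type or a readable error message.
pub fn type_check(expr: &Expr, schema: &[DataType]) -> Result<DataType, String> {
    match expr {
        Expr::ColumnRef(i) => schema
            .get(*i)
            .copied()
            .ok_or_else(|| format!("column #{i} does not exist")),
        Expr::Add(l, r) => match (type_check(l, schema)?, type_check(r, schema)?) {
            (DataType::Int64, DataType::Int64) => Ok(DataType::Int64),
            (a, b) => Err(format!("no overload for {a:?} + {b:?}")),
        },
        Expr::Gt(l, r) => match (type_check(l, schema)?, type_check(r, schema)?) {
            (DataType::Int64, DataType::Int64) => Ok(DataType::Boolean),
            (a, b) => Err(format!("no overload for {a:?} > {b:?}")),
        },
    }
}

/// Evaluation over whole columns. It assumes `type_check` already succeeded,
/// so a type mismatch here is treated as a bug rather than a user error.
pub fn eval(expr: &Expr, columns: &[Column]) -> Column {
    match expr {
        Expr::ColumnRef(i) => columns[*i].clone(),
        Expr::Add(l, r) => match (eval(l, columns), eval(r, columns)) {
            (Column::Int64(a), Column::Int64(b)) => {
                Column::Int64(a.iter().zip(&b).map(|(x, y)| x.wrapping_add(*y)).collect())
            }
            _ => unreachable!("rejected by type_check"),
        },
        Expr::Gt(l, r) => match (eval(l, columns), eval(r, columns)) {
            (Column::Int64(a), Column::Int64(b)) => {
                Column::Boolean(a.iter().zip(&b).map(|(x, y)| x > y).collect())
            }
            _ => unreachable!("rejected by type_check"),
        },
    }
}
```

In this sketch, an ill-typed expression such as Int64 + Boolean is rejected by `type_check` before any data is touched, and `eval` carries no user-facing error path for types.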

Reference

RFC: Formal Type System

Demo: Typed Type Exercise in Rust

Migration plan

The new expression framework will replace the legacy common-datavalues and common-datablocks crates and refactor all functions in common-function. This is a large task, so we are going to break it into small steps:

  1. Add a new common-expression crate with expression definition (AST), type checking, and evaluation runtime.
  2. Run benchmark on the new framework and make necessary improvements.
  3. Migrate all functions to the new framework.

Tasks

features

  • [x] https://github.com/datafuselabs/databend/pull/6576
  • [x] https://github.com/datafuselabs/databend/pull/6649
  • [x] https://github.com/datafuselabs/databend/pull/7054
  • [x] https://github.com/datafuselabs/databend/pull/6674
  • [x] https://github.com/datafuselabs/databend/pull/6663
  • [x] https://github.com/datafuselabs/databend/pull/6661
  • [x] https://github.com/datafuselabs/databend/pull/6662
  • [x] https://github.com/datafuselabs/databend/pull/6597
  • [x] https://github.com/datafuselabs/databend/pull/6712
  • [x] https://github.com/datafuselabs/databend/pull/6918
  • [x] https://github.com/datafuselabs/databend/pull/7075
  • [x] https://github.com/datafuselabs/databend/issues/7020
  • [x] #6636
  • [ ] #6635
  • [x] #6634
  • [ ] https://github.com/datafuselabs/databend/pull/7781
  • [ ] Conversion between Boolean and other types
  • [ ] Conversion between String and other types
  • [ ] Literal deserialization

refactor

  • [x] https://github.com/datafuselabs/databend/pull/6677
  • [x] https://github.com/datafuselabs/databend/pull/6787
  • [x] https://github.com/datafuselabs/databend/pull/6756
  • [x] https://github.com/datafuselabs/databend/pull/6856
  • [x] https://github.com/datafuselabs/databend/pull/6867
  • [x] https://github.com/datafuselabs/databend/pull/6923
  • [ ] Enrich doc comment

migration

We can regenerate the golden test files by running env REGENERATE_GOLDENFILES=1 cargo test, then use git diff to show the differences.

  • [x] https://github.com/datafuselabs/databend/issues/6763
  • [ ] https://github.com/datafuselabs/databend/issues/6766
  • [x] https://github.com/datafuselabs/databend/issues/7255
  • [x] https://github.com/datafuselabs/databend/issues/6833
  • [ ] https://github.com/datafuselabs/databend/issues/7636
  • [ ] Support rand()
  • [x] #7091
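
For readers unfamiliar with golden-file testing as used in the migration note above, the following is a std-only sketch of the general pattern: compare the actual output with a stored file, or overwrite the file when regeneration is requested. The helper names `check_golden` and `regenerate_requested` are hypothetical; this is not the test harness Databend actually uses.

```rust
use std::{env, fs, io, path::Path};

/// Compares `actual` with the golden file at `path`.
/// When `regenerate` is true, the golden file is overwritten instead,
/// and the change can then be reviewed with `git diff`.
pub fn check_golden(path: &Path, actual: &str, regenerate: bool) -> io::Result<bool> {
    if regenerate {
        fs::write(path, actual)?;
        return Ok(true);
    }
    Ok(fs::read_to_string(path)? == actual)
}

/// Reads the switch from the environment, mirroring
/// `REGENERATE_GOLDENFILES=1 cargo test` (assumed to mean "1" enables it).
pub fn regenerate_requested() -> bool {
    env::var("REGENERATE_GOLDENFILES").map_or(false, |v| v == "1")
}
```

A test would call `check_golden(path, &output, regenerate_requested())` and assert that the result is `true`.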

andylokandy, Jul 08 '22 09:07

Make Databend Type System Great Again!

Xuanwo, Jul 09 '22 02:07

As the new expression framework is becoming much more complete, we will soon be able to start migrating functions from the old framework. Before going that far, I ran a rough benchmark comparing the old framework with the new one.

Please note that the benchmark was not set up very carefully, so the numbers are only meaningful at the order-of-magnitude level. The purpose of the benchmark is to see whether the new framework has any significant performance regression.

The benchmark evaluates col1 + col2 + ... + col14, where all the columns are Int64 with arbitrary values. The measurement excludes the time for type checking, datablock preparation, etc., and covers only the evaluation of the expression.
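
As an illustration of the measurement shape only (not the actual benchmark code, and not either framework's evaluator), here is a sketch that builds the input columns outside the timed region and times just the evaluation. `sum_columns`, `bench_sum`, and the data generator are hypothetical.

```rust
use std::time::{Duration, Instant};

/// Sums all columns row by row, mimicking col1 + col2 + ... + colN
/// on Int64 columns. Overflow wraps, which is an assumption of this sketch.
pub fn sum_columns(columns: &[Vec<i64>]) -> Vec<i64> {
    let rows = columns.first().map_or(0, |c| c.len());
    let mut acc = vec![0i64; rows];
    for col in columns {
        for (a, v) in acc.iter_mut().zip(col) {
            *a = a.wrapping_add(*v);
        }
    }
    acc
}

/// Prepares the data first, then times only the evaluation step.
pub fn bench_sum(num_columns: usize, rows: usize) -> (Vec<i64>, Duration) {
    // Deterministic stand-in for "arbitrary values" (hypothetical generator).
    let columns: Vec<Vec<i64>> = (0..num_columns)
        .map(|c| {
            (0..rows)
                .map(|r| (r as i64).wrapping_mul(31).wrapping_add(c as i64))
                .collect()
        })
        .collect();

    let start = Instant::now();
    let out = sum_columns(&columns);
    (out, start.elapsed())
}
```

Calling `bench_sum(14, rows)` corresponds to the 14-column case described above; a single timed run like this is noisy, which matches the caveat that only the order of magnitude is meaningful.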

[Image: benchmark results]

andylokandy, Jul 20 '22 20:07

What's the stateless test migration plan for the new expression framework?

BohuTANG, Jul 23 '22 03:07

> What's the stateless test migration plan for the new expression framework?

Stateless test migration is independent of the new expression framework, because the new framework is going to silently replace the old expressions. The trick is to add a new variant, NewExpr, to the old Expression enum and handle it in the old evaluator.
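
A minimal sketch of that bridging trick, assuming toy definitions: only the names `NewExpr` and `Expression` come from the comment above; the variants, `eval_new`, and `eval_old` are hypothetical stand-ins rather than Databend's real types.

```rust
/// Hypothetical stand-in for an expression of the new framework.
#[derive(Debug, Clone)]
pub enum NewExpr {
    Literal(i64),
    Add(Box<NewExpr>, Box<NewExpr>),
}

/// Hypothetical stand-in for the new framework's evaluator.
pub fn eval_new(expr: &NewExpr) -> i64 {
    match expr {
        NewExpr::Literal(v) => *v,
        NewExpr::Add(l, r) => eval_new(l).wrapping_add(eval_new(r)),
    }
}

/// Hypothetical stand-in for the old Expression enum.
#[derive(Debug, Clone)]
pub enum Expression {
    Literal(i64),
    Negate(Box<Expression>),
    /// Bridge variant: wraps an expression from the new framework.
    NewExpr(NewExpr),
}

/// Old evaluator: the only change is one extra arm that delegates
/// to the new evaluator, so callers of the old API are unaffected.
pub fn eval_old(expr: &Expression) -> i64 {
    match expr {
        Expression::Literal(v) => *v,
        Expression::Negate(e) => eval_old(e).wrapping_neg(),
        Expression::NewExpr(e) => eval_new(e),
    }
}
```

Because old and new expressions can be mixed in one tree this way, existing tests that go through the old entry point keep running while functions move over one at a time.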

andylokandy, Jul 27 '22 05:07