databend
Tracking issue for new expression framework
This is a tracking issue for the design and implementation of the new scalar expression framework.
Summary
The new expression framework contains these improvements:
- Compile-time (for SQL, i.e., at planning time) type checking, which catches all type errors up front and produces readable error reports.
- Untyped expression evaluation, which means function evaluation no longer has to deal with data types.
- Auto vectorization, generics, auto downcasting, and other ergonomic improvements for writing SQL functions.
- Designed with distributed evaluation in mind from day one.
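To make "auto vectorization" and "generics" more concrete, here is a minimal sketch under simplified, assumed types. Value<T> and vectorize_2_arg are hypothetical names, not the actual common-expression API. The function author writes only a scalar closure, and a generic helper lifts it so that it works on any mix of scalars and columns.

```rust
/// A value is either a single scalar or a whole column.
/// (Hypothetical, simplified stand-in for the framework's value type.)
#[derive(Debug, Clone, PartialEq)]
pub enum Value<T> {
    Scalar(T),
    Column(Vec<T>),
}

/// Lift a scalar binary function `f` so it accepts any mix of scalars and
/// columns. The function author only writes `f`; broadcasting is handled here.
pub fn vectorize_2_arg<A: Clone, B: Clone, R>(
    f: impl Fn(A, B) -> R,
) -> impl Fn(Value<A>, Value<B>) -> Value<R> {
    move |lhs: Value<A>, rhs: Value<B>| match (lhs, rhs) {
        (Value::Scalar(a), Value::Scalar(b)) => Value::Scalar(f(a, b)),
        // Broadcast the right-hand scalar across the left-hand column.
        (Value::Column(a), Value::Scalar(b)) => {
            Value::Column(a.into_iter().map(|a| f(a, b.clone())).collect())
        }
        // Broadcast the left-hand scalar across the right-hand column.
        (Value::Scalar(a), Value::Column(b)) => {
            Value::Column(b.into_iter().map(|b| f(a.clone(), b)).collect())
        }
        // Element-wise evaluation over two columns of equal length.
        (Value::Column(a), Value::Column(b)) => {
            assert_eq!(a.len(), b.len(), "columns must have the same length");
            Value::Column(a.into_iter().zip(b).map(|(a, b)| f(a, b)).collect())
        }
    }
}
```

With this shape, a function such as plus is written once as `|a: i64, b: i64| a + b`, and the scalar/column combinations are handled by the helper rather than by each individual function.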
Reference
- RFC: Formal Type System
- Demo: Typed Type Exercise in Rust
Migration plan
The new expression framework will replace the legacy common-datavalues and common-datablocks crates, and all functions in common-function will be refactored on top of it. Since this is a large task, we are going to break it into small steps:
- Add a new common-expression crate with the expression definition (AST), type checking, and the evaluation runtime.
- Run benchmarks on the new framework and make necessary improvements.
- Migrate all functions to the new framework.
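The type-checking part of the first step is what lets type errors surface at planning time instead of during evaluation. A minimal sketch of such a checking pass, assuming a toy expression language that supports only `+` over Int64 (DataType, RawExpr, TypeError, and check are hypothetical names, not the crate's real types):

```rust
/// Hypothetical data types for a toy expression language.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum DataType {
    Int64,
    Boolean,
}

/// Untyped AST as it might come out of the parser.
pub enum RawExpr {
    // `name` would be used to look up the column at evaluation time.
    Column { name: String, ty: DataType },
    Literal(i64),
    Plus(Box<RawExpr>, Box<RawExpr>),
}

/// An error reported during type checking, before any data is touched.
#[derive(Debug, PartialEq)]
pub struct TypeError(pub String);

/// Infer the result type of `expr`, or report the first type error found.
pub fn check(expr: &RawExpr) -> Result<DataType, TypeError> {
    match expr {
        RawExpr::Column { ty, .. } => Ok(*ty),
        RawExpr::Literal(_) => Ok(DataType::Int64),
        RawExpr::Plus(l, r) => {
            let (lt, rt) = (check(l)?, check(r)?);
            if lt == DataType::Int64 && rt == DataType::Int64 {
                Ok(DataType::Int64)
            } else {
                Err(TypeError(format!(
                    "no overload for `+` with arguments ({lt:?}, {rt:?})"
                )))
            }
        }
    }
}
```

Because a pass like check runs before any data block is built, an expression such as adding a Boolean column to 1 is rejected with a readable message rather than failing partway through evaluation.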
Tasks
features
- [x] https://github.com/datafuselabs/databend/pull/6576
- [x] https://github.com/datafuselabs/databend/pull/6649
- [x] https://github.com/datafuselabs/databend/pull/7054
- [x] https://github.com/datafuselabs/databend/pull/6674
- [x] https://github.com/datafuselabs/databend/pull/6663
- [x] https://github.com/datafuselabs/databend/pull/6661
- [x] https://github.com/datafuselabs/databend/pull/6662
- [x] https://github.com/datafuselabs/databend/pull/6597
- [x] https://github.com/datafuselabs/databend/pull/6712
- [x] https://github.com/datafuselabs/databend/pull/6918
- [x] https://github.com/datafuselabs/databend/pull/7075
- [x] https://github.com/datafuselabs/databend/issues/7020
- [x] #6636
- [ ] #6635
- [x] #6634
- [ ] https://github.com/datafuselabs/databend/pull/7781
- [ ] Conversion between Boolean and other types
- [ ] Conversion between String and other types
- [ ] Literal deserialization
refactor
- [x] https://github.com/datafuselabs/databend/pull/6677
- [x] https://github.com/datafuselabs/databend/pull/6787
- [x] https://github.com/datafuselabs/databend/pull/6756
- [x] https://github.com/datafuselabs/databend/pull/6856
- [x] https://github.com/datafuselabs/databend/pull/6867
- [x] https://github.com/datafuselabs/databend/pull/6923
- [ ] Enrich doc comments
migration
New golden test files can be generated by running REGENERATE_GOLDENFILES=1 cargo test, and git diff then shows the differences.
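A minimal sketch of the golden-file pattern behind this workflow, assuming a hand-rolled helper. regenerate_requested and check_golden are hypothetical names, and the project's actual test harness may behave differently (for example, in how it interprets the variable's value). When regeneration is requested, the helper overwrites the expected file; otherwise it compares the actual output against it.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical check for the REGENERATE_GOLDENFILES=1 convention.
/// This sketch treats only the exact value "1" as "regenerate".
pub fn regenerate_requested() -> bool {
    std::env::var("REGENERATE_GOLDENFILES")
        .map(|v| v == "1")
        .unwrap_or(false)
}

/// Compare `actual` with the golden file at `path`, or overwrite the file
/// when `regenerate` is true. After regenerating, `git diff` on the golden
/// file shows exactly what changed.
pub fn check_golden(path: &Path, actual: &str, regenerate: bool) -> io::Result<bool> {
    if regenerate {
        fs::write(path, actual)?;
        return Ok(true);
    }
    let expected = fs::read_to_string(path)?;
    Ok(expected == actual)
}
```

A test would call check_golden(path, &output, regenerate_requested()), so that the same test both verifies and, on request, rewrites its expected output.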
- [x] https://github.com/datafuselabs/databend/issues/6763
- [ ] https://github.com/datafuselabs/databend/issues/6766
- [x] https://github.com/datafuselabs/databend/issues/7255
- [x] https://github.com/datafuselabs/databend/issues/6833
- [ ] https://github.com/datafuselabs/databend/issues/7636
- [ ] Support rand()
- [x] #7091
Make Databend Type System Great Again!
As the new expression framework becomes more feature-complete, we will soon be able to start migrating functions from the old framework. Before going that far, I ran a rough benchmark comparing the old framework with the new one.
Please note that the benchmark was not set up very carefully, so the numbers are meaningful only in order of magnitude. Its purpose is to check whether the new framework has any significant performance regression.
The benchmark evaluates col1 + col2 + ... + col14, where all columns are Int64 with arbitrary values. The measurement excludes type checking, data block preparation, etc., and covers only the evaluation of the expression.
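A rough sketch of this kind of measurement in plain Rust, using only the standard library. gen_column, eval_sum, and bench are hypothetical names, and this is not the code that produced the original measurement. It assumes wrapping_add so that arbitrary values cannot trigger overflow panics in debug builds; the original benchmark's overflow semantics are not stated.

```rust
use std::time::{Duration, Instant};

/// Generate `n` pseudo-random i64 values with a simple LCG (no external crates).
pub fn gen_column(n: usize, seed: u64) -> Vec<i64> {
    let mut state = seed;
    (0..n)
        .map(|_| {
            state = state
                .wrapping_mul(6364136223846793005)
                .wrapping_add(1442695040888963407);
            state as i64
        })
        .collect()
}

/// Evaluate col1 + col2 + ... + colN column by column, as a vectorized
/// evaluator would: each `+` produces a full intermediate column.
/// Assumes at least one column and that all columns have the same length.
pub fn eval_sum(columns: &[Vec<i64>]) -> Vec<i64> {
    let mut acc = columns[0].clone();
    for col in &columns[1..] {
        acc = acc.iter().zip(col).map(|(a, b)| a.wrapping_add(*b)).collect();
    }
    acc
}

/// Average time per evaluation of the 14-column sum. Data generation happens
/// before the timer starts, so only evaluation is measured. `iters` must be > 0.
pub fn bench(rows: usize, iters: u32) -> Duration {
    let columns: Vec<Vec<i64>> = (1..=14u64).map(|seed| gen_column(rows, seed)).collect();
    let start = Instant::now();
    for _ in 0..iters {
        // black_box keeps the optimizer from discarding the unused result.
        std::hint::black_box(eval_sum(std::hint::black_box(&columns)));
    }
    start.elapsed() / iters
}
```

Running the same bench function against the old and new evaluators would give the order-of-magnitude comparison described above; a careful benchmark would add warm-up runs and statistical reporting.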
What's the stateless test migration plan for the new expression framework?
Stateless test migration is independent of the new expression framework, because the new framework will silently replace the old expression. The trick is to add a new variant, NewExpr, to the old Expression enum and handle it in the old evaluator.
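A minimal sketch of that trick with simplified stand-ins. Expression and NewExpr come from the text above. The legacy variants, the new_framework module with its Expr and evaluate, and eval_old are hypothetical placeholders for the real types.

```rust
/// Hypothetical stand-in for the legacy expression enum.
pub enum Expression {
    Literal(i64),
    Add(Box<Expression>, Box<Expression>),
    /// New variant wrapping an expression built by the new framework.
    NewExpr(new_framework::Expr),
}

/// Hypothetical stand-in for the new framework's expression and evaluator.
pub mod new_framework {
    pub enum Expr {
        Constant(i64),
        Sum(Vec<Expr>),
    }

    pub fn evaluate(expr: &Expr) -> i64 {
        match expr {
            Expr::Constant(v) => *v,
            Expr::Sum(args) => args.iter().map(evaluate).sum(),
        }
    }
}

/// The old evaluator keeps handling legacy variants and delegates `NewExpr`
/// to the new framework, so callers (and stateless tests) see no difference.
pub fn eval_old(expr: &Expression) -> i64 {
    match expr {
        Expression::Literal(v) => *v,
        Expression::Add(l, r) => eval_old(l) + eval_old(r),
        Expression::NewExpr(e) => new_framework::evaluate(e),
    }
}
```

Because the dispatch happens inside the old evaluator, old and new expressions can be mixed in one tree during migration, and the legacy variants can be removed once nothing produces them.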