feat: support asof join
I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/
Summary
- thanks to @xudong963 for the nice work of range join
- thanks to duckdb for the nice idea of refactor window functions to implement asof join
Currently, because range join can be implemented in different ways, the order of the results returned by asof join is not deterministic. I may be able to get help from @xudong963 on this; that is why I have not added any test cases yet.
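For readers new to asof join, here is a minimal sketch of the semantics this PR targets, assuming an inner join on a single ordering key with the condition `probe.ts >= build.ts`. The `Row` type and the `asof_join` function are hypothetical names used for illustration only; this is not the range-join-based implementation in this PR.

```rust
/// A row with a timestamp-like ordering key and a payload (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub struct Row {
    pub ts: i64,
    pub value: &'static str,
}

/// Reference semantics for an inner asof join on `probe.ts >= build.ts`:
/// each probe row is paired with the build row that has the largest `ts`
/// not exceeding the probe row's `ts`. Probe rows without a match are dropped.
pub fn asof_join(probe: &[Row], build: &[Row]) -> Vec<(Row, Row)> {
    // Sort a copy of the build side so a binary search can find the match.
    let mut sorted = build.to_vec();
    sorted.sort_by_key(|r| r.ts);

    let mut out = Vec::new();
    for p in probe {
        // Number of build rows whose ts is <= the probe row's ts.
        let n = sorted.partition_point(|b| b.ts <= p.ts);
        if n > 0 {
            out.push((p.clone(), sorted[n - 1].clone()));
        }
    }
    out
}
```

This sketch emits results in probe order. A parallel or range-join-based implementation may emit the same pairs in a different order, which is the nondeterminism described above.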
Benchmark
build side: 50,000 rows, probe side: 50,000 rows
- Fixes #[https://github.com/datafuselabs/databend/issues/15410]
Tests
- [x] Unit Test
- [x] Logic Test
- [ ] Benchmark Test
- [ ] No Test - Explain why
Type of change
- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [ ] Breaking Change (fix or feature that could cause existing functionality not to work as expected)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
Thanks @zenus
I don't think we need to guarantee the order of the results. It makes sense for different databases to return results in different orders.
I've skimmed through the code and there are a few points to note:
- `todo!()` needs to be replaced with code that makes sense or that returns a specific error message.
- The `build_asof_join` method can be further split into smaller functions.
- Regarding tests, if you are going to borrow the test set from duckdb, you can put the tests in the directory https://github.com/datafuselabs/databend/tree/main/tests/sqllogictests/suites/duckdb/asof_join
- AsOf join can be used as an alternative to window functions to express temporal relationships semantically. If so, can you provide a performance comparison of asof join vs window functions?
In addition, I've opened a tracking issue on asof join, if you have time you can continue to finish some sub-issues in it.
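To illustrate the first point, here is a hedged sketch of returning a descriptive error instead of leaving `todo!()` in an unsupported branch. `JoinKind`, `PlanError`, and `check_asof_join_kind` are hypothetical names, not Databend's actual types, and which join kinds asof join supports is an assumption made only for this example.

```rust
use std::fmt;

/// Hypothetical join kinds, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum JoinKind {
    Inner,
    Left,
    Full,
}

/// Hypothetical error type standing in for the project's own error type.
#[derive(Debug, PartialEq)]
pub struct PlanError(pub String);

impl fmt::Display for PlanError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

impl std::error::Error for PlanError {}

/// Validate the join kind up front (assumed support: Inner and Left only).
pub fn check_asof_join_kind(kind: JoinKind) -> Result<(), PlanError> {
    match kind {
        JoinKind::Inner | JoinKind::Left => Ok(()),
        // `todo!()` would panic here; an error tells the user what is unsupported.
        other => Err(PlanError(format!(
            "asof join does not support join kind {:?} yet",
            other
        ))),
    }
}
```

With this shape the caller can propagate the failure with `?`, and the user sees a message instead of a panic.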
Nice advice, my pleasure.
I'll review the PR tomorrow.
Seems there is a hang in the asof tests
CI timed out
You can run the failed test file locally with:
./target/debug/databend-sqllogictests --run_dir query --run_file xxx.test --debug --handlers mysql
ok
From the error, I guess the schemas are not aligned in some places.
thanks
Seems the pr is ready for review, please resolve the conflict
You need to fix the lint errors.
Run `make lint` locally and fix the issues by following the hints.
Hi @zenus , do you need some help ?
@Dousir9 thanks, I'm enjoying fixing it.
@Dousir9 could you help me? I cannot get all the test cases to pass, although all the new test cases pass on my laptop.
@zenus Yeah, I will review this PR today.
@zenus Sorry for the wait, I was really busy. Let's continue this excellent work.
- For `test_asof_join_ints.test:65`, we can fix it by adding an ORDER BY column to this sqllogictest.
- For `test_asof_join_inequal.test:23`, it looks like we need to support distributed execution for asof join. Could you show the query plan for this sqllogictest under distributed mode? You can use `scripts/ci/deploy/databend-query-cluster-3-nodes.sh` to start a cluster, and then execute `explain ...`.
- Resolve the conflicting files with the main branch.
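The ORDER BY suggestion works because it pins down a row order that the join itself does not guarantee. A small sketch of the same idea outside SQL, assuming rows of integer columns: sort both result sets before comparing them. `same_rows_ignoring_order` is a hypothetical helper and is not part of the sqllogictest runner.

```rust
/// Compare two result sets as multisets of rows, ignoring row order.
pub fn same_rows_ignoring_order(a: &[Vec<i64>], b: &[Vec<i64>]) -> bool {
    // Sort copies so the callers' data is left untouched.
    let mut a = a.to_vec();
    let mut b = b.to_vec();
    a.sort();
    b.sort();
    a == b
}
```

Two runs that return the same rows in a different order compare equal under this helper, while a missing or extra row still fails the comparison.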
@Dousir9 I have fixed the first and third items. For the second one, I still have no idea how to proceed, even after spending a week reading the code.
@xudong963 could you help me out ?
