databend icon indicating copy to clipboard operation
databend copied to clipboard

feat: support asof join

Open zenus opened this issue 1 year ago โ€ข 16 comments

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

  1. thanks to @xudong963 for the nice work of range join
  2. thanks to duckdb for the nice idea of refactor window functions to implement asof join

Currently, due to different ways of implementing range join, the order of results obtained by asof join is random. I may be able to get help from @xudong963ใ€‚that's why i do not add any test case.

image

Benchmark

image

build side : 5w probe side : 5w

image
  • Fixes #[https://github.com/datafuselabs/databend/issues/15410]

Tests

  • [x] Unit Test
  • [x] Logic Test
  • [ ] Benchmark Test
  • [ ] No Test - Explain why

Type of change

  • [ ] Bug Fix (non-breaking change which fixes an issue)
  • [x] New Feature (non-breaking change which adds functionality)
  • [ ] Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • [ ] Documentation Update
  • [ ] Refactoring
  • [ ] Performance Improvement
  • [ ] Other (please describe):

This change isโ€‚Reviewable

zenus avatar May 06 '24 14:05 zenus

Thanks @zenus

I think we don't need to ensure the order of results. It makes sense for different databases to have different results order.

I've skimmed through the code and there are a few points to note:

  1. todo!() needs to be replaced with code that makes sense or provides a specific error message.
  2. build_asof_join method can be further split into smaller functions.
  3. Regarding tests, if you are going to borrow the test set from duckdb, you can put the tests in the directory https://github.com/datafuselabs/databend/tree/main/tests/sqllogictests/suites/duckdb/asof_ join
  4. AsOf join can be used as an alternative to window to semantically express temporal relationships, if so, can you provide a performance comparison of asof join vs window?

In addition, I've opened a tracking issue on asof join, if you have time you can continue to finish some sub-issues in it.

nice advice , my pleasure.

zenus avatar May 07 '24 13:05 zenus

I'll review the PR tomorrow.

xudong963 avatar May 14 '24 14:05 xudong963

Seems there is hang in asof tests

ci time out

xudong963 avatar May 21 '24 08:05 xudong963

You can run the failed test file in local by ./target/debug/databend-sqllogictests --run_dir query --run_file xxx.test --debug --handlers mysql

xudong963 avatar May 31 '24 06:05 xudong963

You can run the failed test file in local by ./target/debug/databend-sqllogictests --run_dir query --run_file xxx.test --debug --handlers mysql

ok

zenus avatar May 31 '24 07:05 zenus

image

From the error, I guess some places don't make the schema align.

xudong963 avatar May 31 '24 07:05 xudong963

image

From the error, I guess some places don't make the schema align.

thanks

zenus avatar May 31 '24 07:05 zenus

Seems the pr is ready for review, please resolve the conflict

xudong963 avatar Jun 12 '24 01:06 xudong963

You need to fix lint

make lint in your local and fix them by hints

xudong963 avatar Jun 13 '24 09:06 xudong963

Hi @zenus , do you need some help ?

Dousir9 avatar Jun 29 '24 08:06 Dousir9

Hi @zenus , do you need some help ?

@Dousir9 thanks , i'm enjoying fix it

zenus avatar Jun 29 '24 09:06 zenus

@Dousir9 could you help me , i can not pass all the case, in my laptop, i passed all new test case.

zenus avatar Jul 18 '24 13:07 zenus

@Dousir9 could you help me , i can not pass all the case, in my laptop, i passed all new test case.

@zenus Yeah, I will review this PR today.

Dousir9 avatar Jul 22 '24 03:07 Dousir9

@zenus Sorry for the wait, I was really busy, let's continue this excellent work.

  1. For test_asof_join_ints.test:65, we can fix it by adding an order by column to this sqllogictest.
  2. For test_asof_join_inequal.test:23, it looks like we need to support distributed execution for asof join, cloud you show the query plan for this sqllogictest under distributed mode ? you can use scripts/ci/deploy/databend-query-cluster-3-nodes.sh to start a cluster, and then execute explain ....
  3. Resolving conflicting files with the main branch.

Dousir9 avatar Jul 24 '24 13:07 Dousir9

@zenus Sorry for the wait, I was really busy, let's continue this excellent work.

  1. For test_asof_join_ints.test:65, we can fix it by adding an order by column to this sqllogictest.
  2. For test_asof_join_inequal.test:23, it looks like we need to support distributed execution for asof join, cloud you show the query plan for this sqllogictest under distributed mode ? you can use scripts/ci/deploy/databend-query-cluster-3-nodes.sh to start a cluster, and then execute explain ....
  3. Resolving conflicting files with the main branch.

@Dousir9 i have fixed the first and third , and for the second one i have no idea after i took a week to read the code. image

zenus avatar Aug 13 '24 14:08 zenus

@Dousir9 i have fixed the first and third , and for the second one i have no idea after i took a week to read the code.

@xudong963 could you help me out ?

zenus avatar Aug 26 '24 06:08 zenus