
Draft: Use take-in kernel in repartitioning

Open · ctsk opened this issue 8 months ago · 5 comments

Combined with https://github.com/apache/arrow-rs/pull/7325, this PR tries to use the take_in kernel in repartitioning. The goal is to elide the coalesce step that follows repartitioning.

ctsk, Mar 24 '25 13:03
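To make the idea concrete, here is a minimal conceptual sketch in plain Rust. It does not use arrow-rs or the take_in kernel from apache/arrow-rs#7325, whose API is not reproduced here. Plain Vec<i64> buffers stand in for record batches, and the names take_then_coalesce and take_into are hypothetical. The first function splits every input batch into small per-partition pieces and then coalesces them into batches of `target` rows, copying each row more than once. The second appends each row directly into an in-progress buffer for its partition and emits that buffer once it is full, so no separate coalesce pass is needed. Both produce the same output.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::mem;

/// Hash a key to one of `n` output partitions.
fn partition_of(key: i64, n: usize) -> usize {
    let mut h = DefaultHasher::new();
    key.hash(&mut h);
    (h.finish() % n as u64) as usize
}

/// Hypothetical two-step approach: split each input batch into small
/// per-partition pieces, then run a separate coalesce pass over them.
fn take_then_coalesce(input: &[Vec<i64>], n: usize, target: usize) -> Vec<Vec<Vec<i64>>> {
    let mut pieces: Vec<Vec<Vec<i64>>> = vec![Vec::new(); n];
    for batch in input {
        let mut split: Vec<Vec<i64>> = vec![Vec::new(); n];
        for &k in batch {
            split[partition_of(k, n)].push(k);
        }
        for (p, piece) in split.into_iter().enumerate() {
            if !piece.is_empty() {
                pieces[p].push(piece);
            }
        }
    }
    // Coalesce pass: copy the small pieces again into batches of `target` rows.
    pieces
        .into_iter()
        .map(|small| {
            let rows: Vec<i64> = small.into_iter().flatten().collect();
            rows.chunks(target).map(|c| c.to_vec()).collect::<Vec<Vec<i64>>>()
        })
        .collect()
}

/// Hypothetical one-step approach: append ("take into") each row directly
/// into its partition's in-progress buffer and emit the buffer once full.
fn take_into(input: &[Vec<i64>], n: usize, target: usize) -> Vec<Vec<Vec<i64>>> {
    let mut out: Vec<Vec<Vec<i64>>> = vec![Vec::new(); n];
    let mut building: Vec<Vec<i64>> = vec![Vec::new(); n];
    for batch in input {
        for &k in batch {
            let p = partition_of(k, n);
            building[p].push(k);
            if building[p].len() == target {
                out[p].push(mem::take(&mut building[p]));
            }
        }
    }
    // End of input: flush partially filled buffers.
    for (p, rest) in building.into_iter().enumerate() {
        if !rest.is_empty() {
            out[p].push(rest);
        }
    }
    out
}

fn main() {
    // 10 input batches of 100 rows each, keys 0..1000.
    let input: Vec<Vec<i64>> = (0..10i64)
        .map(|b| (b * 100..b * 100 + 100).collect::<Vec<i64>>())
        .collect();
    let two_step = take_then_coalesce(&input, 4, 64);
    let one_step = take_into(&input, 4, 64);
    assert_eq!(two_step, one_step);
    for (p, batches) in one_step.iter().enumerate() {
        let sizes: Vec<usize> = batches.iter().map(|b| b.len()).collect();
        println!("partition {p}: batch sizes {sizes:?}");
    }
}
```

Under these assumptions, the trade-off is that the one-step version keeps one partially filled buffer per output partition alive across input batches, in exchange for skipping the extra copy.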

Hi @ctsk -- is this PR ready for running some benchmarks?

alamb, Mar 25 '25 21:03

@alamb This PR should be able to run benchmarks now. I've added overrides to use the modified version of arrow from the arrow-rs PR, plus a lockfile to avoid chrono issues. At least it can run TPC-H :)

ctsk, Mar 26 '25 15:03

I am firing up the benchmarks

alamb, Mar 27 '25 19:03

I tried to run the ClickBench queries using bench.sh and got an error like this:

Q1: SELECT COUNT(DISTINCT "HitColor"), COUNT(DISTINCT "BrowserCountry"), COUNT(DISTINCT "BrowserLanguage")  FROM hits;
Query 1 iteration 0 took 760.6 ms and returned 1 rows
Query 1 iteration 1 took 785.4 ms and returned 1 rows
Query 1 iteration 2 took 787.9 ms and returned 1 rows
Query 1 iteration 3 took 775.5 ms and returned 1 rows
Query 1 iteration 4 took 786.6 ms and returned 1 rows
Q2: SELECT "BrowserCountry",  COUNT(DISTINCT "SocialNetwork"), COUNT(DISTINCT "HitColor"), COUNT(DISTINCT "BrowserLanguage"), COUNT(DISTINCT "SocialAction") FROM hits GROUP BY 1 ORDER BY 2 DESC LIMIT 10;

thread 'tokio-runtime-worker' panicked at /home/alamb/.cargo/git/checkouts/arrow-rs-583cca34693b79b8/368c1e6/arrow-array/src/builder/mod.rs:509:35:
not yet implemented
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Error: Context("Join Error", External(JoinError::Panic(Id(2790), "not yet implemented", ...)))

alamb, Mar 27 '25 21:03
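The panic text "not yet implemented" in the Q2 run above is the message produced by Rust's bare todo!() macro. That suggests (an inference, not confirmed in this thread) that execution reached a todo!() placeholder at arrow-array/src/builder/mod.rs:509 in the patched arrow checkout, for example a code path for a data type the draft kernel did not handle yet. The JoinError::Panic wrapper appears because the panic happened inside a tokio task, and tokio reports a panicked task as a JoinError when its handle is awaited. The std-only sketch below reproduces the message; the DataType enum and make_builder function are hypothetical stand-ins, not arrow-rs APIs.

```rust
use std::any::Any;
use std::panic;

/// Hypothetical stand-in for a few Arrow data types.
#[derive(Clone, Copy)]
enum DataType {
    Int64,
    Utf8,
    Dictionary,
}

/// Hypothetical builder factory: some types are handled, the rest
/// fall through to a todo!() placeholder.
fn make_builder(dt: DataType) -> &'static str {
    match dt {
        DataType::Int64 => "Int64Builder",
        DataType::Utf8 => "StringBuilder",
        // A bare todo!() panics with the message "not yet implemented".
        DataType::Dictionary => todo!(),
    }
}

/// Extract the message from a panic payload (a &str or a String).
fn panic_message(payload: &(dyn Any + Send)) -> String {
    if let Some(s) = payload.downcast_ref::<&str>() {
        s.to_string()
    } else if let Some(s) = payload.downcast_ref::<String>() {
        s.clone()
    } else {
        "<non-string panic payload>".to_string()
    }
}

/// Catch the panic and return its message as an error.
/// Note: the default panic hook still prints the panic to stderr.
fn try_make_builder(dt: DataType) -> Result<&'static str, String> {
    panic::catch_unwind(move || make_builder(dt)).map_err(|p| panic_message(&*p))
}

fn main() {
    assert_eq!(try_make_builder(DataType::Int64), Ok("Int64Builder"));
    assert_eq!(try_make_builder(DataType::Utf8), Ok("StringBuilder"));
    let err = try_make_builder(DataType::Dictionary).unwrap_err();
    println!("caught panic: {err}");
}
```

As the log itself notes, rerunning with RUST_BACKTRACE=1 shows which caller reached the placeholder.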

Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.

github-actions[bot], Jun 12 '25 02:06