datafusion

feat: add projection to FilterExec

Open junjunjd opened this issue 8 months ago • 9 comments

Which issue does this PR close?

Closes #5436.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

junjunjd avatar Oct 26 '23 08:10 junjunjd

Nice, @junjunjd. I think the remaining work is to add it to projection pushdown as well :)

Dandandan avatar Oct 28 '23 10:10 Dandandan

@junjunjd FYI, I merged and pushed some changes towards projection pushdown.

Dandandan avatar Nov 04 '23 15:11 Dandandan

@junjunjd FYI, I've committed a working version. The remaining work is fixing the tests (or their expectations) and/or any remaining issues.

Dandandan avatar Nov 05 '23 13:11 Dandandan

Thanks @Dandandan! I will take a look at the tests.

junjunjd avatar Nov 06 '23 04:11 junjunjd

Current version (not sure where the regressions come from, but results are promising): @junjunjd

--------------------
Benchmark tpch_mem.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃     main ┃ projection_filter_exec ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │  91.11ms │                90.66ms │     no change │
│ QQuery 2     │  26.68ms │                24.62ms │ +1.08x faster │
│ QQuery 3     │  52.38ms │                42.46ms │ +1.23x faster │
│ QQuery 4     │  51.02ms │                27.82ms │ +1.83x faster │
│ QQuery 5     │ 115.14ms │                70.60ms │ +1.63x faster │
│ QQuery 6     │   9.75ms │                 8.35ms │ +1.17x faster │
│ QQuery 7     │ 212.57ms │               209.77ms │     no change │
│ QQuery 8     │  60.12ms │                75.56ms │  1.26x slower │
│ QQuery 9     │  59.65ms │                81.61ms │  1.37x slower │
│ QQuery 10    │ 113.51ms │                75.39ms │ +1.51x faster │
│ QQuery 11    │  19.42ms │                19.37ms │     no change │
│ QQuery 12    │  58.83ms │                43.10ms │ +1.37x faster │
│ QQuery 13    │  54.67ms │                30.95ms │ +1.77x faster │
│ QQuery 14    │  18.18ms │                12.36ms │ +1.47x faster │
│ QQuery 15    │  58.83ms │                39.23ms │ +1.50x faster │
│ QQuery 16    │  21.80ms │                22.48ms │     no change │
│ QQuery 17    │  53.05ms │                65.16ms │  1.23x slower │
│ QQuery 18    │ 154.05ms │               142.87ms │ +1.08x faster │
│ QQuery 19    │  34.50ms │                29.88ms │ +1.15x faster │
│ QQuery 20    │  62.34ms │                50.79ms │ +1.23x faster │
│ QQuery 21    │ 247.24ms │               170.29ms │ +1.45x faster │
│ QQuery 22    │  14.14ms │                13.91ms │     no change │
└──────────────┴──────────┴────────────────────────┴───────────────┘
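For reading the Change column: the labels appear to be derived from the ratio of the branch time to the `main` time. Below is a minimal sketch of that arithmetic. The function name `describe_change` and the 5% "no change" band are assumptions inferred from the numbers in the table, not taken from the benchmark tooling itself.

```python
def describe_change(main_ms: float, branch_ms: float, noise: float = 0.05) -> str:
    """Label a timing difference the way the Change column appears to.

    `noise` is an assumed threshold: ratios within +/-5% of 1.0 are
    reported as "no change". This is inferred from the table, not confirmed.
    """
    ratio = branch_ms / main_ms
    if 1 - noise <= ratio <= 1 + noise:
        return "no change"
    if ratio < 1:
        # The branch is faster: report how many times faster than main.
        return f"+{1 / ratio:.2f}x faster"
    # The branch is slower: report how many times slower than main.
    return f"{ratio:.2f}x slower"


# Rows taken from the table above (query 4, query 8, query 1).
print(describe_change(51.02, 27.82))
print(describe_change(60.12, 75.56))
print(describe_change(91.11, 90.66))
```

Under these assumptions, the three calls reproduce the labels shown for queries 4, 8, and 1.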

Dandandan avatar Dec 01 '23 16:12 Dandandan

@junjunjd if you are able to work on this, it would be good to fix the remaining tests (either the tests or their expected output need to be changed) and see why we have regressions on a few queries.

Dandandan avatar Dec 01 '23 19:12 Dandandan

OK, I had some more time on a flight to track down the regression. It was related to join selection and statistics.

Current version shows no regressions anymore 🎉

--------------------
Benchmark tpch.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃     main ┃ projection_filter_exec ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │ 151.50ms │               151.57ms │     no change │
│ QQuery 2     │  49.60ms │                49.71ms │     no change │
│ QQuery 3     │  70.18ms │                68.88ms │     no change │
│ QQuery 4     │  51.75ms │                50.40ms │     no change │
│ QQuery 5     │  82.02ms │                81.63ms │     no change │
│ QQuery 6     │  31.84ms │                31.47ms │     no change │
│ QQuery 7     │ 102.68ms │               102.40ms │     no change │
│ QQuery 8     │ 107.65ms │               107.11ms │     no change │
│ QQuery 9     │ 131.21ms │               130.31ms │     no change │
│ QQuery 10    │ 137.28ms │               134.48ms │     no change │
│ QQuery 11    │  37.14ms │                36.55ms │     no change │
│ QQuery 12    │  85.27ms │                84.92ms │     no change │
│ QQuery 13    │ 190.22ms │               180.37ms │ +1.05x faster │
│ QQuery 14    │  50.64ms │                51.16ms │     no change │
│ QQuery 15    │  61.33ms │                55.79ms │ +1.10x faster │
│ QQuery 16    │  54.56ms │                52.42ms │     no change │
│ QQuery 17    │ 100.46ms │               102.43ms │     no change │
│ QQuery 18    │ 188.07ms │               188.75ms │     no change │
│ QQuery 19    │  98.04ms │                98.17ms │     no change │
│ QQuery 20    │  55.19ms │                53.82ms │     no change │
│ QQuery 21    │ 128.41ms │               120.85ms │ +1.06x faster │
│ QQuery 22    │  38.39ms │                37.76ms │     no change │
└──────────────┴──────────┴────────────────────────┴───────────────┘
--------------------
Benchmark tpch_mem.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃     main ┃ projection_filter_exec ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │  91.51ms │                89.51ms │     no change │
│ QQuery 2     │  25.79ms │                22.87ms │ +1.13x faster │
│ QQuery 3     │  53.71ms │                41.13ms │ +1.31x faster │
│ QQuery 4     │  51.74ms │                28.65ms │ +1.81x faster │
│ QQuery 5     │ 115.24ms │                84.50ms │ +1.36x faster │
│ QQuery 6     │   9.28ms │                 8.36ms │ +1.11x faster │
│ QQuery 7     │ 215.76ms │               212.17ms │     no change │
│ QQuery 8     │  59.93ms │                60.06ms │     no change │
│ QQuery 9     │  58.50ms │                57.97ms │     no change │
│ QQuery 10    │ 115.36ms │                79.37ms │ +1.45x faster │
│ QQuery 11    │  19.34ms │                19.77ms │     no change │
│ QQuery 12    │  59.47ms │                36.31ms │ +1.64x faster │
│ QQuery 13    │  54.12ms │                43.69ms │ +1.24x faster │
│ QQuery 14    │  18.04ms │                12.17ms │ +1.48x faster │
│ QQuery 15    │  58.53ms │                38.91ms │ +1.50x faster │
│ QQuery 16    │  21.86ms │                21.94ms │     no change │
│ QQuery 17    │  53.52ms │                53.08ms │     no change │
│ QQuery 18    │ 156.56ms │               145.73ms │ +1.07x faster │
│ QQuery 19    │  35.22ms │                28.24ms │ +1.25x faster │
│ QQuery 20    │  63.21ms │                50.03ms │ +1.26x faster │
│ QQuery 21    │ 249.52ms │               173.98ms │ +1.43x faster │
│ QQuery 22    │  14.70ms │                14.32ms │     no change │
└──────────────┴──────────┴────────────────────────┴───────────────┘

Dandandan avatar Dec 03 '23 11:12 Dandandan

As adding it to the logical plan seems to cause a lot of trouble, I plan on moving this to the physical plan optimization phase only.
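To illustrate the idea of handling this purely as a physical-plan rewrite, here is a toy sketch in Python. It is not DataFusion's API: the `Scan`, `Filter`, and `Projection` classes and the `fold_projection_into_filter` rule are hypothetical stand-ins. The rule rewrites a projection sitting directly on top of a filter into a single filter node that carries the projection, so rows that pass the predicate are materialized with only the needed columns.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Union


@dataclass(frozen=True)
class Scan:
    rows: tuple  # tuple of dict rows (toy in-memory source)


@dataclass(frozen=True)
class Filter:
    child: "Plan"
    predicate: Callable[[dict], bool]
    projection: Optional[tuple] = None  # columns to emit; None keeps all


@dataclass(frozen=True)
class Projection:
    child: "Plan"
    columns: tuple


Plan = Union[Scan, Filter, Projection]


def fold_projection_into_filter(plan: Plan) -> Plan:
    """Hypothetical rule: Projection(Filter(x)) becomes Filter(x, projection)."""
    if isinstance(plan, Projection):
        child = fold_projection_into_filter(plan.child)
        if isinstance(child, Filter) and child.projection is None:
            return Filter(child.child, child.predicate, plan.columns)
        return Projection(child, plan.columns)
    if isinstance(plan, Filter):
        child = fold_projection_into_filter(plan.child)
        return Filter(child, plan.predicate, plan.projection)
    return plan


def execute(plan: Plan) -> list:
    if isinstance(plan, Scan):
        return [dict(row) for row in plan.rows]
    if isinstance(plan, Filter):
        out = []
        for row in execute(plan.child):
            # The predicate sees every input column, even ones not emitted.
            if plan.predicate(row):
                cols = plan.projection if plan.projection is not None else tuple(row)
                out.append({c: row[c] for c in cols})
        return out
    if isinstance(plan, Projection):
        return [{c: row[c] for c in plan.columns} for row in execute(plan.child)]
    raise TypeError(f"unknown plan node: {plan!r}")


scan = Scan(rows=({"a": 1, "b": 10, "c": "x"}, {"a": 5, "b": 20, "c": "y"}))
plan = Projection(Filter(scan, lambda r: r["a"] > 2), ("b",))
optimized = fold_projection_into_filter(plan)
print(type(optimized).__name__, optimized.projection)
print(execute(optimized))
```

In this toy model the rewritten plan returns the same rows as the original while dropping the separate projection node. The predicate can still reference column `a` even though only `b` is emitted.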

Dandandan avatar Dec 04 '23 09:12 Dandandan

Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.

github-actions[bot] avatar Apr 25 '24 01:04 github-actions[bot]