datafusion
feat: add projection to FilterExec
Which issue does this PR close?
Closes #5436.
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?
Nice, @junjunjd. I think the remaining work is to add it to projection pushdown as well :)
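For readers skimming the thread, here is a minimal sketch of the idea behind a filter that carries its own projection: the predicate is evaluated once, and only the requested columns are materialized for the surviving rows. This is not the `FilterExec` implementation. `Batch` and `filter_with_projection` are hypothetical names used only for illustration, and plain `Vec<i64>` columns stand in for Arrow arrays.

```rust
/// Hypothetical stand-in for a record batch: equally long i64 columns.
type Batch = Vec<Vec<i64>>;

/// Evaluate `predicate` on column `filter_col`, then copy only the columns
/// listed in `projection` (all columns when `projection` is `None`).
fn filter_with_projection(
    batch: &Batch,
    filter_col: usize,
    predicate: impl Fn(i64) -> bool,
    projection: Option<&[usize]>,
) -> Batch {
    // Boolean selection mask with one entry per row.
    let mask: Vec<bool> = batch[filter_col].iter().map(|&v| predicate(v)).collect();

    // Without a projection, every column is kept.
    let all_columns: Vec<usize> = (0..batch.len()).collect();
    let columns: &[usize] = projection.unwrap_or(&all_columns);

    // Only the projected columns are copied; the rest are never touched.
    columns
        .iter()
        .map(|&c| {
            let mut out = Vec::new();
            for (row, &keep) in mask.iter().enumerate() {
                if keep {
                    out.push(batch[c][row]);
                }
            }
            out
        })
        .collect()
}
```

In this sketch, a filter on column 0 with a projection of `[2]` copies a single column instead of three, which is the kind of saving a separate projection step after the filter cannot provide.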
@junjunjd FYI, I merged and pushed some changes towards projection pushdown.
@junjunjd FYI, I've committed a working version. The remaining work is fixing the tests (or their expectations) and any remaining issues.
Thanks @Dandandan! I will take a look at the tests.
@junjunjd Current version (not sure where the regressions come from, but the results are promising):
--------------------
Benchmark tpch_mem.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query ┃ main ┃ projection_filter_exec ┃ Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1 │ 91.11ms │ 90.66ms │ no change │
│ QQuery 2 │ 26.68ms │ 24.62ms │ +1.08x faster │
│ QQuery 3 │ 52.38ms │ 42.46ms │ +1.23x faster │
│ QQuery 4 │ 51.02ms │ 27.82ms │ +1.83x faster │
│ QQuery 5 │ 115.14ms │ 70.60ms │ +1.63x faster │
│ QQuery 6 │ 9.75ms │ 8.35ms │ +1.17x faster │
│ QQuery 7 │ 212.57ms │ 209.77ms │ no change │
│ QQuery 8 │ 60.12ms │ 75.56ms │ 1.26x slower │
│ QQuery 9 │ 59.65ms │ 81.61ms │ 1.37x slower │
│ QQuery 10 │ 113.51ms │ 75.39ms │ +1.51x faster │
│ QQuery 11 │ 19.42ms │ 19.37ms │ no change │
│ QQuery 12 │ 58.83ms │ 43.10ms │ +1.37x faster │
│ QQuery 13 │ 54.67ms │ 30.95ms │ +1.77x faster │
│ QQuery 14 │ 18.18ms │ 12.36ms │ +1.47x faster │
│ QQuery 15 │ 58.83ms │ 39.23ms │ +1.50x faster │
│ QQuery 16 │ 21.80ms │ 22.48ms │ no change │
│ QQuery 17 │ 53.05ms │ 65.16ms │ 1.23x slower │
│ QQuery 18 │ 154.05ms │ 142.87ms │ +1.08x faster │
│ QQuery 19 │ 34.50ms │ 29.88ms │ +1.15x faster │
│ QQuery 20 │ 62.34ms │ 50.79ms │ +1.23x faster │
│ QQuery 21 │ 247.24ms │ 170.29ms │ +1.45x faster │
│ QQuery 22 │ 14.14ms │ 13.91ms │ no change │
└──────────────┴──────────┴────────────────────────┴───────────────┘
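As a reading aid for the "Change" column, the sketch below reproduces the labels from the two timings in each row. It assumes that runs within 5% of each other are reported as "no change"; that threshold is an assumption consistent with the rows shown here, not something stated in this thread. `describe_change` and `NOISE_THRESHOLD` are hypothetical names.

```rust
/// ASSUMPTION: timings within 5% of each other count as noise.
const NOISE_THRESHOLD: f64 = 0.05;

/// Label a pair of timings (in milliseconds) the way the table appears to.
fn describe_change(main_ms: f64, branch_ms: f64) -> String {
    // Ratio below 1.0 means the branch is faster than main.
    let ratio = branch_ms / main_ms;
    if (ratio - 1.0).abs() <= NOISE_THRESHOLD {
        "no change".to_string()
    } else if ratio < 1.0 {
        // Speedups are reported as main / branch.
        format!("+{:.2}x faster", main_ms / branch_ms)
    } else {
        // Slowdowns are reported as branch / main.
        format!("{:.2}x slower", ratio)
    }
}
```

For example, Query 4 above gives 51.02 / 27.82 ≈ 1.83, so it is labeled "+1.83x faster", while Query 8 gives 75.56 / 60.12 ≈ 1.26 and is labeled "1.26x slower".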
@junjunjd if you are able to work on this, it would be good to fix the remaining tests (either the tests or their expected output need to be changed) and find out why a few queries regress.
OK, I had some more time on a flight to track down the regression. It was related to join selection and statistics.
Current version shows no regressions anymore 🎉
--------------------
Benchmark tpch.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query ┃ main ┃ projection_filter_exec ┃ Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1 │ 151.50ms │ 151.57ms │ no change │
│ QQuery 2 │ 49.60ms │ 49.71ms │ no change │
│ QQuery 3 │ 70.18ms │ 68.88ms │ no change │
│ QQuery 4 │ 51.75ms │ 50.40ms │ no change │
│ QQuery 5 │ 82.02ms │ 81.63ms │ no change │
│ QQuery 6 │ 31.84ms │ 31.47ms │ no change │
│ QQuery 7 │ 102.68ms │ 102.40ms │ no change │
│ QQuery 8 │ 107.65ms │ 107.11ms │ no change │
│ QQuery 9 │ 131.21ms │ 130.31ms │ no change │
│ QQuery 10 │ 137.28ms │ 134.48ms │ no change │
│ QQuery 11 │ 37.14ms │ 36.55ms │ no change │
│ QQuery 12 │ 85.27ms │ 84.92ms │ no change │
│ QQuery 13 │ 190.22ms │ 180.37ms │ +1.05x faster │
│ QQuery 14 │ 50.64ms │ 51.16ms │ no change │
│ QQuery 15 │ 61.33ms │ 55.79ms │ +1.10x faster │
│ QQuery 16 │ 54.56ms │ 52.42ms │ no change │
│ QQuery 17 │ 100.46ms │ 102.43ms │ no change │
│ QQuery 18 │ 188.07ms │ 188.75ms │ no change │
│ QQuery 19 │ 98.04ms │ 98.17ms │ no change │
│ QQuery 20 │ 55.19ms │ 53.82ms │ no change │
│ QQuery 21 │ 128.41ms │ 120.85ms │ +1.06x faster │
│ QQuery 22 │ 38.39ms │ 37.76ms │ no change │
└──────────────┴──────────┴────────────────────────┴───────────────┘
--------------------
Benchmark tpch_mem.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query ┃ main ┃ projection_filter_exec ┃ Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1 │ 91.51ms │ 89.51ms │ no change │
│ QQuery 2 │ 25.79ms │ 22.87ms │ +1.13x faster │
│ QQuery 3 │ 53.71ms │ 41.13ms │ +1.31x faster │
│ QQuery 4 │ 51.74ms │ 28.65ms │ +1.81x faster │
│ QQuery 5 │ 115.24ms │ 84.50ms │ +1.36x faster │
│ QQuery 6 │ 9.28ms │ 8.36ms │ +1.11x faster │
│ QQuery 7 │ 215.76ms │ 212.17ms │ no change │
│ QQuery 8 │ 59.93ms │ 60.06ms │ no change │
│ QQuery 9 │ 58.50ms │ 57.97ms │ no change │
│ QQuery 10 │ 115.36ms │ 79.37ms │ +1.45x faster │
│ QQuery 11 │ 19.34ms │ 19.77ms │ no change │
│ QQuery 12 │ 59.47ms │ 36.31ms │ +1.64x faster │
│ QQuery 13 │ 54.12ms │ 43.69ms │ +1.24x faster │
│ QQuery 14 │ 18.04ms │ 12.17ms │ +1.48x faster │
│ QQuery 15 │ 58.53ms │ 38.91ms │ +1.50x faster │
│ QQuery 16 │ 21.86ms │ 21.94ms │ no change │
│ QQuery 17 │ 53.52ms │ 53.08ms │ no change │
│ QQuery 18 │ 156.56ms │ 145.73ms │ +1.07x faster │
│ QQuery 19 │ 35.22ms │ 28.24ms │ +1.25x faster │
│ QQuery 20 │ 63.21ms │ 50.03ms │ +1.26x faster │
│ QQuery 21 │ 249.52ms │ 173.98ms │ +1.43x faster │
│ QQuery 22 │ 14.70ms │ 14.32ms │ no change │
└──────────────┴──────────┴────────────────────────┴───────────────┘
Since adding it to the logical plan seems to cause a lot of trouble, I plan to move this to the physical plan optimization phase only.
Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.