perf: Split projections after optimizer passes
Summary
There are currently three cases where we split projections:
- when extracting actor pool projects
- when extracting monotonically increasing ids
- when extracting window functions
In these cases we create new logical type nodes. However, we don't currently consider these node types during projection, filter, and limit pushdowns. Thus these nodes act as optimization barriers.
We can instead delay splitting projections until after optimization passes. This would allow pushdowns to work in the presence of these operators without the need for adding new cases into our pushdown optimizations.
Related Issues
Closes #3991 and #3993
Checklist
- [x] All tests have passed
Codecov Report
Attention: Patch coverage is 95.31250% with 6 lines in your changes missing coverage. Please review.
Project coverage is 78.74%. Comparing base (
6b13716) to head (43fc4a7). Report is 2 commits behind head on main.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...rc/daft-logical-plan/src/optimization/optimizer.rs | 94.69% | 6 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## main #3992 +/- ##
==========================================
+ Coverage 78.23% 78.74% +0.51%
==========================================
Files 865 865
Lines 118581 118709 +128
==========================================
+ Hits 92774 93481 +707
+ Misses 25807 25228 -579
| Files with missing lines | Coverage Δ | |
|---|---|---|
| src/daft-dsl/src/functions/python/mod.rs | 96.75% <100.00%> (+0.28%) |
:arrow_up: |
| ...rc/daft-logical-plan/src/optimization/optimizer.rs | 95.00% <94.69%> (-0.11%) |
:arrow_down: |
:rocket: New features to boost your workflow:
- :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
- :package: JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.
@desmondcheongzx I think a better solution here might actually be to put the SplitActorPoolProjects rule at the end of the optimizer. Otherwise, we would have to add logic to many of our optimizer rules for actor pool projects, as well as other logical ops we may also add such as window, which will also be extracted out of a project.
In my mind there are two types of rewrite rules:
- the ones that are relevant for other operations on the logical plan (e.g. unnest subquery is relevant to other optimization rules because it affects rules involving subqueries and joins)
- the ones that are really only relevant for translation (e.g. SplitActorPoolProjects, because whether or not an actor pool UDF is inside of a Project or an ActorPoolProject only matters for physical plans)
For 1, we should do them before our optimization rules. For 2, we should do them after, with maybe a PushdownProject at the end to clean up some things.
@kevinzwang Got so busy with other things that I never got back to this. Just updated with your comments, lmk what you think!