Daft perf: Split projections after optimizer passes

Summary

There are currently three cases where we split projections:

when extracting actor pool projects
when extracting monotonically increasing ids
when extracting window functions

In these cases we create new logical type nodes. However, we don't currently consider these node types during projection, filter, and limit pushdowns. Thus these nodes act as optimization barriers.

We can instead delay splitting projections until after optimization passes. This would allow pushdowns to work in the presence of these operators without the need for adding new cases into our pushdown optimizations.

Related Issues

Closes #3991 and #3993

Checklist

[x] All tests have passed

Mar 15 '25 02:03 desmondcheongzx

Codecov Report

Attention: Patch coverage is 95.31250% with 6 lines in your changes missing coverage. Please review.

Project coverage is 78.74%. Comparing base (6b13716) to head (43fc4a7). Report is 2 commits behind head on main.

Files with missing lines	Patch %	Lines
...rc/daft-logical-plan/src/optimization/optimizer.rs	94.69%	6 Missing :warning:

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3992      +/-   ##
==========================================
+ Coverage   78.23%   78.74%   +0.51%     
==========================================
  Files         865      865              
  Lines      118581   118709     +128     
==========================================
+ Hits        92774    93481     +707     
+ Misses      25807    25228     -579

Files with missing lines	Coverage Δ
src/daft-dsl/src/functions/python/mod.rs	`96.75% <100.00%> (+0.28%)`	:arrow_up:
...rc/daft-logical-plan/src/optimization/optimizer.rs	`95.00% <94.69%> (-0.11%)`	:arrow_down:

... and 21 files with indirect coverage changes

:rocket: New features to boost your workflow:

:snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
:package: JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Mar 15 '25 02:03 codecov[bot]

@desmondcheongzx I think a better solution here might actually be to put the SplitActorPoolProjects rule at the end of the optimizer. Otherwise, we would have to add logic to many of our optimizer rules for actor pool projects, as well as other logical ops we may also add such as window, which will also be extracted out of a project.

In my mind there are two types of rewrite rules:

the ones that are relevant for other operations on the logical plan (e.g. unnest subquery is relevant to other optimization rules because it affects rules involving subqueries and joins)
the ones that are really only relevant for translation (e.g. SplitActorPoolProjects, because whether or not an actor pool UDF is inside of a Project or an ActorPoolProject only matters for physical plans)

For 1, we should do them before our optimization rules. For 2, we should do them after, with maybe a PushdownProject at the end to clean up some things.

Mar 17 '25 22:03 kevinzwang

@kevinzwang Got so busy with other things that I never got back to this. Just updated with your comments, lmk what you think!

Apr 01 '25 02:04 desmondcheongzx