Daft icon indicating copy to clipboard operation
Daft copied to clipboard

perf: Split projections after optimizer passes

Open desmondcheongzx opened this issue 9 months ago • 3 comments

Summary

There are currently three cases where we split projections:

  • when extracting actor pool projects
  • when extracting monotonically increasing ids
  • when extracting window functions

In these cases we create new logical type nodes. However, we don't currently consider these node types during projection, filter, and limit pushdowns. Thus these nodes act as optimization barriers.

We can instead delay splitting projections until after optimization passes. This would allow pushdowns to work in the presence of these operators without the need for adding new cases into our pushdown optimizations.

Related Issues

Closes #3991 and #3993

Checklist

  • [x] All tests have passed

desmondcheongzx avatar Mar 15 '25 02:03 desmondcheongzx

Codecov Report

Attention: Patch coverage is 95.31250% with 6 lines in your changes missing coverage. Please review.

Project coverage is 78.74%. Comparing base (6b13716) to head (43fc4a7). Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
...rc/daft-logical-plan/src/optimization/optimizer.rs 94.69% 6 Missing :warning:
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3992      +/-   ##
==========================================
+ Coverage   78.23%   78.74%   +0.51%     
==========================================
  Files         865      865              
  Lines      118581   118709     +128     
==========================================
+ Hits        92774    93481     +707     
+ Misses      25807    25228     -579     
Files with missing lines Coverage Δ
src/daft-dsl/src/functions/python/mod.rs 96.75% <100.00%> (+0.28%) :arrow_up:
...rc/daft-logical-plan/src/optimization/optimizer.rs 95.00% <94.69%> (-0.11%) :arrow_down:

... and 21 files with indirect coverage changes

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • :package: JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

codecov[bot] avatar Mar 15 '25 02:03 codecov[bot]

@desmondcheongzx I think a better solution here might actually be to put the SplitActorPoolProjects rule at the end of the optimizer. Otherwise, we would have to add logic to many of our optimizer rules for actor pool projects, as well as other logical ops we may also add such as window, which will also be extracted out of a project.

In my mind there are two types of rewrite rules:

  1. the ones that are relevant for other operations on the logical plan (e.g. unnest subquery is relevant to other optimization rules because it affects rules involving subqueries and joins)
  2. the ones that are really only relevant for translation (e.g. SplitActorPoolProjects, because whether or not an actor pool UDF is inside of a Project or an ActorPoolProject only matters for physical plans)

For 1, we should do them before our optimization rules. For 2, we should do them after, with maybe a PushdownProject at the end to clean up some things.

kevinzwang avatar Mar 17 '25 22:03 kevinzwang

@kevinzwang Got so busy with other things that I never got back to this. Just updated with your comments, lmk what you think!

desmondcheongzx avatar Apr 01 '25 02:04 desmondcheongzx