Add IR-fusion optimization pass to streaming cudf-polars executor
Description
Fuses sequential IR nodes so that each partition can be processed by a single compute task. This optimization was very helpful in dask-cudf/dask-expr, and it should also reduce memory pressure for multi-GPU polars.
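To illustrate the idea, here is a minimal sketch of partition-wise fusion on a toy task graph. It is not the cudf-polars implementation: the graph layout and the names `build_unfused_graph`, `build_fused_graph`, and `fuse` are hypothetical, and plain Python lists stand in for partitions.

```python
# Toy illustration only: all names here are hypothetical and do not
# correspond to the cudf-polars or dask APIs.

def build_unfused_graph(ops, n_partitions):
    """Emit one task per (operation, partition) pair.

    Each task depends on the output of the previous operation
    for the same partition.
    """
    graph = {}
    for i in range(n_partitions):
        dep = ("input", i)
        for j, op in enumerate(ops):
            key = (f"op{j}", i)
            graph[key] = (op, dep)
            dep = key
    return graph


def fuse(ops):
    """Compose a chain of partition-wise operations into one callable."""
    def fused(partition):
        for op in ops:
            partition = op(partition)
        return partition
    return fused


def build_fused_graph(ops, n_partitions):
    """Emit a single task per partition that runs the whole chain."""
    fused = fuse(ops)
    return {("fused", i): (fused, ("input", i)) for i in range(n_partitions)}


# Three stand-in "IR nodes" that each act on one partition independently.
def double(part):
    return [2 * x for x in part]


def keep_positive(part):
    return [x for x in part if x > 0]


def add_one(part):
    return [x + 1 for x in part]


ops = [double, keep_positive, add_one]
unfused = build_unfused_graph(ops, n_partitions=2)
fused = build_fused_graph(ops, n_partitions=2)
print(len(unfused), len(fused))
```

In this sketch the unfused graph has one task per operation per partition, while the fused graph has one task per partition. Whether that reduces memory pressure in practice depends on the executor and workload.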
Checklist
- [ ] I am familiar with the Contributing Guidelines.
- [ ] New or existing tests cover these changes.
- [ ] The documentation is up to date with these changes.
/ok to test
Is this PR just waiting on reviews? If so, I can take a look. That said, it looks like a couple of reviewers have already done thorough passes, so it may be enough to ping them again to make sure this gets over the finish line.
> Is this PR just waiting on reviews?
That's partially the case. However, I haven't observed any benchmarking benefit from this change. In fact, I seem to have more OOM issues with fusion turned on. So, I'm feeling a bit hesitant to make a change that doesn't provide a performance or stability benefit...
Agreed, I just wasn't sure where we were. I wouldn't want to merge additional complexity for no benefit either.
@rjzamora if we still don't have evidence that this change is a positive, should we close the PR? I've moved it to 25.12 for now.
Thanks for moving the PR, @vyasr. The changes are quite stale, so I'll close it for now.