polars Multi-output, multi-sink lazy polars

Description

Let's assume we have a compute graph:

A->B->C->D
and also 
C->E

How can I materialize B, E and D as output? Do I need to do it "the old way", step-by-step, procedurally or can Polars add support for "optimizing and executing together" so it's pipelined (CPU cache friendly) and not doing excessive rechunking/combining (or it's doing it on a separate thread)?

Aug 16 '24 12:08 alippai

Ah yeah, I have definitely wanted this feature before. It might be very complex for query optimization though in the current state of the engine. You can imagine it dramatically changing the physical plan in cases where your final artifact only depends on a subset of columns and your intermediaries depend on many others.

You can kinda achieve this with a polars.collect_all but I don't think CSE is done between the two plans, so in practice you're doing duplicate work still.

Aug 16 '24 19:08 kszlim

In theory you could maybe do it with results that are lists and structs of various sizes with various window functions that you chop up into multiple frames after materializing.