Support streaming: true on collect
https://github.com/pola-rs/polars/issues/3397#issuecomment-1341188319
Huge +1 for this. I ran into an issue the other day where a sequence of joins on lazy DataFrames loaded far more data into memory than I expected, and I think streaming might help alleviate it.
Just to make sure I understand correctly: would this mean being able to perform computations (derived columns, typically) on a DataFrame that doesn’t fully fit in RAM (i.e. streamed from disk, with roughly linear memory cost), and then also stream the resulting output row by row, for example to write it back to disk, while still keeping the memory footprint linear?
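To make the question concrete, here is a minimal sketch of the access pattern I have in mind, written with only the Python standard library. It is not Polars code and says nothing about how Polars would implement streaming; the function name `stream_with_derived_column` and the columns `price`, `qty`, and `total` are made up for illustration. Only one row is held in memory at a time, so the footprint does not grow with the size of the input.

```python
import csv
import io


def stream_with_derived_column(src, dst):
    """Read rows one at a time, add a derived column, and write each row out immediately.

    Hypothetical helper for illustration only; not part of Polars.
    """
    reader = csv.DictReader(src)
    writer = csv.DictWriter(
        dst,
        fieldnames=[*reader.fieldnames, "total"],
        lineterminator="\n",
    )
    writer.writeheader()

    rows = 0
    for row in reader:
        # Derived column computed from the current row only.
        row["total"] = int(row["price"]) * int(row["qty"])
        # The row is written before the next one is read, so nothing accumulates.
        writer.writerow(row)
        rows += 1
    return rows


# In-memory buffers stand in for files on disk (assumed sample data).
src = io.StringIO("price,qty\n2,3\n10,4\n")
dst = io.StringIO()
n_rows = stream_with_derived_column(src, dst)
```

Joins and aggregations are harder than this, since they need state across rows, which is why I'm asking what streaming on collect would cover.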