Support operations with groups in the Lazy Polars backend
The lazy frame backend that we implemented in #227 does not take groups into account. This issue tracks the implementation of groups for this backend.
Single table verbs
- [x] group
- [x] ungroup
- [ ] mutate
- [ ] arrange
- [ ] filter
- [ ] pivot_longer (maybe this won't change the backend)
Row-based verbs
- [ ] head
- [ ] tail
- [ ] slice
@philss what's the status on this one? I'm keen to contribute a bit more actively again.
@cigrainger hey Chris! Sorry for not answering before. I was mostly offline today.
I couldn't figure out how to perform those operations on lazy frames, because Polars represents grouped lazy frames differently, with a separate struct (https://docs.rs/polars/latest/polars/prelude/struct.LazyFrame.html#method.group_by). So if we want to apply operations per group, we would have to group + agg for every operation, and I don't know how this would work if we want to "ungroup" a lazy frame.
I don't remember the full picture, but I think the problem was related to "when" to apply the operations, since they couldn't - AFAIK - be "reverted"/ungrouped in the lazy frame.
If I remember correctly, Polars does not support all of our lazy grouped operations in their lazy backend. So we either need to submit PRs to Polars (very hard) or change our implementation (supposedly easier).
Today the lazy backend simply dispatches to Polars. The idea is to rewrite it so that it accumulates the operations on the Elixir side. Then, when we collect, we call the Polars API for each accumulated operation and immediately collect the result too.
So I think the first step is to keep the current features but rewrite the lazy backend to build a list of operations in Elixir land. Then we can implement group/ungroup without major changes.
This is from memory, I am not 100% confident.
I think that's exactly where we ended up and it doesn't seem too crazy. How would we represent the accumulated ops, a list of MFAs? 😱
Yes, a list as a stack. :)
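To make the "list as a stack" idea concrete, here is a minimal, language-neutral sketch in plain Python. It is not Explorer's code and does not call Polars: `LazySketch`, `_head`, and the `(name, args)` tuples are hypothetical stand-ins for the lazy frame and the recorded MFAs, and the per-group behaviour of `head` is an assumption made only for illustration. Verbs only push onto the stack. Nothing runs until `collect`, which replays the operations oldest first while tracking the current groups.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LazySketch:
    """Hypothetical lazy frame: data plus a stack of recorded operations."""
    rows: tuple        # stand-in for the underlying data (tuple of dicts)
    ops: tuple = ()    # recorded (name, args) pairs, newest first

    def _push(self, name, *args):
        # Recording an operation never touches the data.
        return LazySketch(self.rows, ((name, args),) + self.ops)

    def group_by(self, *cols):
        return self._push("group_by", *cols)

    def ungroup(self):
        return self._push("ungroup")

    def head(self, n):
        return self._push("head", n)

    def collect(self):
        rows, groups = list(self.rows), ()
        # The stack is newest first, so replay it in reverse.
        for name, args in reversed(self.ops):
            if name == "group_by":
                groups = args      # grouping is just state carried forward
            elif name == "ungroup":
                groups = ()        # "ungroup" is trivial: forget the state
            elif name == "head":
                rows = _head(rows, groups, args[0])
        return rows


def _head(rows, groups, n):
    """Keep the first n rows per group; with no groups, the first n rows."""
    seen, out = {}, []
    for row in rows:
        key = tuple(row[g] for g in groups)
        seen[key] = seen.get(key, 0) + 1
        if seen[key] <= n:
            out.append(row)
    return out
```

Because grouping is only state tracked during replay, `ungroup` needs no support from the underlying engine, which is the property the rewrite is after.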
This was completed by #890
One operation remains: sample/3. I think it is possible to implement it with groups, but I'm not sure yet. I will open a new issue if it turns out to be feasible.