Support operations with groups in the Lazy Polars backend

Open philss opened this issue 2 years ago • 5 comments

The lazy frame backend that we implemented in #227 does not take into account groups. This issue is to track the implementation of groups for this backend.

Single table verbs

  • [x] group
  • [x] ungroup
  • [ ] mutate
  • [ ] arrange
  • [ ] filter
  • [ ] pivot_longer (maybe this won't change the backend)

Row-based verbs

  • [ ] head
  • [ ] tail
  • [ ] slice

philss avatar Feb 08 '23 19:02 philss

@philss what's the status on this one? I'm keen to contribute a bit more actively again.

cigrainger avatar Oct 23 '23 12:10 cigrainger

@cigrainger hey Chris! Sorry for not answering before. I was mostly offline today.

I couldn't figure out how to perform those operations on lazy frames, because Polars represents a grouped lazy frame with a different struct: https://docs.rs/polars/latest/polars/prelude/struct.LazyFrame.html#method.group_by. So if we want to apply operations per group, we would have to group + agg for every operation, and I don't know how this would work if we want to "ungroup" a lazy frame.

I don't remember the full picture, but I think the problem was related to "when" to apply the operations, since they couldn't - AFAIK - be "reverted"/ungrouped in the lazy frame.

philss avatar Oct 24 '23 03:10 philss

If I remember correctly, Polars does not support all of our lazy grouped operations in their lazy backend. So we either need to submit PRs to Polars (very hard) or change our implementation (supposedly easier).

Today the lazy backend simply dispatches to Polars. The idea is to rewrite it so that it accumulates the operations on the Elixir side. Then, when we collect, we call the Polars API and immediately collect there too.

So I think the first step is to keep the current features but rewrite the lazy backend to build a list of operations in Elixir land. Then we can implement group/ungroup without major changes.

This is from memory, I am not 100% confident.

josevalim avatar Oct 24 '23 07:10 josevalim

I think that's exactly where we ended up and it doesn't seem too crazy. How would we represent the accumulated ops, a list of MFAs? 😱

cigrainger avatar Oct 24 '23 08:10 cigrainger

Yes, a list as a stack. :)
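For readers following along, here is a minimal sketch of the "list as a stack" idea. It is written in Python with plain dicts rather than Elixir and Polars, so it is an illustration only and not the Explorer implementation. The `LazyFrame` class, the `_push` helper, and the `("name", args...)` tuple format are all hypothetical names chosen for this example. Each verb only records itself on the stack; nothing runs until `collect`, which replays the operations oldest first and tracks the current grouping as it goes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LazyFrame:
    """Toy lazy frame (hypothetical): rows plus a stack of pending operations."""

    rows: tuple
    ops: tuple = ()  # newest operation first, like a prepend-only list

    def _push(self, op):
        # Recording an operation never touches the data.
        return LazyFrame(self.rows, (op,) + self.ops)

    def group_by(self, key):
        return self._push(("group_by", key))

    def ungroup(self):
        return self._push(("ungroup",))

    def head(self, n):
        return self._push(("head", n))

    def collect(self):
        rows = list(self.rows)
        group_key = None
        # The stack is newest first, so reverse it to replay in call order.
        for name, *args in reversed(self.ops):
            if name == "group_by":
                group_key = args[0]
            elif name == "ungroup":
                group_key = None
            elif name == "head":
                n = args[0]
                if group_key is None:
                    rows = rows[:n]
                else:
                    # Grouped head: keep the first n rows of each group.
                    seen = {}
                    kept = []
                    for row in rows:
                        k = row[group_key]
                        seen[k] = seen.get(k, 0) + 1
                        if seen[k] <= n:
                            kept.append(row)
                    rows = kept
        return rows


rows = (
    {"g": "a", "x": 1},
    {"g": "a", "x": 2},
    {"g": "b", "x": 3},
    {"g": "b", "x": 4},
)
lf = LazyFrame(rows)
grouped = lf.group_by("g").head(1).collect()
ungrouped = lf.group_by("g").ungroup().head(1).collect()
```

In this toy version "ungroup" is trivial because grouping is just state carried through the replay, which is the property the discussion above is after. How each recorded operation would map onto actual Polars calls is left out here on purpose.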

josevalim avatar Oct 24 '23 12:10 josevalim

This was completed by #890

One operation remains: sample/3. I think it can be implemented with groups, but I'm not sure yet, so I will open a new issue if it turns out to be feasible.

philss avatar Apr 02 '24 03:04 philss