
Introduce `_with` APIs

Open • josevalim opened this issue 1 year ago • 1 comment

The goal is to introduce `filter_with`, `summarize_with`, `mutate_with`, `arrange_with`, and `distinct_with`.

Attack plan

  • [x] Support `filter_with` with row-based series operations
  • [x] Support `summarize_with` with aggregation-based series operations
  • [x] Support `mutate_with` with row-, group-, and aggregation-based series operations
  • [ ] Support `arrange_with`
  • [ ] Support `distinct_with`
  • [ ] Decide on #224

This will allow us to fully tackle #223, #227, and #245.

Complications

`arrange`/`distinct` introduce one particular issue. We added the `_with` suffix to disambiguate the macro API from the non-macro API. This was easy for `mutate`/`summarize`/`filter` because their non-macro API is function-based. However, `arrange`/`distinct` already have a non-macro API that is not function-based, for example:

```elixir
arrange(df, desc: "my_field")
```

But we also want to support this:

```elixir
arrange(df, desc: my_field)
```

We have three choices:

  • Keep both `arrange(df, desc: "my_field")` and `arrange(df, desc: my_field)` under the same function/arity. This may be doable, but it may also introduce ambiguities. For example, should we allow `arrange(df, desc: my_field, asc: "another-field")`?

  • Move the non-macro API to `arrange_with`, which will accept either keywords or functions, for example `arrange_with(df, desc: "my_field")`.

  • Remove the `arrange(df, desc: "my_field")` version. People can either use `arrange(df, desc: my_field)` or `arrange_with(df, fn df -> [desc: df["my_field"]] end)`.
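To make the second option more concrete, here is a minimal, self-contained sketch in plain Elixir of how a single `arrange_with/2` could accept either a keyword list of column names or a callback that returns series. It is not Explorer's implementation or a proposed API. The module name `ArrangeWithSketch` is hypothetical, and the "dataframe" is assumed to be a plain map of column name to list of values, so `df["my_field"]` returns a list standing in for a series.

```elixir
defmodule ArrangeWithSketch do
  # Hypothetical sketch, NOT Explorer's implementation: a "dataframe" is a
  # plain map of column name => list of values, so df["col"] returns a list
  # that stands in for a series.

  # Callback form: the function receives the dataframe and returns
  # [asc: series] / [desc: series] pairs.
  def arrange_with(df, fun) when is_function(fun, 1) do
    do_arrange(df, fun.(df))
  end

  # Keyword form: column names are resolved to series first, so both forms
  # reduce to the same list of {direction, series} pairs.
  def arrange_with(df, orders) when is_list(orders) do
    do_arrange(df, Enum.map(orders, fn {dir, name} -> {dir, Map.fetch!(df, name)} end))
  end

  defp do_arrange(df, orders) do
    n =
      case Map.values(df) do
        [] -> 0
        [column | _] -> length(column)
      end

    # Sort row indices by the last key first. Enum.sort_by/3 is stable, so
    # after all passes the first key has the highest precedence.
    indices =
      orders
      |> Enum.reverse()
      |> Enum.reduce(Enum.to_list(0..(n - 1)//1), fn {dir, series}, idx when dir in [:asc, :desc] ->
        values = List.to_tuple(series)
        Enum.sort_by(idx, &elem(values, &1), dir)
      end)

    # Reorder every column with the same permutation of row indices.
    Map.new(df, fn {name, column} ->
      values = List.to_tuple(column)
      {name, Enum.map(indices, &elem(values, &1))}
    end)
  end
end
```

Given `df = %{"my_field" => [2, 3, 1], "other" => ["b", "c", "a"]}`, both `ArrangeWithSketch.arrange_with(df, desc: "my_field")` and `ArrangeWithSketch.arrange_with(df, fn df -> [desc: df["my_field"]] end)` normalize to the same `{direction, series}` pairs before sorting, which is the property this option relies on. Mixing column names and series in one call (the ambiguity raised in the first option) is deliberately not handled here. The keyword clause would simply raise a `KeyError`.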

EDIT: `distinct` has further complications: its columns are passed as options, so we will have to revisit that.

josevalim • Jul 06 '22 10:07

@josevalim I'm going to start on the `summarize_with` operations. I believe `filter_with` can be considered done. WDYT? cc @cigrainger

philss • Jul 21 '22 03:07