ibis feat: `topk` table expression

Open ianmcook opened this issue 7 months ago • 4 comments

Is your feature request related to a problem?

As described here, filtering a table to return the row(s) with largest value(s) in each group feels harder in Ibis than in pandas. I wonder if Ibis could add some syntactic sugar to make this easier.

Describe the solution you'd like

dplyr has a function top_n() that makes this simpler syntactically:

df <- data.frame(
  country = c('India', 'India', 'India', 'United States', 'United States', 'United States', 'China', 'China', 'China'),
  city = c('Bangalore', 'Delhi', 'Mumbai', 'Los Angeles', 'New York', 'Chicago', 'Shanghai', 'Guangzhou', 'Beijing'),
  population = c(8443675, 11034555, 12442373, 3820914, 8258035, 2664452, 24281400, 13858700, 19164000)
)

library(dplyr)

df |> group_by(country) |> top_n(1, wt = population)

Could we add something like that to Ibis? Ibis already has a `topk` function, but it is a vector function, not a table function. Maybe Ibis could add a `topk` table function that translates into an operation like this?
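To make the requested semantics concrete, here is a minimal plain-Python sketch of "top n rows per group" applied to the same data as the dplyr example. It is not Ibis code and not a proposed implementation: `top_n_per_group` and its parameters are hypothetical names chosen for illustration. It also assumes exactly `n` rows are returned per group, whereas dplyr's `top_n()` keeps extra rows when there are ties.

```python
from collections import defaultdict
from heapq import nlargest

# Same data as the dplyr example above, as a list of row dicts.
rows = [
    {"country": "India", "city": "Bangalore", "population": 8443675},
    {"country": "India", "city": "Delhi", "population": 11034555},
    {"country": "India", "city": "Mumbai", "population": 12442373},
    {"country": "United States", "city": "Los Angeles", "population": 3820914},
    {"country": "United States", "city": "New York", "population": 8258035},
    {"country": "United States", "city": "Chicago", "population": 2664452},
    {"country": "China", "city": "Shanghai", "population": 24281400},
    {"country": "China", "city": "Guangzhou", "population": 13858700},
    {"country": "China", "city": "Beijing", "population": 19164000},
]


def top_n_per_group(rows, group_key, weight_key, n=1):
    """Hypothetical helper: keep the n rows with the largest weight_key per group.

    Assumption: ties are NOT expanded; each group yields at most n rows.
    """
    # Bucket rows by group, preserving first-seen group order.
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)

    result = []
    for members in groups.values():
        # nlargest returns the rows sorted from largest to smallest weight.
        result.extend(nlargest(n, members, key=lambda r: r[weight_key]))
    return result


top_cities = top_n_per_group(rows, "country", "population", n=1)
for row in top_cities:
    print(row["country"], row["city"], row["population"])
```

A table-level `topk` in Ibis would presumably express the same thing declaratively (group, order by a weight column, keep the first `n` rows per group) and leave the translation to the backend.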

What version of ibis are you running?

9.1.0

What backend(s) are you using, if any?

DuckDB

Code of Conduct

[X] I agree to follow this project's Code of Conduct

Jul 08 '24 18:07 ianmcook
