feat: `topk` table expression
Is your feature request related to a problem?
As described here, filtering a table to return the row(s) with the largest value(s) in each group feels harder in Ibis than in pandas. I wonder if Ibis could add some syntactic sugar to make this easier.
Describe the solution you'd like
dplyr has a function, `top_n()`, that makes this simpler syntactically:
```r
df <- data.frame(
  country = c('India', 'India', 'India', 'United States', 'United States', 'United States', 'China', 'China', 'China'),
  city = c('Bangalore', 'Delhi', 'Mumbai', 'Los Angeles', 'New York', 'Chicago', 'Shanghai', 'Guangzhou', 'Beijing'),
  population = c(8443675, 11034555, 12442373, 3820914, 8258035, 2664452, 24281400, 13858700, 19164000)
)
library(dplyr)
df |> group_by(country) |> top_n(1, wt = population)
```
I wonder if we could add something like that in Ibis? Ibis already has a `topk` function, but it's a vector function, not a table function. Maybe Ibis could add a `topk` table function that translates into an operation like this?
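To make the requested semantics concrete, here is a minimal plain-Python sketch of what a grouped, table-level `topk` would return for the sample data above. It deliberately uses no Ibis calls. The helper name `topk_per_group` and its parameters (`k`, `by`, `group_by`) are hypothetical, chosen only for illustration, and are not a proposed Ibis signature. One assumption to flag: this sketch returns exactly `k` rows per group, whereas dplyr's `top_n()` keeps additional rows when there are ties.

```python
from collections import defaultdict
from heapq import nlargest

# Same sample data as the R data frame above, as a list of row dicts.
rows = [
    {"country": "India", "city": "Bangalore", "population": 8443675},
    {"country": "India", "city": "Delhi", "population": 11034555},
    {"country": "India", "city": "Mumbai", "population": 12442373},
    {"country": "United States", "city": "Los Angeles", "population": 3820914},
    {"country": "United States", "city": "New York", "population": 8258035},
    {"country": "United States", "city": "Chicago", "population": 2664452},
    {"country": "China", "city": "Shanghai", "population": 24281400},
    {"country": "China", "city": "Guangzhou", "population": 13858700},
    {"country": "China", "city": "Beijing", "population": 19164000},
]


def topk_per_group(rows, k, by, group_by):
    """Hypothetical helper: keep the k rows with the largest `by` value in each group."""
    # Bucket rows by the grouping key; dicts preserve first-seen group order.
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_by]].append(row)

    # Within each group, keep the k rows with the largest `by` value.
    result = []
    for group_rows in groups.values():
        result.extend(nlargest(k, group_rows, key=lambda r: r[by]))
    return result


top_cities = topk_per_group(rows, k=1, by="population", group_by="country")
for row in top_cities:
    print(row["country"], row["city"], row["population"])
```

For this data, the result is one row per country: Mumbai for India, New York for the United States, and Shanghai for China. A table-level `topk` in Ibis would presumably compile to a window function or grouped filter on the backend rather than materializing rows like this.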
What version of ibis are you running?
9.1.0
What backend(s) are you using, if any?
DuckDB
Code of Conduct
- [X] I agree to follow this project's Code of Conduct