
feat(rust, python): Add `top_k` and `bottom_k` in the `GroupBy` namespace

Open • CanglongCl opened this issue 1 year ago • 1 comment

Closes #10054

Returns the top or bottom `k` rows of each group, sorted by the given column(s) in the given order.
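The per-group selection described above can be sketched in plain Python, independent of polars. This is a minimal sketch under stated assumptions: rows are plain dicts, nulls are not handled, and groups are emitted in first-seen order (mirroring `maintain_order=True`). The helper name `group_k_rows` and its `largest` flag are hypothetical, not part of the polars API. It relies on `heapq.nsmallest`/`heapq.nlargest`, which return the selected `k` items already sorted.

```python
import heapq
from typing import Any

# Example rows taken from the PR's example DataFrame.
ROWS: list[dict[str, Any]] = [
    {"a": 1, "b": 5.5, "c": True, "d": "Apple"},
    {"a": 2, "b": 0.5, "c": True, "d": "Orange"},
    {"a": 2, "b": 4.0, "c": True, "d": "Apple"},
    {"a": 3, "b": 10.0, "c": False, "d": "Apple"},
    {"a": 4, "b": 13.0, "c": False, "d": "Banana"},
    {"a": 5, "b": 17.0, "c": True, "d": "Banana"},
]


def group_k_rows(
    rows: list[dict[str, Any]],
    group_key: str,
    by: str,
    k: int,
    largest: bool = False,
) -> list[dict[str, Any]]:
    """Hypothetical helper (not a polars API).

    For each group, keep the k rows with the smallest `by` values
    (or the largest when `largest=True`), sorted by `by`.
    Groups are emitted in first-seen order, like `maintain_order=True`.
    """
    groups: dict[Any, list[dict[str, Any]]] = {}
    for row in rows:
        # dicts preserve insertion order, so groups keep first-seen order
        groups.setdefault(row[group_key], []).append(row)

    pick = heapq.nlargest if largest else heapq.nsmallest
    out: list[dict[str, Any]] = []
    for members in groups.values():
        # nsmallest/nlargest return their k items already sorted
        out.extend(pick(k, members, key=lambda r: r[by]))
    return out


for row in group_k_rows(ROWS, "d", "b", 2):
    print(row["d"], row["a"], row["b"])
```

Called with the defaults, it selects the same `(d, a, b)` rows as the Python `bottom_k` example's output; with `largest=True` it keeps the two largest `b` values per group in descending order, which matches the rows shown in the Rust example's output. Using a size-`k` heap selection rather than fully sorting each group is the usual top-k choice when groups are much larger than `k`.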

Example

Rust

Only implemented for LazyGroupBy.

use polars::prelude::*;

fn main() {
    let df = df![
        "a" => &[1, 2, 2, 3, 4, 5],
        "b" => &[5.5, 0.5, 4.0, 10.0, 13.0, 17.0],
        "c" => &[true, true, true, false, false, true],
        "d" => &["Apple", "Orange", "Apple", "Apple", "Banana", "Banana"],
    ]
    .unwrap();

    // Group by "d" (keeping first-seen group order) and keep 2 rows per group, ranked by "b".
    println!(
        "{:?}",
        df.lazy()
            .group_by_stable(&[col("d")])
            .bottom_k(2, &[col("b")], [true])
            .collect()
            .unwrap()
    );
}

Output (here each group keeps its two largest `b` values, in descending order):

shape: (5, 4)
┌────────┬─────┬──────┬───────┐
│ d      ┆ a   ┆ b    ┆ c     │
│ ---    ┆ --- ┆ ---  ┆ ---   │
│ str    ┆ i32 ┆ f64  ┆ bool  │
╞════════╪═════╪══════╪═══════╡
│ Apple  ┆ 3   ┆ 10.0 ┆ false │
│ Apple  ┆ 1   ┆ 5.5  ┆ true  │
│ Orange ┆ 2   ┆ 0.5  ┆ true  │
│ Banana ┆ 5   ┆ 17.0 ┆ true  │
│ Banana ┆ 4   ┆ 13.0 ┆ false │
└────────┴─────┴──────┴───────┘

Python

Implemented for both LazyGroupBy and GroupBy.

import polars as pl

df = pl.DataFrame(
    {
        "a": [1, 2, 2, 3, 4, 5],
        "b": [5.5, 0.5, 4, 10, 13, 17],
        "c": [True, True, True, False, False, True],
        "d": ["Apple", "Orange", "Apple", "Apple", "Banana", "Banana"],
    }
)

df.group_by("d", maintain_order=True).bottom_k(2, by="b")

Output:

shape: (5, 4)
┌────────┬─────┬──────┬───────┐
│ d      ┆ a   ┆ b    ┆ c     │
│ ---    ┆ --- ┆ ---  ┆ ---   │
│ str    ┆ i64 ┆ f64  ┆ bool  │
╞════════╪═════╪══════╪═══════╡
│ Apple  ┆ 2   ┆ 4.0  ┆ true  │
│ Apple  ┆ 1   ┆ 5.5  ┆ true  │
│ Orange ┆ 2   ┆ 0.5  ┆ true  │
│ Banana ┆ 4   ┆ 13.0 ┆ false │
│ Banana ┆ 5   ┆ 17.0 ┆ true  │
└────────┴─────┴──────┴───────┘

CanglongCl, Mar 24 '24 07:03

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 81.33%. Comparing base (2fca551) to head (eab8cea). Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #15263      +/-   ##
==========================================
+ Coverage   81.31%   81.33%   +0.01%     
==========================================
  Files        1359     1359              
  Lines      176083   176163      +80     
  Branches     2524     2536      +12     
==========================================
+ Hits       143188   143280      +92     
+ Misses      32411    32399      -12     
  Partials      484      484              
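The figures in the diff are internally consistent: in both columns, Lines = Hits + Misses + Partials, and Coverage = Hits / Lines. The sketch below reproduces the reported percentages, assuming (as the numbers suggest) that percentages are truncated downward to two decimals rather than rounded, and that the +0.01% delta is computed from the unrounded ratios. The helper names are hypothetical.

```python
import math
from fractions import Fraction


def coverage(hits: int, misses: int, partials: int) -> Fraction:
    """Exact coverage ratio; partially covered lines count against coverage."""
    return Fraction(hits, hits + misses + partials)


def pct_round_down(ratio: Fraction) -> str:
    """Format a ratio as a percentage floored to two decimal places."""
    hundredths = math.floor(ratio * 10_000)  # exact: Fraction avoids float error
    sign = "-" if hundredths < 0 else ""
    hundredths = abs(hundredths)
    return f"{sign}{hundredths // 100}.{hundredths % 100:02d}"


base = coverage(hits=143188, misses=32411, partials=484)  # 176083 lines
head = coverage(hits=143280, misses=32399, partials=484)  # 176163 lines

print(pct_round_down(base), pct_round_down(head), pct_round_down(head - base))
```

Rounding half-up instead would turn the base value (about 81.3185%) into 81.32%, so the reported 81.31% is consistent with truncation. Likewise, the rounded values differ by 0.02 while the reported delta is +0.01%, consistent with the delta being taken before rounding.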


codecov[bot], Mar 24 '24 08:03

Sorry for the confusion. See comment: https://github.com/pola-rs/polars/issues/10054#issuecomment-2025127965

ritchie46, Mar 28 '24 13:03