
`struct.rename_fields` enhancements: correct name count & dict input

Open Julian-J-S opened this issue 1 year ago • 6 comments

Problem description

Adjusting struct field names with `rename_fields` currently behaves a little oddly.

Length of names parameter

import polars as pl

rating_Series = pl.Series(
    "ratings",
    [
        {"Movie": "Cars", "Theatre": "NE", "Avg_Rating": 4.5},
        {"Movie": "Toy Story", "Theatre": "ME", "Avg_Rating": 4.9},
    ],
)

# Start
rating_Series.struct.unnest()
┌───────────┬─────────┬────────────┐
│ Movie     ┆ Theatre ┆ Avg_Rating │
│ ---       ┆ ---     ┆ ---        │
│ str       ┆ str     ┆ f64        │
╞═══════════╪═════════╪════════════╡
│ Cars      ┆ NE      ┆ 4.5        │
│ Toy Story ┆ ME      ┆ 4.9        │
└───────────┴─────────┴────────────┘

# Too many names
(
    rating_Series
    .struct.rename_fields(names=['Film', 'State', 'Value', 'hello', 'world'])
    .struct.unnest()
)
┌───────────┬───────┬───────┐
│ Film      ┆ State ┆ Value │
│ ---       ┆ ---   ┆ ---   │
│ str       ┆ str   ┆ f64   │
╞═══════════╪═══════╪═══════╡
│ Cars      ┆ NE    ┆ 4.5   │
│ Toy Story ┆ ME    ┆ 4.9   │
└───────────┴───────┴───────┘

# Too few names
(
    rating_Series
    .struct.rename_fields(names=['Film'])
    .struct.unnest()
)
┌───────────┐
│ Film      │
│ ---       │
│ str       │
╞═══════════╡
│ Cars      │
│ Toy Story │
└───────────┘

To discuss:

  • too many names provided:
    • no effect; the extra names are silently ignored
    • should this be allowed?
  • too few names provided:
    • the fields that receive no new name are dropped
    • is this intended?

Comparison with DataFrame columns:

  • df.columns = [...]
  • if too many or too few names are given, this raises ShapeError: X column names provided for a dataframe of width Y

Add an option to pass a mapping that renames only selected fields

Example: rename_fields({'Movie': 'Film', 'Theatre': 'State'})

Julian-J-S, Aug 29 '23 14:08

Too few should error: https://github.com/pola-rs/polars/issues/9052#issuecomment-1564253746

Too few names dropping missing columns is not intended: https://github.com/pola-rs/polars/issues/9052#issuecomment-1564253746

cmdlineluser, Aug 29 '23 17:08

Too few should error: #9052 (comment)

Why though? A normal rename can do partial renames. Shouldn't struct.rename_fields behave similarly and keep the other fields, unrenamed, when no new name has been passed for them?

ion-elgreco, Aug 29 '23 18:08

Too few should error: #9052 (comment)

Why though? A normal rename can do partial renames. Shouldn't struct.rename_fields behave similarly and keep the other fields, unrenamed, when no new name has been passed for them?

The trade-off seems to be between a genuine use case (renaming only the first n fields positionally) and the mistake of accidentally passing too few names to the rename.

I know I'm much more likely to be in the latter camp than the former. Additionally, if you are in the former camp and get an error here, you'll know how to address it.

deanm0000, Aug 31 '23 13:08

It would be great to have rename_fields accept a dict.

DGolubets, May 01 '24 16:05

@DGolubets `.name.map_fields()` has since been added, which can help if you're working with frames rather than a Series.

df = rating_Series.to_frame()

df.schema["ratings"]
# Struct({'Movie': String, 'Theatre': String, 'Avg_Rating': Float64})

df.with_columns(
   pl.col("ratings").name.map_fields(lambda f:
       {"Movie": "Film", "Theatre": "State"}.get(f, f)
   )
).schema["ratings"]
# Struct({'Film': String, 'State': String, 'Avg_Rating': Float64})

cmdlineluser, May 01 '24 16:05

@cmdlineluser Great!

DGolubets, May 01 '24 16:05

+1 on .rename_fields() supporting a dict argument

DeflateAwning, May 07 '24 18:05