polars
`struct.rename_fields` enhancements: correct name count & dict input
Problem description
Adjusting struct field names with rename_fields is currently a little awkward.
Length of the names parameter
import polars as pl

rating_Series = pl.Series(
    "ratings",
    [
        {"Movie": "Cars", "Theatre": "NE", "Avg_Rating": 4.5},
        {"Movie": "Toy Story", "Theatre": "ME", "Avg_Rating": 4.9},
    ],
)
# Start
rating_Series.struct.unnest()
┌───────────┬─────────┬────────────┐
│ Movie     ┆ Theatre ┆ Avg_Rating │
│ ---       ┆ ---     ┆ ---        │
│ str       ┆ str     ┆ f64        │
╞═══════════╪═════════╪════════════╡
│ Cars      ┆ NE      ┆ 4.5        │
│ Toy Story ┆ ME      ┆ 4.9        │
└───────────┴─────────┴────────────┘
# Too many names
(
    rating_Series
    .struct.rename_fields(names=['Film', 'State', 'Value', 'hello', 'world'])
    .struct.unnest()
)
┌───────────┬───────┬───────┐
│ Film      ┆ State ┆ Value │
│ ---       ┆ ---   ┆ ---   │
│ str       ┆ str   ┆ f64   │
╞═══════════╪═══════╪═══════╡
│ Cars      ┆ NE    ┆ 4.5   │
│ Toy Story ┆ ME    ┆ 4.9   │
└───────────┴───────┴───────┘
# Too few
(
    rating_Series
    .struct.rename_fields(names=['Film'])
    .struct.unnest()
)
┌───────────┐
│ Film      │
│ ---       │
│ str       │
╞═══════════╡
│ Cars      │
│ Toy Story │
└───────────┘
To discuss:
- too many names provided:
  - no effect; the additional names are ignored
  - should this be allowed?
- too few names provided:
  - the fields without a new name are dropped
  - is this intended?
Comparison to DataFrame columns:
- df.columns = [...]
  - raises an error if too many or too few names are provided:
    ShapeError: X column names provided for a dataframe of width Y
Add option to provide a mapping to adjust only selected names
Example: rename_fields({'Movie': 'Film', 'Theatre': 'State'})
Too few should error: https://github.com/pola-rs/polars/issues/9052#issuecomment-1564253746
Too few names dropping missing columns is not intended: https://github.com/pola-rs/polars/issues/9052#issuecomment-1564253746
Too few should error: #9052 (comment)
Why, though? A normal rename can do partial renames. Shouldn't struct.rename_fields behave similarly and keep the other fields, un-renamed, when no mapping has been passed for them?
It seems the balance is between there being a use case for wanting to rename the first n fields positionally vs simply accidentally feeding too few arguments to the rename.
I know I'm much more likely to be in the latter camp than the former. Additionally, if you are in the former camp and get an error here, you'll know how to address it.
Would be great to have rename_fields accept a dict.
@DGolubets .name.map_fields() has since been added, which can help if you're using frames.
df = rating_Series.to_frame()
df.schema["ratings"]
# Struct({'Movie': String, 'Theatre': String, 'Avg_Rating': Float64})
df.with_columns(
    pl.col("ratings").name.map_fields(
        lambda f: {"Movie": "Film", "Theatre": "State"}.get(f, f)
    )
).schema["ratings"]
# Struct({'Film': String, 'State': String, 'Avg_Rating': Float64})
@cmdlineluser Great!
+1 on .rename_fields() supporting a dict argument