[Enh]: Add `list.get()` negative indices and out-of-bounds access
We would like to learn about your use case. For example, if this feature is needed to adopt Narwhals in an open source project, could you please enter the link to it below?
Looking at modern Polars Chapter 2.2 (https://kevinheavey.github.io/modern-polars/method_chaining.html#:~:text=extract_city_name_pl) for #1945, I noticed that narwhals is missing a method similar to polars expr.list.get() (https://docs.pola.rs/api/python/dev/reference/expressions/api/polars.Expr.list.get.html).
Please describe the purpose of the new feature or describe the problem to solve.
This feature allows users to get values by index in a sublist (expr.list).
Suggest a solution if possible.
No response
If you have tried alternatives, please describe them below.
No response
Additional information that may help us understand your needs.
Example with data from modern Polars:
import polars as pl
data = [
{"OriginCityName": "Raleigh/Durham, NC", "DestCityName": "Chicago, IL"},
{"OriginCityName": "Seattle, WA", "DestCityName": "Chicago, IL"},
{"OriginCityName": "Indianapolis, IN", "DestCityName": "Dallas/Fort Worth, TX"},
{"OriginCityName": "Sarasota/Bradenton, FL", "DestCityName": "Newark, NJ"},
{"OriginCityName": "Chicago, IL", "DestCityName": "Moline, IL"},
{"OriginCityName": "Madison, WI", "DestCityName": "Washington, DC"},
{"OriginCityName": "West Palm Beach/Palm Beach, FL", "DestCityName": "Charlotte, NC"}
]
df_pl = pl.DataFrame(data)
In [26]: df_pl
Out[26]:
shape: (7, 2)
┌────────────────────────────────┬───────────────────────┐
│ OriginCityName                 ┆ DestCityName          │
│ ---                            ┆ ---                   │
│ str                            ┆ str                   │
╞════════════════════════════════╪═══════════════════════╡
│ Raleigh/Durham, NC             ┆ Chicago, IL           │
│ Seattle, WA                    ┆ Chicago, IL           │
│ Indianapolis, IN               ┆ Dallas/Fort Worth, TX │
│ Sarasota/Bradenton, FL         ┆ Newark, NJ            │
│ Chicago, IL                    ┆ Moline, IL            │
│ Madison, WI                    ┆ Washington, DC        │
│ West Palm Beach/Palm Beach, FL ┆ Charlotte, NC         │
└────────────────────────────────┴───────────────────────┘
cols = ["OriginCityName", "DestCityName"]
df = df_pl.select(pl.col(cols).str.split(",").list.get(0))
In [28]: print(df)
shape: (7, 2)
┌────────────────────────────┬───────────────────┐
│ OriginCityName             ┆ DestCityName      │
│ ---                        ┆ ---               │
│ str                        ┆ str               │
╞════════════════════════════╪═══════════════════╡
│ Raleigh/Durham             ┆ Chicago           │
│ Seattle                    ┆ Chicago           │
│ Indianapolis               ┆ Dallas/Fort Worth │
│ Sarasota/Bradenton         ┆ Newark            │
│ Chicago                    ┆ Moline            │
│ Madison                    ┆ Washington        │
│ West Palm Beach/Palm Beach ┆ Charlotte         │
└────────────────────────────┴───────────────────┘
nice one - yeah, looks in-scope!
do you want to try to contribute it?
Yes! I can try to do it as a Modern Polars side quest :)
This would be for Series.list.get as well I assume?
Yes. I will change the title to reflect this.
Adding as a reminder.
TODO:
- [ ] Allow negative indexes (see comment)
- [ ] [Optional] Handle out-of-bounds indexes: Polars even has a `null_on_oob` option to determine whether it should raise or return null
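To make the two TODO items concrete, here is a minimal pure-Python sketch of the semantics being asked for. It is not Narwhals or Polars code: `list_get` is a hypothetical helper, its `null_on_oob` parameter only borrows the name of the Polars option, and both the default of raising and the choice to pass missing sublists through as `None` are assumptions of this sketch.

```python
from typing import Any, Optional, Sequence


def list_get(
    rows: Sequence[Optional[Sequence[Any]]],
    index: int,
    *,
    null_on_oob: bool = False,
) -> list[Any]:
    """Pick one element out of every sublist (hypothetical reference helper)."""
    out: list[Any] = []
    for row in rows:
        if row is None:
            # Assumption: a missing sublist stays missing.
            out.append(None)
            continue
        size = len(row)
        # Negative indexes count from the end, as with Python lists.
        pos = index + size if index < 0 else index
        if 0 <= pos < size:
            out.append(row[pos])
        elif null_on_oob:
            # Out of bounds, but the caller asked for null instead of an error.
            out.append(None)
        else:
            raise IndexError(
                f"index {index} is out of bounds for sublist of length {size}"
            )
    return out


cities = [["Raleigh/Durham", " NC"], ["Seattle", " WA"]]
last = list_get(cities, -1)  # [" NC", " WA"]
padded = list_get([["a", "b"], ["c"]], 1, null_on_oob=True)  # ["b", None]
```

A backend implementation would presumably need to agree with whatever reference behaviour is chosen here, whichever way each backend computes it.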
[ ] Allow negative indexes
I think a step towards this could be factoring-out the pyarrow handling for this on ArrowSeries
As long as you have the size, it is do-able
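A rough illustration of why having the size is enough: in the Arrow list layout, sublist `i` spans `values[offsets[i]:offsets[i + 1]]`, so each sublist's size falls out of the offsets and a negative index can be resolved per row. The sketch below is plain Python rather than pyarrow, and `flat_positions` is a hypothetical name, not an existing helper on ArrowSeries.

```python
from typing import Optional, Sequence


def flat_positions(offsets: Sequence[int], index: int) -> list[Optional[int]]:
    """Map a per-sublist index to a position in the flattened values.

    Returns None for rows where the index is out of bounds.
    """
    positions: list[Optional[int]] = []
    for start, end in zip(offsets, offsets[1:]):
        size = end - start
        # Knowing the size is what makes a negative index resolvable.
        pos = index + size if index < 0 else index
        positions.append(start + pos if 0 <= pos < size else None)
    return positions


values = ["Raleigh/Durham", " NC", "Seattle", " WA", "Chicago"]
offsets = [0, 2, 4, 5]  # three sublists of sizes 2, 2 and 1

# Last element of every sublist: [" NC", " WA", "Chicago"]
picked = [None if p is None else values[p] for p in flat_positions(offsets, -1)]
```

The `None` positions are where an out-of-bounds policy would apply: either raise, or emit null for that row.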