narwhals patch: `sum_horizontal` ignore nulls

What type of PR is this? (check all applicable)

[ ] 💾 Refactor
[ ] ✨ Feature
[x] 🐛 Bug Fix
[ ] 🔧 Optimization
[ ] 📝 Documentation
[ ] ✅ Test
[ ] 🐳 Other

Related issues

Closes #822

Checklist

[x] Code follows style guide (ruff)
[x] Tests added
[ ] Documented the changes

If you have comments or can explain your changes, please do so below.

Not sure how to handle non numeric 🤔

In horizontal sum, polars concatenates strings and fill nulls with empty strings, but also concatenates strings with numeric:

df = pl.DataFrame({
    "a": [1, 8, 3],
   "c": ["x", "y", "z"],
})

df.select(pl.sum_horizontal("a", "c"))

shape: (3, 1)
┌─────┐
│ a   │
│ --- │
│ str │
╞═════╡
│ 1x  │
│ 8y  │
│ 3z  │
└─────┘

To extract the dtype in narwhals, we would need to evaluate the expr, or am I missing something trivial?

Aug 20 '24 15:08 FBruzzesi

Not sure how to handle non numeric 🤔

probably fine to ignore them for now? or even, ignore them completely. we could be a bit stricter than Polars here, and guide people towards https://docs.pola.rs/api/python/stable/reference/expressions/api/polars.concat_str.html to concatenate strings

Aug 22 '24 10:08 MarcoGorelli

probably fine to ignore them for now? or even, ignore them completely. we could be a bit stricter than Polars here, and guide people towards https://docs.pola.rs/api/python/stable/reference/expressions/api/polars.concat_str.html to concatenate strings

Oh neat, then maybe we want to support concat_str first. As the request for null values came from me, I don't feel like we are in a rush to merge this.

Aug 22 '24 10:08 FBruzzesi

then maybe we want to support concat_str first

Well it turns out that I am facing one of the very painful pandas issue:

import pandas as pd

df = pd.DataFrame({"c1": [1,2,3], "c2": ["x", "y", None], "c3": list("abc")})

df["c2"] + df["c3"]

0     xa
1     yb
2    NaN
dtype: object

but to replicate polars behavior I was casting to string first, and we need nullable string type, otherwise:

df["c1"].astype(str) + df["c2"].astype(str)

0       1x
1       2y
2    3None
dtype: object

I would imagine same issue will present with dask

Aug 23 '24 12:08 FBruzzesi

This would official be a regression. I searched on github if anyone is using the feature on a public repo and ended up empty handed, which is a good sign.

I also added a warning in the docs as the following:

I am moving it to ready for review, but very happy to see if anyone has ideas on how to deal in case of strings

Aug 23 '24 12:08 FBruzzesi