polars
polars copied to clipboard
Change `Series.equals` to ignore name by default
Checks
-
[X] I have checked that this issue has not already been reported.
-
[X] I have confirmed this bug exists on the latest version of Polars.
Reproducible example
import polars as pl
df = pl.DataFrame(
{
"a": [1, 2, 3],
"b": [1, 2, 3]
}
)
s1 = df['a']
s2 = df['b']
# this fails unexpectedly
assert s1.equals(s2)
Log output
No response
Issue description
It looks like Series.equals requires the header names to be equal in order to return true. This surprised me, but I'm unsure what the intended behavior is. Pandas does not take the headers into account e.g.
import pandas as pd
df = pd.DataFrame(
{
"a": [1, 2, 3],
"b": [1, 2, 3]
}
)
s1 = df['a']
s2 = df['b']
# this passes as expected
assert s1.equals(s2)
I would have expected Polars to behave the same way.
If intended, it's probably worth clarifying this in the documentation.
Expected behavior
import polars as pl
df = pl.DataFrame(
{
"a": [1, 2, 3],
"b": [1, 2, 3]
}
)
s1 = df['a']
s2 = df['b']
# this should pass
assert s1.equals(s2)
Installed versions
--------Version info---------
Polars: 0.20.1
Index type: UInt32
Platform: macOS-14.1.1-arm64-arm-64bit
Python: 3.11.7 (main, Dec 19 2023, 23:11:51) [Clang 15.0.0 (clang-1500.0.40.1)]
----Optional dependencies----
adbc_driver_manager: <not installed>
cloudpickle: <not installed>
connectorx: 0.3.2
deltalake: <not installed>
fsspec: <not installed>
gevent: <not installed>
matplotlib: <not installed>
numpy: 1.26.2
openpyxl: <not installed>
pandas: 2.1.4
pyarrow: 14.0.2
pydantic: <not installed>
pyiceberg: <not installed>
pyxlsb: <not installed>
sqlalchemy: 2.0.23
xlsx2csv: <not installed>
xlsxwriter: <not installed>
Agreed - if you want to check whether the names are equal as well, you can test for a.name == b.name and a.equals(b).
There is also the polars.testing module which has several parameters for customizing behaviour.
https://pola-rs.github.io/polars/py-polars/html/reference/testing.html#asserts
>>> pl.testing.assert_series_equal(s1, s2)
# AssertionError: Series are different (name mismatch)
# [left]: a
# [right]: b
>>> pl.testing.assert_series_equal(s1, s2, check_names=False)
>>> print("Equal!")
# Equal!
Thanks for the link! I wouldn't want to depend on testing utils in my application though.
It's not a bug. But I also think it makes more sense for equals to ignore the name by default. I think this is worth an update. We can include a check_name parameter that defaults to False. That way, we can deprecate before we break..
@stinodego To preserve existing behavior and avoid a breaking change, shouldn't the default check_name be True? Depends on your versioning strategy I suppose. I can probably put together a PR for this.
To do this properly, you would add a parameter check_name which defaults to None. And in the function body, you check for None inputs, throw a deprecation warning, and then set it to True.
Happy to review a PR for this 👍
Hello! Long time lurker, (potentially) first time contributor. This type of change sounds like a good first issue to me, but I see it doesn't have that label. @stinodego do you think this is a good first issue? If so, would you be willing to assign me to it (since I can't on my own)? Or should I just make the changes without being assigned.
I did some exploring and I have a question. Why is this operator's implementation located in a test utilities file? crates/polars-core/src/testing.rs equals_missing(). Also, what does _missing mean in the title?
@jjmarchewitz
.eq_missing() behaves differently with regards to "missing values" i.e. None/null
pl.Series([1, None, 3]).eq( pl.Series([1, 2, 3]) )
# shape: (3,)
# Series: '' [bool]
# [
# true
# null # <- null is propagated
# true
# ]
pl.Series([1, None, 3]).eq_missing( pl.Series([1, 2, 3]) )
# shape: (3,)
# Series: '' [bool]
# [
# true
# false
# true
# ]
pl.Series([1, None, 3]).eq_missing( pl.Series([1, None, 3]) )
# shape: (3,)
# Series: '' [bool]
# [
# true
# true
# true
# ]
There doesn't seem to be doc entries for the Series API, but: https://docs.pola.rs/docs/python/dev/reference/expressions/api/polars.Expr.eq_missing.html#polars.Expr.eq_missing should help.
It could be a good first issue, depends on your skillset!
Note that the change must be made in Rust. Deprecation must happen on the Python side.