Supports list-like Python objects for Series comparison.
Currently, Series doesn't support comparison with list-like Python objects such as list, tuple, dict and set:
>>> kser
0 1
1 2
2 3
dtype: int64
>>> kser == [3, 2, 1]
Traceback (most recent call last):
...
raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o77.equalTo.
...
This PR proposes supporting them in Series comparison as well:
>>> kser
0 1
1 2
2 3
dtype: int64
>>> kser == [3, 2, 1]
0 False
1 True
2 False
dtype: bool
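A pandas-only sketch of the comparison semantics shown above, for illustration: each element of the list is compared with the row at the same position, and the result keeps the Series' index. The helper name `compare_with_list_like` is hypothetical and not part of Koalas. It only handles `list` and `tuple` (dict and set are left out of this sketch), and it is not how this PR implements the feature on top of Spark.

```python
import operator

import pandas as pd


def compare_with_list_like(ser, other, op=operator.eq):
    """Hypothetical helper: compare a Series with a list or tuple by position.

    The i-th element of `other` is compared with the i-th row of `ser`,
    and the result keeps the index of `ser`.
    """
    if not isinstance(other, (list, tuple)):
        raise TypeError("this sketch only handles list and tuple")
    if len(other) != len(ser):
        # pandas also raises ValueError when the lengths differ.
        raise ValueError("Lengths must match to compare")
    # Pair values by position, ignoring index labels entirely.
    values = [op(left, right) for left, right in zip(ser.tolist(), other)]
    return pd.Series(values, index=ser.index, dtype=bool)


kser_like = pd.Series([1, 2, 3])
print(compare_with_list_like(kser_like, [3, 2, 1]))
```

Passing another operator, e.g. `operator.lt`, gives the other comparisons with the same positional pairing.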
This should resolve #2018.
Found a bug:
>>> pser = pd.Series([1,2,3], index=[10,20,30])
>>> pser == [3, 2, 1]
10 False
20 True
30 False
dtype: bool
whereas:
>>> kser = ks.Series([1,2,3], index=[10,20,30])
>>> kser == [3, 2, 1]
0 False
1 False
10 False
2 False
30 False
20 False
dtype: bool
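The Koalas output above looks like what you get when the list is first turned into a Series with a default 0..n-1 index and then aligned with `kser` by label: none of the labels 0, 1, 2 match 10, 20, 30, so every row of the outer join compares as False. That diagnosis is an assumption; the pandas snippet below only reproduces the symptom and shows the positional alternative that matches pandas' own `pser == [3, 2, 1]`.

```python
import pandas as pd

pser = pd.Series([1, 2, 3], index=[10, 20, 30])
other = [3, 2, 1]

# Suspected cause (assumption): the list gets a default 0..n-1 index and is
# aligned to `pser` by label. Series.eq aligns with an outer join, so no
# label matches and every row compares as False.
label_aligned = pser.eq(pd.Series(other))

# Positional semantics: give the list the same index as `pser` before
# comparing, so element i of the list lines up with row i of the Series.
positional = pser == pd.Series(other, index=pser.index)

print(label_aligned)
print(positional)
```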
Codecov Report
Merging #2022 (44e34f6) into master (901cea6) will decrease coverage by 1.52%. The diff coverage is 100.00%.
@@ Coverage Diff @@
## master #2022 +/- ##
==========================================
- Coverage 94.70% 93.18% -1.53%
==========================================
Files 54 54
Lines 11480 11393 -87
==========================================
- Hits 10872 10616 -256
- Misses 608 777 +169
| Impacted Files | Coverage Δ | |
|---|---|---|
| databricks/koalas/indexes/base.py | 97.62% <100.00%> (+0.19%) | :arrow_up: |
| databricks/koalas/series.py | 96.72% <100.00%> (-0.07%) | :arrow_down: |
| databricks/koalas/usage_logging/__init__.py | 27.58% <0.00%> (-64.92%) | :arrow_down: |
| databricks/koalas/usage_logging/usage_logger.py | 47.82% <0.00%> (-52.18%) | :arrow_down: |
| databricks/koalas/__init__.py | 82.66% <0.00%> (-8.38%) | :arrow_down: |
| databricks/koalas/accessors.py | 86.43% <0.00%> (-7.04%) | :arrow_down: |
| databricks/conftest.py | 93.22% <0.00%> (-6.78%) | :arrow_down: |
| databricks/koalas/namespace.py | 79.91% <0.00%> (-4.50%) | :arrow_down: |
| databricks/koalas/generic.py | 90.57% <0.00%> (-2.68%) | :arrow_down: |
| databricks/koalas/typedef/typehints.py | 92.03% <0.00%> (-1.77%) | :arrow_down: |
| ... and 21 more | | |
Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 901cea6...44e34f6.
Btw, we might also want to support binary operations with list-like Python objects? cc @HyukjinKwon
>>> pser + [3, 2, 1]
10 4
20 4
30 4
dtype: int64
>>> pser - [3, 2, 1]
10 -2
20 0
30 2
dtype: int64
>>> [3, 2, 1] + pser
10 4
20 4
30 4
dtype: int64
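If binary operations were supported, one way to get the pandas results shown above is to pair the list with the Series by position, as in the comparison case. The sketch below uses pandas and Python's `operator` module; `binary_op_with_list_like` and its `reflected` flag are hypothetical names, with `reflected=True` standing for expressions such as `[3, 2, 1] + pser` where the list is the left operand.

```python
import operator

import pandas as pd


def binary_op_with_list_like(ser, other, op, reflected=False):
    """Hypothetical helper: apply a binary operator between a Series and a
    list or tuple by position, keeping the Series' index.

    reflected=True models expressions where the list is the left operand.
    """
    if len(other) != len(ser):
        raise ValueError("Lengths must match")
    # Give the list the Series' own index so that pairing is positional.
    right = pd.Series(list(other), index=ser.index)
    return op(right, ser) if reflected else op(ser, right)


pser = pd.Series([1, 2, 3], index=[10, 20, 30])
print(binary_op_with_list_like(pser, [3, 2, 1], operator.add))
print(binary_op_with_list_like(pser, [3, 2, 1], operator.sub))
print(binary_op_with_list_like(pser, [3, 2, 1], operator.add, reflected=True))
```

The `reflected` flag only matters for non-commutative operators: `binary_op_with_list_like(pser, [3, 2, 1], operator.sub, reflected=True)` computes `[3, 2, 1] - pser`, i.e. 2, 0, -2.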
FYI: pandas seems to behave inconsistently here:
>>> a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> a.eq(b)
a True
b False
c False
d False
e False
dtype: bool
>>> a == b
Traceback (most recent call last):
...
ValueError: Can only compare identically-labeled Series objects
However, the pandas API documentation for Series.eq says it is "Equivalent to series == other".
I posted a question to the pandas repo and will share it here if they respond.
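The discrepancy can be reproduced in pandas: `Series.eq` first aligns both operands on the union of their labels, while `==` rejects operands whose labels differ. Aligning explicitly with `Series.align` (outer join by default) before using `==` gives the same result as `a.eq(b)` in this example. This is an illustration of the observed behavior, not a statement of what pandas intends.

```python
import numpy as np
import pandas as pd

a = pd.Series([1, 1, 1, np.nan], index=["a", "b", "c", "d"])
b = pd.Series([1, np.nan, 1, np.nan], index=["a", "b", "d", "e"])

# The flex method aligns both operands on the union of their labels first.
flex = a.eq(b)

# `==` refuses operands with different labels ...
try:
    a == b
except ValueError as exc:
    print("a == b:", exc)

# ... but after aligning explicitly (outer join, the default for
# Series.align) the two spellings agree. NaN never compares equal,
# so any row that is missing on either side is False.
left, right = a.align(b)
aligned = left == right
print(aligned.equals(flex))
```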
Btw, we might also want to support binary operations with list-like Python objects? cc @HyukjinKwon
Let me do this in a separate PR, since there may be inconsistent cases like eq.
The eq case seems to be a separate topic from this one, which is binary operations between a Series and a list.
https://issues.apache.org/jira/browse/SPARK-36438