
Supports list-like Python objects for Series comparison.

Open · itholic opened this issue 4 years ago · 7 comments

Currently, Series doesn't support comparison with list-like Python objects such as list, tuple, dict, and set.

>>> kser
0    1
1    2
2    3
dtype: int64

>>> kser == [3, 2, 1]
Traceback (most recent call last):
...
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o77.equalTo.
...

This PR proposes supporting these objects in Series comparison as well.

>>> kser
0    1
1    2
2    3
dtype: int64

>>> kser == [3, 2, 1]
0    False
1     True
2    False
dtype: bool
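For reference, this is how pandas itself treats a list on the right-hand side of `==`: the values are matched to the Series by position, and the result keeps the Series' own index. A list whose length differs from the Series is rejected with a `ValueError`. A minimal check, assuming a recent pandas release:

```python
import pandas as pd

# Reference behaviour in pandas: the list is matched to the Series by position.
pser = pd.Series([1, 2, 3])
result = pser == [3, 2, 1]
print(result.tolist())  # [False, True, False]

# A list whose length differs from the Series is rejected.
try:
    pser == [1, 2]
except ValueError as exc:
    print("ValueError:", exc)
```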

This should resolve #2018

itholic · Jan 27 '21 03:01

Found a bug:

>>> pser = pd.Series([1,2,3], index=[10,20,30])
>>> pser == [3, 2, 1]
10    False
20     True
30    False
dtype: bool

whereas:

>>> kser = ks.Series([1,2,3], index=[10,20,30])
>>> kser == [3, 2, 1]
0     False
1     False
10    False
2     False
30    False
20    False
dtype: bool
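The koalas output above has the shape of a label-based alignment: the union of the labels 0, 1, 2 and 10, 20, 30, with every entry False. One plausible reading, assumed here rather than confirmed from the PR's code, is that the list was wrapped with a default 0..n-1 index and then aligned by label. The pandas sketch below reproduces that shape with `Series.eq`, and shows that giving the list the Series' own index yields the positional result pandas returns for `pser == [3, 2, 1]`:

```python
import pandas as pd

pser = pd.Series([1, 2, 3], index=[10, 20, 30])
other = [3, 2, 1]

# Wrapping the list with a default index (0, 1, 2) and aligning by label
# leaves no overlapping labels, so every entry compares as False.
misaligned = pser.eq(pd.Series(other))
print(misaligned.index.tolist())  # [0, 1, 2, 10, 20, 30]
print(misaligned.any())           # False

# Giving the list the Series' own index keeps the comparison positional.
aligned = pser == pd.Series(other, index=pser.index)
print(aligned.tolist())                    # [False, True, False]
print((aligned == (pser == other)).all())  # True
```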

ueshin · Jan 27 '21 03:01

Codecov Report

Merging #2022 (44e34f6) into master (901cea6) will decrease coverage by 1.52%. The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #2022      +/-   ##
==========================================
- Coverage   94.70%   93.18%   -1.53%     
==========================================
  Files          54       54              
  Lines       11480    11393      -87     
==========================================
- Hits        10872    10616     -256     
- Misses        608      777     +169     
Impacted Files                                   Coverage Δ
databricks/koalas/indexes/base.py                97.62% <100.00%> (+0.19%) ↑
databricks/koalas/series.py                      96.72% <100.00%> (-0.07%) ↓
databricks/koalas/usage_logging/__init__.py      27.58% <0.00%> (-64.92%) ↓
databricks/koalas/usage_logging/usage_logger.py  47.82% <0.00%> (-52.18%) ↓
databricks/koalas/__init__.py                    82.66% <0.00%> (-8.38%) ↓
databricks/koalas/accessors.py                   86.43% <0.00%> (-7.04%) ↓
databricks/conftest.py                           93.22% <0.00%> (-6.78%) ↓
databricks/koalas/namespace.py                   79.91% <0.00%> (-4.50%) ↓
databricks/koalas/generic.py                     90.57% <0.00%> (-2.68%) ↓
databricks/koalas/typedef/typehints.py           92.03% <0.00%> (-1.77%) ↓
... and 21 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 901cea6...44e34f6.

codecov-io · Jan 27 '21 03:01

Btw, we might also want to support binary operations with list-like Python objects? cc @HyukjinKwon

>>> pser + [3, 2, 1]
10    4
20    4
30    4
dtype: int64
>>> pser - [3, 2, 1]
10   -2
20    0
30    2
dtype: int64
>>> [3, 2, 1] + pser
10    4
20    4
30    4
dtype: int64
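For reference, pandas applies the same positional rule to arithmetic, keeping the Series' index, and the reflected form `[3, 2, 1] + pser` is routed to `Series.__radd__`. A short check of the outputs quoted above, assuming a recent pandas release:

```python
import pandas as pd

pser = pd.Series([1, 2, 3], index=[10, 20, 30])

# Arithmetic with a list is positional and keeps pser's index.
added = pser + [3, 2, 1]
subtracted = pser - [3, 2, 1]
# With the list on the left, Series.__radd__ handles the addition.
reflected = [3, 2, 1] + pser

print(added.tolist())            # [4, 4, 4]
print(subtracted.tolist())       # [-2, 0, 2]
print(reflected.tolist())        # [4, 4, 4]
print(reflected.index.tolist())  # [10, 20, 30]
```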

ueshin · Jan 27 '21 04:01

FYI: pandas seems to behave inconsistently here:

>>> a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])

>>> a.eq(b)
a     True
b    False
c    False
d    False
e    False
dtype: bool

>>> a == b
Traceback (most recent call last):
...
ValueError: Can only compare identically-labeled Series objects

However, the pandas API doc for Series.eq says it is "Equivalent to series == other".

I posted a question to the pandas repo and will share their response.
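The difference comes down to alignment: `Series.eq` first aligns both operands on the union of their labels (an outer join), so missing or NaN entries compare as False, whereas `==` requires identically labeled operands and raises otherwise. A sketch, assuming a recent pandas release, that reproduces the `eq` result with an explicit `Series.align` followed by `==`:

```python
import numpy as np
import pandas as pd

a = pd.Series([1, 1, 1, np.nan], index=["a", "b", "c", "d"])
b = pd.Series([1, np.nan, 1, np.nan], index=["a", "b", "d", "e"])

# eq() aligns on the union of labels before comparing.
flex = a.eq(b)

# The same result by aligning explicitly (outer join by default), then using ==.
left, right = a.align(b)
explicit = left == right
print(explicit.tolist())  # [True, False, False, False, False]

# == on differently labeled Series does not align and raises instead.
try:
    a == b
except ValueError as exc:
    print("ValueError:", exc)
```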

itholic · Feb 04 '21 01:02

> Btw, we might also want to support binary operations with list-like Python objects? cc @HyukjinKwon

Let me do this in a separate PR, since there may be inconsistent cases like eq.

itholic · Feb 04 '21 02:02

The eq case sounds different from the topic here, which is binary operations between a Series and a list.

ueshin · Feb 04 '21 06:02

https://issues.apache.org/jira/browse/SPARK-36438

xinrong-meng · Aug 05 '21 21:08