
FIX-#4660: Fix `fillna` when Modin series object is an argument

Open anmyachev opened this issue 2 years ago • 2 comments

Signed-off-by: Myachev [email protected]

What do these changes do?

  • [x] commit message follows format outlined here
  • [x] passes flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
  • [x] passes black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
  • [x] signed commit with git commit -s
  • [x] Resolves #4660
  • [x] tests added and passing
  • [x] module layout described at docs/development/architecture.rst is up-to-date
  • [x] added (Issue Number: PR title (PR Number)) and github username to release notes for next major release

anmyachev avatar Jul 13 '22 12:07 anmyachev

Codecov Report

Merging #4674 (6ea4940) into master (9b33451) will increase coverage by 3.09%. The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #4674      +/-   ##
==========================================
+ Coverage   86.57%   89.67%   +3.09%     
==========================================
  Files         230      231       +1     
  Lines       18467    18849     +382     
==========================================
+ Hits        15988    16903     +915     
+ Misses       2479     1946     -533     
| Impacted Files | Coverage Δ | |
|---|---|---|
| modin/core/dataframe/pandas/dataframe/dataframe.py | 95.20% <100.00%> (+0.01%) | :arrow_up: |
| ...odin/core/storage_formats/pandas/query_compiler.py | 96.09% <100.00%> (+<0.01%) | :arrow_up: |
| ...s/pandas_on_dask/partitioning/virtual_partition.py | 85.98% <0.00%> (-8.90%) | :arrow_down: |
| ...lementations/pandas_on_dask/dataframe/dataframe.py | 95.83% <0.00%> (-4.17%) | :arrow_down: |
| modin/pandas/utils.py | 92.40% <0.00%> (-1.94%) | :arrow_down: |
| modin/core/execution/ray/common/utils.py | 95.23% <0.00%> (-1.64%) | :arrow_down: |
| ...ns/pandas_on_ray/partitioning/partition_manager.py | 82.19% <0.00%> (-1.30%) | :arrow_down: |
| ...s/pandas_on_dask/partitioning/partition_manager.py | 100.00% <0.00%> (ø) | |
| modin/experimental/batch/test/test_pipeline.py | 100.00% <0.00%> (ø) | |
| ... and 23 more | | |


codecov[bot] avatar Jul 13 '22 12:07 codecov[bot]

I now wonder why there is no reindexing in the case of `df1["c"].fillna(df2)`. What if the partition boundaries of `df1` and `df2` do not match? And if `df1.index` does not match `df2.index`, how would `.fillna()` work in that case?

The case `df1["c"].fillna(df2)` is not possible, because the `value` argument of `Series.fillna` must be a scalar, dict, or Series, not a DataFrame:

```python
import pandas as pd
# import modin.pandas as pd

df = pd.DataFrame({'a': ['a'], 'b': ['b'],}, index=['row1'])
df['c'] = pd.NA

df2_0 = pd.DataFrame({"a": [0], "b": [5]}, index=['row1'])
df2_1 = pd.DataFrame({"c": ["c"]}, index=['row1'])
df2 = pd.concat([df2_0, df2_1], axis=1)
df = df["c"].fillna(df2)
print(df)
```

```
Traceback (most recent call last):
  File "test_fillna.py", line 12, in <module>
    df = df["c"].fillna(df2)
  File "C:\Users\79049\.conda\envs\modin\lib\site-packages\pandas\util\_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\79049\.conda\envs\modin\lib\site-packages\pandas\core\series.py", line 4908, in fillna
    return super().fillna(
  File "C:\Users\79049\.conda\envs\modin\lib\site-packages\pandas\core\generic.py", line 6461, in fillna
    raise TypeError(
TypeError: "value" parameter must be a scalar, dict or Series, but you passed a "DataFrame"
```
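
For the Series-valued case, which is allowed, here is a minimal sketch in plain pandas of what happens when the indexes do not match. It is only an illustration, not a statement about Modin's internals: the names `s` and `filler` and the row labels are made up for the example. As far as I know, pandas aligns the fill values to the caller's index, so labels missing from `filler` stay NaN and extra labels in `filler` are ignored.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the caller has row1..row3, the filler has row1 and row4.
s = pd.Series([np.nan, 2.0, np.nan], index=["row1", "row2", "row3"])
filler = pd.Series([10.0, 30.0], index=["row1", "row4"])

# Fill values are matched by index label, not by position.
filled = s.fillna(filler)
print(filled)

# Passing a DataFrame as the fill value is rejected, as in the traceback above.
try:
    s.fillna(filler.to_frame("c"))
    raised = False
except TypeError:
    raised = True
print("TypeError raised for DataFrame value:", raised)
```

Here `row1` is filled from `filler`, `row3` stays missing because `filler` has no such label, and `row4` never appears in the result.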

prutskov avatar Jul 22 '22 12:07 prutskov