mars
[BUG] sort_values failed after using dropna
Describe the bug
When I try to use sort_values(ignore_index=True) after dropna, it raises a TypeError:
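import numpy as np
import mars.dataframe as md
a = md.Series([1, 3, 2, np.nan, np.nan])
a.dropna().sort_values(ignore_index=True).execute()
But the same operation works in pandas:
import numpy as np
import pandas as pd
b = pd.Series([1, 3, 2, np.nan, np.nan])
b.dropna().sort_values(ignore_index=True)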
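For reference, a minimal pandas-only sketch of the result Mars is expected to match: after dropna the remaining labels are 0, 1, 2, and ignore_index=True replaces the reordered labels with a fresh 0..n-1 RangeIndex. It does not touch Mars at all.

```python
import numpy as np
import pandas as pd

b = pd.Series([1, 3, 2, np.nan, np.nan])
dropped = b.dropna()  # keeps the labels 0, 1, 2

# Without ignore_index, the original labels travel with their values.
kept_labels = dropped.sort_values().index.tolist()

# With ignore_index=True, pandas assigns a new 0..n-1 RangeIndex.
result = dropped.sort_values(ignore_index=True)

print(kept_labels)        # [0, 2, 1]
print(result.tolist())    # [1.0, 2.0, 3.0]
print(type(result.index).__name__)
```

The contrast between `kept_labels` and `result.index` is exactly what ignore_index controls.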
To Reproduce
To help us reproduce this bug, please provide the information below:
- Your Python version: 3.8.0
- The version of Mars you use: 0.6.11
- Versions of crucial packages, such as numpy, scipy and pandas: pandas: 1.1.3
- Full stack trace of the error:
ValueError Traceback (most recent call last)
c:\users\hoa'r'jou'r\appdata\local\programs\python\python38\lib\site-packages\pandas\core\dtypes\common.py in ensure_python_int(value)
170 try:
--> 171 new_value = int(value)
172 assert new_value == value
ValueError: cannot convert float NaN to integer
The above exception was the direct cause of the following exception:
TypeError Traceback (most recent call last)
<ipython-input-18-f7e878c753c1> in <module>
1 a = md.Series([1,3,2,np.nan,np.nan])
----> 2 a.dropna().sort_values(ignore_index=True).execute()
c:\users\hoa'r'jou'r\appdata\local\programs\python\python38\lib\site-packages\mars\dataframe\sort\sort_values.py in series_sort_values(series, axis, ascending, inplace, kind, na_position, ignore_index, parallel_kind, psrs_kinds)
317 parallel_kind=parallel_kind, psrs_kinds=psrs_kinds,
318 output_types=[OutputType.series], gpu=series.op.is_gpu())
--> 319 sorted_series = op(series)
320 if inplace:
321 series.data = sorted_series.data
c:\users\hoa'r'jou'r\appdata\local\programs\python\python38\lib\site-packages\mars\utils.py in _inner(*args, **kwargs)
454 def _inner(*args, **kwargs):
455 with self:
--> 456 return func(*args, **kwargs)
457
458 return _inner
c:\users\hoa'r'jou'r\appdata\local\programs\python\python38\lib\site-packages\mars\dataframe\sort\sort_values.py in __call__(self, a)
97 assert self.axis == 0
98 if self.ignore_index:
---> 99 index_value = parse_index(pd.RangeIndex(a.shape[0]))
100 else:
101 if isinstance(a.index_value.value, IndexValue.RangeIndex):
c:\users\hoa'r'jou'r\appdata\local\programs\python\python38\lib\site-packages\pandas\core\indexes\range.py in __new__(cls, start, stop, step, dtype, copy, name)
100 raise TypeError("RangeIndex(...) must be called with integers")
101
--> 102 start = ensure_python_int(start) if start is not None else 0
103
104 if stop is None:
c:\users\hoa'r'jou'r\appdata\local\programs\python\python38\lib\site-packages\pandas\core\dtypes\common.py in ensure_python_int(value)
172 assert new_value == value
173 except (TypeError, ValueError, AssertionError) as err:
--> 174 raise TypeError(f"Wrong type {type(value)} for value {value}") from err
175 return new_value
176
TypeError: Wrong type <class 'float'> for value nan
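The last frames show the root cause: with ignore_index=True, `__call__` builds `pd.RangeIndex(a.shape[0])`, and after dropna `a.shape[0]` is NaN (presumably because the number of remaining rows is only known after execution). The pandas-only sketch below reproduces just that step; `build_range_index` is a hypothetical helper and `unknown_length` is a stand-in for the NaN shape, not a Mars object.

```python
import pandas as pd

def build_range_index(length):
    """Hypothetical helper: try to build a RangeIndex, return (index, error)."""
    try:
        return pd.RangeIndex(length), None
    except TypeError as err:
        return None, err

# A known length works as expected.
known_index, known_err = build_range_index(3)

# NaN stands in for a.shape[0] when the length is unknown.
unknown_length = float("nan")
unknown_index, unknown_err = build_range_index(unknown_length)

print(known_index.tolist())  # [0, 1, 2]
print(repr(unknown_err))
```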
Expected behavior
sort_values(ignore_index=True) should succeed after dropna and return the same result as pandas.
This can be fixed by calling parse_index(pd.RangeIndex(-1)) when the size of the relevant dimension is unknown.
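A minimal sketch of that suggestion, assuming an unknown length shows up as NaN, as in the traceback. `range_index_for_length` is a hypothetical helper, and Mars's own `parse_index` wrapper is left out so the example runs with pandas alone. `pd.RangeIndex(-1)` is a valid but empty range, so it can serve as a placeholder until the real length is known.

```python
import math
import pandas as pd

def range_index_for_length(length):
    """Hypothetical helper: placeholder RangeIndex when the length is unknown.

    Assumes an unknown length is represented as NaN.
    """
    if isinstance(length, float) and math.isnan(length):
        # RangeIndex(-1) means start=0, stop=-1: valid, but empty.
        return pd.RangeIndex(-1)
    return pd.RangeIndex(length)

placeholder = range_index_for_length(float("nan"))
concrete = range_index_for_length(3)

print(len(placeholder))   # 0
print(concrete.tolist())  # [0, 1, 2]
```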
Hello :) I'm a beginner to open source and I'd like to resolve this issue. Is it still relevant?
You're very welcome to try fixing this. Feel free to ask questions if you run into any problems.
Hello. I'm new to open-source pull requests, but I've forked the repository and opened a pull request at https://github.com/mars-project/mars/pull/3363
Note that running black for linting, as suggested, also reformatted mars/learn/contrib/lightgbm/tests/test_classifier.py.
Edits at a glance: mars\dataframe\sort\sort_values.py, lines 111-114
From:
def __call__(self, a):
    assert self.axis == 0
    if self.ignore_index:
        index_value = parse_index(pd.RangeIndex(a.shape[0]))
    else:
        if isinstance(a.index_value.value, IndexValue.RangeIndex):
            index_value = parse_index(pd.Index([], dtype=np.int64))
        else:
            index_value = a.index_value
    -snip-
To:
def __call__(self, a):
    assert self.axis == 0
    if self.ignore_index:
        if type(a.shape[0]) != int:
            index_value = parse_index(pd.RangeIndex(-1))
        else:
            index_value = parse_index(pd.RangeIndex(a.shape[0]))
    else:
        if isinstance(a.index_value.value, IndexValue.RangeIndex):
            index_value = parse_index(pd.Index([], dtype=np.int64))
        else:
            index_value = a.index_value
    -snip-
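The patch detects an unknown length with `type(a.shape[0]) != int`. The sketch below compares that check with a NaN-specific alternative; both function names are hypothetical, and whether `a.shape` can ever hold NumPy integers in Mars is an assumption not confirmed in this thread. The two checks agree for plain ints and NaN, but the type check also sends NumPy integers down the placeholder branch.

```python
import math
import numpy as np

def unknown_by_type(n):
    # The check used in the proposed patch.
    return type(n) != int

def unknown_by_nan(n):
    # Hypothetical alternative: only a NaN float (Python or NumPy) means "unknown".
    return isinstance(n, float) and math.isnan(n)

cases = {"int": 5, "nan": float("nan"), "np.int64": np.int64(5)}
for name, value in cases.items():
    print(name, unknown_by_type(value), unknown_by_nan(value))
```

If shapes are always plain ints or NaN, the two checks behave identically; the NaN-specific form simply states the intent more directly.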
Gist with code to reproduce the problem, plus some notes (since this is an old issue): https://gist.github.com/Shaun2h/cf294782c840eaa1223caf2e4ad5bfd0