[BUG] Mars build graph took too much time
Describe the bug When executing blockwise operations in Mars that involve many setitem/getitem nodes, Mars takes about 1 minute to build the graph, which is too long.
To Reproduce To help us reproduce this bug, please provide the information below:
- Your Python version: 3.7.9
- The version of Mars you use: master
- Versions of crucial packages, such as numpy, scipy and pandas
- Full stack of the error.
- Minimized code to reproduce the error.
import math

import numpy as np
import mars.dataframe as md
import mars.tensor as mt

# 1,200,000 rows x 70 columns, split into row chunks of 5000
df = md.DataFrame(
    mt.random.rand(1_200_000, 70, chunk_size=5000),
    columns=[f"col{i}" for i in range(70)])
# Each assignment below adds setitem/getitem nodes to the graph
for i in range(70):
    df[f"col{i+70}"] = df[f"col{i}"].fillna(0)
    df[f"col{i+140}"] = df[f"col{i}"].fillna(0)
for i in range(70):
    df[f"col{i}"] = df[f"col{i}"] / 100
df = df.fillna(0)
cols = df.columns.to_pandas().values
df = df[cols[:-1]]
df = df.apply(lambda x: x, axis=1)
df = df.replace('NaN', np.nan)  # replace the string 'NaN' with np.nan
df = df.replace(math.nan, np.nan)  # replace float NaN with np.nan
df = df.fillna(value=np.nan)  # replace None/null with np.nan
df.map_chunk(lambda x: x).execute()
Expected behavior The graph building time should be less than 3 seconds.
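To check a run against the 3 second target, a small timing helper along these lines may be useful. This is only a sketch: `time_call` and `within_budget` are hypothetical helper names, not part of Mars, and the workload below is a stand-in so the snippet runs without Mars installed.

```python
import time


def time_call(fn, *args, **kwargs):
    """Run fn and return (result, elapsed seconds) using a monotonic clock."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


def within_budget(elapsed, budget_seconds=3.0):
    """Return True if the measured time is under the budget (3 s, per the report)."""
    return elapsed < budget_seconds


# Stand-in workload (hypothetical); replace with the call being measured.
result, elapsed = time_call(sum, range(1000))
print(f"elapsed: {elapsed:.6f}s, within budget: {within_budget(elapsed)}")
```

Wrapping the final `execute()` call from the reproduction script this way would measure graph building plus execution, so the figure is only an upper bound on the graph building time.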