
[BUG] Mars build graph took too much time

Open chaokunyang opened this issue 2 years ago • 0 comments

Describe the bug When executing blockwise operations in Mars that involve many setitem/getitem nodes, building the graph takes about 1 minute, which is too long.

To Reproduce To help us reproduce this bug, please provide the information below:

  1. Your Python version: 3.7.9

  2. The version of Mars you use: master

  3. Versions of crucial packages, such as numpy, scipy and pandas

  4. Full stack of the error. (image)

  5. Minimized code to reproduce the error.

import math

import numpy as np
import mars.dataframe as md
import mars.tensor as mt

# 1,200,000 x 70 random frame, split into 5000-row chunks
df = md.DataFrame(
    mt.random.rand(1_200_000, 70, chunk_size=5000),
    columns=[f"col{i}" for i in range(70)])

# Add 140 derived columns; this creates many setitem/getitem nodes
for i in range(70):
    df[f"col{i+70}"] = df[f"col{i}"].fillna(0)
    df[f"col{i+140}"] = df[f"col{i}"].fillna(0)
for i in range(70):
    df[f"col{i}"] = df[f"col{i}"] / 100
df = df.fillna(0)
cols = df.columns.to_pandas().values
df = df[cols[:-1]]
df = df.apply(lambda x: x, axis=1)
df = df.replace('NaN', np.nan)  # replace the string 'NaN' with numpy NaN
df = df.replace(math.nan, np.nan)  # replace float NaN with numpy NaN
df = df.fillna(value=np.nan)  # replace None/null with numpy NaN
df.map_chunk(lambda x: x).execute()

Expected behavior The graph building time should be less than 3 seconds.
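One way to check a step against the 3-second expectation is to time it in isolation. The following is a minimal sketch using only the Python standard library. `time_budget` is a hypothetical helper, not part of Mars, and the workload inside the `with` block is a stand-in. Wrapping the `.execute()` call from the reproduction this way would measure graph building and execution together, not graph building alone.

```python
import time
from contextlib import contextmanager


@contextmanager
def time_budget(label, budget_seconds):
    """Time the enclosed block and record whether it stayed within budget."""
    report = {"label": label, "budget": budget_seconds}
    start = time.perf_counter()
    try:
        yield report
    finally:
        # Filled in on exit, even if the block raised
        report["elapsed"] = time.perf_counter() - start
        report["within_budget"] = report["elapsed"] <= budget_seconds


# Stand-in workload; replace with the step being measured
with time_budget("graph build", 3.0) as report:
    sum(range(100_000))

print(
    f"{report['label']}: {report['elapsed']:.3f}s "
    f"(budget {report['budget']}s, within budget: {report['within_budget']})"
)
```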

chaokunyang, Jun 29 '22 09:06