
PERF: MinPartitionSize default should be set to 1 for performance

Open devin-petersohn opened this issue 3 years ago • 9 comments

https://github.com/modin-project/modin/blob/9782a027568d9ad16bf2c3dea434646cec5e4898/modin/config/envvars.py#L415

devin-petersohn avatar Jan 21 '22 11:01 devin-petersohn

What's the rationale for this?

mvashishtha avatar Jan 21 '22 16:01 mvashishtha

As it currently stands, the minimum number of columns in any partition is 32, which leads to limited parallelism. Datasets often have fewer than 32 columns, so we need to make sure we perform well on those datasets too.
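To illustrate the point, here is a rough sketch (not Modin's actual partitioning code; `num_column_partitions` is a hypothetical helper) of how a minimum partition size caps parallelism along the column axis:

```python
# Hypothetical sketch: how a minimum partition size limits column-wise
# parallelism. This is NOT Modin's actual partitioning logic.
import math


def num_column_partitions(n_cols, n_cpus, min_partition_size):
    # Ideally, columns would be split across all available CPUs...
    ideal = min(n_cols, n_cpus)
    # ...but each partition must hold at least min_partition_size columns.
    max_allowed = max(1, math.ceil(n_cols / min_partition_size))
    return min(ideal, max_allowed)


# With 16 columns and 16 CPUs, a minimum size of 32 forces everything
# into a single column partition, while a minimum of 1 allows 16:
print(num_column_partitions(16, 16, 32))  # 1
print(num_column_partitions(16, 16, 1))   # 16
```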

devin-petersohn avatar Jan 21 '22 16:01 devin-petersohn

I've tried out a few column-wise reductions and aggregations with min partition size of 1. The results are mixed for Ray on the following dataframe of about 1 million rows and 16 columns:

```python
import modin.pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randint(100, size=(2 ** 20, 16)))
```

With min partition size 32, `df._query_compiler._modin_frame._partitions.shape` was `(16, 1)`. With min partition size 1, it was `(16, 16)`.

| Operation | time (ms) for min size 32 | time (ms) for min size 1 |
| --- | ---: | ---: |
| `df.count()` | 15 | 120 |
| `df.apply(sum, axis=0)` | 1150 | 220 |
| `df.apply("cumsum", axis=0)` | 3 | 40 |

System information:

  • Operating System: macOS Big Sur 11.5.2
  • Memory: 16 GB 2667 MHz DDR4
  • Modin version: 0.12.0+77.ged2a7a4a
  • Python version: 3.9.9
  • Ray version: 1.9.2

Is there a better way to test performance for this change?

mvashishtha avatar Jan 21 '22 17:01 mvashishtha

@mvashishtha A couple of questions:

  • What kind of performance do you get as the number of rows increases?
  • Does anything change if you start the benchmark from a file?

devin-petersohn avatar Jan 21 '22 17:01 devin-petersohn

@devin-petersohn I think it might be better to call repr to make sure the async operations complete.

Here are results for a few sizes running the benchmarking in ipython:


On the original dataframe with dimensions (2 ** 20, 16) I get:

| Operation | time (ms) for min size 32 | time (ms) for min size 1 |
| --- | ---: | ---: |
| `repr(df.count())` | 35 | 220 |
| `repr(df.apply(sum, axis=0))` | 1090 | 260 |
| `repr(df.apply("cumsum", axis=0))` | 1550 | 10740 |

On a dataframe of dimensions (2 ** 22, 16) I get:

| Operation | time (ms) for min size 32 | time (ms) for min size 1 |
| --- | ---: | ---: |
| `repr(df.count())` | 49 | 230 |
| `repr(df.apply(sum, axis=0))` | 4370 | 715 |
| `repr(df.apply("cumsum", axis=0))` | 6060 | 44000 |

On a dataframe of dimensions (2 ** 23, 16) I get:

| Operation | time (ms) for min size 32 | time (ms) for min size 1 |
| --- | ---: | ---: |
| `repr(df.count())` | 108 | 243 |
| `repr(df.apply(sum, axis=0))` | 8747 | 1400 |
| `repr(df.apply("cumsum", axis=0))` | 2430 | 87000 |

Here are the results running the benchmark from a file:


On a dataframe of dimensions (2 ** 20, 16) I get:

| Operation | time (ms) for min size 32 | time (ms) for min size 1 |
| --- | ---: | ---: |
| `repr(df.count())` | 53 | 207 |
| `repr(df.apply(sum, axis=0))` | 1098 | 258 |
| `repr(df.apply("cumsum", axis=0))` | 255 | 180 |

On a dataframe of dimensions (2 ** 22, 16) I get:

| Operation | time (ms) for min size 32 | time (ms) for min size 1 |
| --- | ---: | ---: |
| `repr(df.count())` | 217 | 525 |
| `repr(df.apply(sum, axis=0))` | 728 | 724 |
| `repr(df.apply("cumsum", axis=0))` | 408 | 366 |

On a dataframe of dimensions (2 ** 23, 16) I get:

| Operation | time (ms) for min size 32 | time (ms) for min size 1 |
| --- | ---: | ---: |
| `repr(df.count())` | 223 | 227 |
| `repr(df.apply(sum, axis=0))` | 1409 | 1407 |
| `repr(df.apply("cumsum", axis=0))` | 1941 | 3718 |

Here is the benchmark code:
```python
import modin.pandas as pd
import numpy as np
import time
from modin.config import envvars
import warnings

warnings.filterwarnings('ignore', module='modin')


def print_runtimes(f, title):
    times = []
    for _ in range(4):
        start = time.time()
        f()
        end = time.time()
        times.append(end - start)
    print(f'{title}: {times} with all except first averaging {sum(times[1:]) / (len(times) - 1)}')


for size in ((2 ** 20, 16), (2 ** 22, 16), (2 ** 23, 16)):
    # The lambdas look up the global `df` at call time, so they pick up
    # each newly created dataframe below.
    counter = lambda: repr(df.count())
    summer = lambda: repr(df.apply(sum, axis=0))
    cumsummer = lambda: repr(df.apply("cumsum", axis=0))

    # Reset to the default at the top of each iteration; otherwise the
    # value of 1 set below would leak into the "size 32" runs of the
    # following iterations.
    envvars.MinPartitionSize.put(32)
    df = pd.DataFrame(np.random.randint(100, size=size))
    print_runtimes(counter, f'partition size 32 and shape {size}: count')
    print_runtimes(summer, f'partition size 32 and shape {size}: sum')
    print_runtimes(cumsummer, f'partition size 32 and shape {size}: cumsum')

    print('--------------\n')

    envvars.MinPartitionSize.put(1)
    df = pd.DataFrame(np.random.randint(100, size=size))
    print_runtimes(counter, f'partition size 1 and shape {size}: count')
    print_runtimes(summer, f'partition size 1 and shape {size}: sum')
    print_runtimes(cumsummer, f'partition size 1 and shape {size}: cumsum')

    print('---------------\n' * 5)
```

> Does anything change if you start the benchmark from a file?

In ipython, at all sizes there's a large win for applying sum, a huge loss for applying cumsum, and a loss for count(). When running from a file, it looks like there are some wins for the apply functions for the two smallest datasets, but not for the largest dataset.

mvashishtha avatar Jan 21 '22 19:01 mvashishtha

> @devin-petersohn I think it might be better to call repr to make sure the async operations complete.

I'm not sure what overhead comes from repr. It would probably be better to enable BenchmarkMode:

```python
from modin.config import BenchmarkMode
BenchmarkMode.put(True)
```

@devin-petersohn, what do you think about having separate configuration settings, `MinPartitionLen` and `MinPartitionWidth`?

YarShev avatar Jan 26 '22 12:01 YarShev

@YarShev that makes sense. We don't want single-row partitions, but single-column partitions might be okay.
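A minimal standalone sketch of what the proposed split could look like; `MinPartitionLen` and `MinPartitionWidth` do not exist in Modin at the time of writing, and this toy version does not hook into Modin's actual config system:

```python
# Hypothetical sketch: separate minimum partition dimensions for the
# row and column axes. These settings do NOT exist in Modin; this is
# only an illustration of the proposed split.
class _PartitionDimSetting:
    def __init__(self, default):
        self._value = default

    def put(self, value):
        if value < 1:
            raise ValueError("minimum partition dimension must be >= 1")
        self._value = value

    def get(self):
        return self._value


# Column partitions may shrink to width 1 for parallelism, while row
# partitions keep a larger minimum to bound per-partition overhead.
MinPartitionLen = _PartitionDimSetting(32)   # minimum rows per partition
MinPartitionWidth = _PartitionDimSetting(1)  # minimum columns per partition

print(MinPartitionLen.get(), MinPartitionWidth.get())  # 32 1
```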

devin-petersohn avatar Jan 26 '22 14:01 devin-petersohn

@YarShev

> I'm not sure what overhead comes from repr. It would probably be better to enable BenchmarkMode.

When I enable benchmark mode, I don't see any significant difference for the benchmark I linked with Ray on my Mac. I also don't see a difference with either Ray or Dask on a Google Compute Engine Linux VM with these specs:

  • Operating system: Ubuntu 20.04.3 LTS
  • 8 vCPU
  • 32 GB RAM

I think we should try another benchmark.

mvashishtha avatar Jan 26 '22 17:01 mvashishtha

@mvashishtha, could you play with MinPartitionLen and MinPartitionWidth settings?

YarShev avatar Jan 26 '22 17:01 YarShev