modin icon indicating copy to clipboard operation
modin copied to clipboard

PERF: avoid reorder_labels

Open jbrockmendel opened this issue 3 years ago • 15 comments

  • [x] closes #4882
import numpy as np
import modin.pandas as pd

arr = np.random.randn(1_000_000, 20)
df = pd.DataFrame(arr)

res = df[[4, 3]]

%timeit res = df[[4, 3]]
711 µs ± 4.12 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)  # <- PR
43.6 ms ± 1.81 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)  # <- master

TODO

  • [ ] further optimizations to blklocs/blknos
  • [ ] avoid reorder_labels in cudf_on_ray implemenation
  • [ ] fix is_full_axis_mask
  • [ ] deal with potential many-partitions issues (discussed briefly on call 8/23)

jbrockmendel avatar Aug 24 '22 18:08 jbrockmendel

This pull request introduces 3 alerts when merging 0ab5fbcb2429bddc0c93880f0957834d7edfeddd into b0465fa4425247e7949881e93866c6b85bb10a1d - view on LGTM.com

new alerts:

  • 2 for Unused local variable
  • 1 for Unused import

lgtm-com[bot] avatar Aug 24 '22 19:08 lgtm-com[bot]

This pull request introduces 3 alerts when merging b5aede162bc26e4be3516301289e56de41de00bd into dff84b9cdf373096f88356fe17202599c192c2fc - view on LGTM.com

new alerts:

  • 2 for Unused local variable
  • 1 for Unused import

lgtm-com[bot] avatar Aug 24 '22 21:08 lgtm-com[bot]

Looks like this runs into a many-partitions issue. @devin-petersohn on the call the other day it sounded like you had an idea to deal with this?

jbrockmendel avatar Aug 25 '22 17:08 jbrockmendel

Looks like this runs into a many-partitions issue. @devin-petersohn on the call the other day it sounded like you had an idea to deal with this?

Hi @jbrockmendel! By many-partitions issue, do you mean that too many partitions are created? If that's the issue, you could call the rebalance_partitions method of the partition manager on the partitions, which should balance them!

RehanSD avatar Aug 25 '22 17:08 RehanSD

This pull request introduces 3 alerts when merging 0143dc6cda908112e1d5306c169f2af507280e04 into 336fad7c26ad8d3f44fe290d232c7cb7fd7dbe17 - view on LGTM.com

new alerts:

  • 2 for Unused local variable
  • 1 for Unused import

lgtm-com[bot] avatar Aug 25 '22 23:08 lgtm-com[bot]

This pull request introduces 3 alerts when merging a563f21ad03263f5b38247c37ccb51e68004880e into 636fc59e6820a36937b72cfb96ba9fa60d871fe4 - view on LGTM.com

new alerts:

  • 2 for Unused local variable
  • 1 for Unused import

lgtm-com[bot] avatar Aug 26 '22 18:08 lgtm-com[bot]

Codecov Report

Merging #4880 (514f062) into master (c9fc326) will decrease coverage by 12.64%. The diff coverage is 100.00%.

:exclamation: Current head 514f062 differs from pull request most recent head ab42a25. Consider uploading reports for the commit ab42a25 to get more accurate results

@@             Coverage Diff             @@
##           master    #4880       +/-   ##
===========================================
- Coverage   84.87%   72.22%   -12.65%     
===========================================
  Files         268      268               
  Lines       19744    19739        -5     
===========================================
- Hits        16757    14256     -2501     
- Misses       2987     5483     +2496     
Impacted Files Coverage Δ
...in/core/dataframe/pandas/partitioning/partition.py 100.00% <ø> (ø)
modin/core/dataframe/pandas/dataframe/dataframe.py 80.80% <100.00%> (-14.30%) :arrow_down:
modin/experimental/sklearn/__init__.py 0.00% <0.00%> (-100.00%) :arrow_down:
modin/experimental/xgboost/test/test_dmatrix.py 0.00% <0.00%> (-100.00%) :arrow_down:
modin/experimental/xgboost/test/test_xgboost.py 0.00% <0.00%> (-100.00%) :arrow_down:
modin/experimental/core/execution/ray/__init__.py 0.00% <0.00%> (-100.00%) :arrow_down:
...n/experimental/sklearn/model_selection/__init__.py 0.00% <0.00%> (-100.00%) :arrow_down:
...mental/sklearn/model_selection/train_test_split.py 0.00% <0.00%> (-100.00%) :arrow_down:
...tal/core/execution/ray/implementations/__init__.py 0.00% <0.00%> (-100.00%) :arrow_down:
...pandas/interchange/dataframe_protocol/exception.py 0.00% <0.00%> (-100.00%) :arrow_down:
... and 78 more

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

codecov[bot] avatar Aug 26 '22 18:08 codecov[bot]

This pull request introduces 1 alert when merging 2809891e95f8a83f291bc1b54fa09c68cb9c028a into 636fc59e6820a36937b72cfb96ba9fa60d871fe4 - view on LGTM.com

new alerts:

  • 1 for Unused local variable

lgtm-com[bot] avatar Aug 26 '22 19:08 lgtm-com[bot]

@RehanSD i think we need to re-find new_row_lengths after rebalance_partitions. suggestions on efficient way to do this?

jbrockmendel avatar Aug 29 '22 17:08 jbrockmendel

i think we need to re-find new_row_lengths after rebalance_partitions. suggestions on efficient way to do this?

@jbrockmendel it looks like draft PR #4893 is going to make rebalance_partitions return the new row lengths without doing any extra remote work. Will that work?

mvashishtha avatar Aug 29 '22 17:08 mvashishtha

sounds like it, yes

jbrockmendel avatar Aug 29 '22 17:08 jbrockmendel

@anmyachev tried to incorporate the rebalance_partition changes but it broke a bunch of new stuff. suggestions?

jbrockmendel avatar Aug 31 '22 21:08 jbrockmendel

@anmyachev tried to incorporate the rebalance_partition changes but it broke a bunch of new stuff. suggestions?

Perhaps https://github.com/modin-project/modin/issues/4914 will result in an issue with the insert operation.

anmyachev avatar Sep 01 '22 15:09 anmyachev

@jbrockmendel please rebase on master

anmyachev avatar Sep 01 '22 20:09 anmyachev

This pull request introduces 2 alerts when merging 68210152efa56ab50d80116c2e837c66e40562f4 into 84f638cc703beb550edecce715532c8e86fb7b25 - view on LGTM.com

new alerts:

  • 2 for Unused local variable

lgtm-com[bot] avatar Sep 02 '22 23:09 lgtm-com[bot]

I don't see a path here, closing.

jbrockmendel avatar Feb 01 '23 16:02 jbrockmendel