
Cut down memory requirements for same-split reshape where possible

ClaudiaComito opened this issue 3 years ago • 3 comments

Description

When reshaping distributed DNDarrays:

  • if new_split is the same as the original split, and
  • if the distribution (lshapes) allows it,

then reshape locally via PyTorch, stitch the locally reshaped tensors together along the split axis, and balance the result.

This allows us to bypass the memory-intensive general implementation of the distributed reshape in many cases; a minimal sketch of this fast path follows.
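
A rough sketch of the idea, not the PR's actual implementation: the compatibility check here is deliberately simplified to "all dimensions before the split axis are unchanged", which is sufficient for the benchmark below.

import heat as ht

def reshape_same_split(x, new_shape, new_split):
    # Fast path: the split axis stays the same and all dimensions before it
    # are untouched, so every process can reshape its local chunk on its own.
    if new_split == x.split and tuple(new_shape[:new_split]) == tuple(x.shape[:new_split]):
        local_shape = list(new_shape)
        local_shape[new_split] = -1  # let torch infer the local extent along the split axis
        local_reshaped = x.larray.reshape(local_shape)  # purely local, no communication
        out = ht.array(local_reshaped, is_split=new_split)  # stitch chunks along the split axis
        out.balance_()  # redistribute so the chunks are evenly sized again
        return out
    # otherwise fall back to the general, memory-intensive distributed reshape
    return ht.reshape(x, new_shape, new_split=new_split)

For the example below, reshape_same_split(x, (10, -1), 1) would take the fast path, since the split axis (1) and the leading dimension (10) are unchanged.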

Example:

import time
import tracemalloc

import torch
import heat as ht

tracemalloc.start()
t_x = torch.arange(100000).reshape(10, -1, 10)
x = ht.array(t_x, split=1)
current, peak = tracemalloc.get_traced_memory()
print(f"BEFORE RESHAPE: Current memory usage is {current / 10**6}MB; Peak was {peak / 10**6}MB")

# local torch reshape for reference
start = time.perf_counter()
t_x = t_x.reshape(10, -1)
end = time.perf_counter()
current, peak = tracemalloc.get_traced_memory()
print(f"after torch.reshape: Current memory usage is {current / 10**6}MB; Peak was {peak / 10**6}MB")
print("torch.reshape takes ", (end - start), " seconds.")

# distributed heat reshape
start = time.perf_counter()
x = x.reshape(10, -1)
end = time.perf_counter()
current, peak = tracemalloc.get_traced_memory()
print(f"after ht.reshape: Current memory usage is {current / 10**6}MB; Peak was {peak / 10**6}MB")
print("ht.reshape takes ", (end - start), " seconds.")

Results on master, 2 processes:

[1,0]<stdout>:BEFORE RESHAPE: Current memory usage is 0.002501MB; Peak was 0.003077MB <---
[1,0]<stdout>:after torch.reshape: Current memory usage is 0.003669MB; Peak was 0.004101MB <---
[1,0]<stdout>:torch.reshape takes  2.068399999988202e-05  seconds.
[1,1]<stdout>:BEFORE RESHAPE: Current memory usage is 0.002501MB; Peak was 0.003105MB <---
[1,1]<stdout>:after torch.reshape: Current memory usage is 0.003669MB; Peak was 0.004101MB <---
[1,1]<stdout>:torch.reshape takes  2.2049000000023966e-05  seconds.
[1,1]<stdout>:after ht.reshape: Current memory usage is 0.372806MB; Peak was 0.383006MB <---
[1,1]<stdout>:ht.reshape takes  0.020710520999999815  seconds.
[1,0]<stdout>:after ht.reshape: Current memory usage is 0.372689MB; Peak was 0.382889MB <---
[1,0]<stdout>:ht.reshape takes  0.02076237900000022  seconds.

Results on enhancement/distributed_reshape_same_split, 2 processes:

[1,0]<stdout>:BEFORE RESHAPE: Current memory usage is 0.002501MB; Peak was 0.003077MB  <---
[1,0]<stdout>:after torch.reshape: Current memory usage is 0.003669MB; Peak was 0.004101MB <---
[1,0]<stdout>:torch.reshape takes  1.6194000000080422e-05  seconds.
[1,1]<stdout>:BEFORE RESHAPE: Current memory usage is 0.002501MB; Peak was 0.003105MB <---
[1,1]<stdout>:after torch.reshape: Current memory usage is 0.003669MB; Peak was 0.004101MB <---
[1,1]<stdout>:torch.reshape takes  1.3567999999963831e-05  seconds.
[1,0]<stdout>:after ht.reshape: Current memory usage is 0.010736MB; Peak was 0.012752MB <---
[1,0]<stdout>:ht.reshape takes  0.015495102000000038  seconds.
[1,1]<stdout>:after ht.reshape: Current memory usage is 0.010736MB; Peak was 0.01278MB <---
[1,1]<stdout>:ht.reshape takes  0.01551089800000005  seconds.

Issue/s addressed: #874

Changes proposed:

  • see above

Type of change

  • New feature (non-breaking change which adds functionality)

Due Diligence

  • [x] All split configurations tested
  • [x] Multiple dtypes tested in relevant functions
  • [x] Documentation updated (if needed)
  • [x] Updated changelog.md under the title "Pending Additions"

Does this change modify the behaviour of other functions? If so, which?

no

ClaudiaComito avatar Sep 24 '21 13:09 ClaudiaComito

Failures may be solved by #857; we would need to merge it to be certain.

coquelin77 avatar Oct 08 '21 09:10 coquelin77

Codecov Report

Merging #873 (0f4fd60) into master (293d873) will decrease coverage by 7.63%. The diff coverage is 60.00%.


@@            Coverage Diff             @@
##           master     #873      +/-   ##
==========================================
- Coverage   95.50%   87.87%   -7.64%     
==========================================
  Files          64       64              
  Lines        9579     9588       +9     
==========================================
- Hits         9148     8425     -723     
- Misses        431     1163     +732     
Flag    Coverage Δ
gpu     87.87% <60.00%> (-6.77%) ↓
unit    ?

Flags with carried forward coverage won't be shown.

Impacted Files                       Coverage Δ
heat/core/manipulations.py           92.51% <60.00%> (-6.44%) ↓
heat/optim/dp_optimizer.py           13.59% <0.00%> (-82.49%) ↓
heat/optim/utils.py                  38.15% <0.00%> (-61.85%) ↓
heat/nn/data_parallel.py             75.17% <0.00%> (-19.32%) ↓
heat/spatial/distance.py             80.90% <0.00%> (-15.08%) ↓
heat/core/relational.py              91.04% <0.00%> (-8.96%) ↓
heat/core/linalg/qr.py               91.25% <0.00%> (-8.75%) ↓
heat/utils/data/partial_dataset.py   87.17% <0.00%> (-7.18%) ↓
heat/cluster/spectral.py             88.57% <0.00%> (-5.72%) ↓
... and 12 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 293d873...0f4fd60.

codecov[bot] avatar Jan 20 '22 15:01 codecov[bot]

CodeSee Review Map:

Review these changes using an interactive CodeSee Map.

ghost avatar Apr 27 '22 07:04 ghost

Superseded by #1125.

ClaudiaComito avatar Mar 20 '23 11:03 ClaudiaComito