Cut down memory requirements for same-split reshape where possible
Description
When reshaping distributed DNDarrays:
- if `new_split` is the same as the original split, and
- if the distribution (lshapes) allows it,

then reshape locally via PyTorch, stitch the `local_reshaped` tensors together along the split axis, and balance. This allows us to bypass the memory-intensive implementation of the distributed reshape in many cases.
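Conceptually, the fast path amounts to the sketch below. This is a minimal illustration under stated assumptions, not the actual implementation: `reshape_same_split_sketch` and `local_new_shape` are made-up names, and the real code first has to check that each process's chunk boundaries along the split axis are compatible with the target shape. In the benchmark below, for instance, each of the two processes holds a `(10, 500, 10)` chunk of the `(10, 1000, 10)` array, which maps cleanly onto a `(10, 5000)` chunk of the `(10, 10000)` result.

```python
import torch
import heat as ht

def reshape_same_split_sketch(x, local_new_shape, split):
    # x: DNDarray whose split axis is unchanged by the reshape.
    # local_new_shape: this process's share of the target shape,
    # derived from the global shape and the local chunk of the split axis.

    # Reshape the local chunk independently via PyTorch -- no
    # communication is needed because the split axis stays put.
    local_reshaped = torch.reshape(x.larray, local_new_shape)

    # Stitch the local tensors together along the split axis.
    out = ht.array(local_reshaped, is_split=split)

    # Rebalance so every process holds a roughly equal share.
    out.balance_()
    return out
```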
Example:
```python
import time
import tracemalloc

import torch
import heat as ht

tracemalloc.start()

t_x = torch.arange(100000).reshape(10, -1, 10)
x = ht.array(t_x, split=1)

current, peak = tracemalloc.get_traced_memory()
print(f"BEFORE RESHAPE: Current memory usage is {current / 10**6}MB; Peak was {peak / 10**6}MB")

# local torch.reshape as a baseline
start = time.perf_counter()
t_x = t_x.reshape(10, -1)
end = time.perf_counter()
current, peak = tracemalloc.get_traced_memory()
print(f"after torch.reshape: Current memory usage is {current / 10**6}MB; Peak was {peak / 10**6}MB")
print("torch.reshape takes ", (end - start), " seconds.")

# distributed ht.reshape with the same target shape (split stays 1)
start = time.perf_counter()
x = x.reshape(10, -1)
end = time.perf_counter()
current, peak = tracemalloc.get_traced_memory()
print(f"after ht.reshape: Current memory usage is {current / 10**6}MB; Peak was {peak / 10**6}MB")
print("ht.reshape takes ", (end - start), " seconds.")
```
Results on `master`, 2 processes:
```
[1,0]<stdout>:BEFORE RESHAPE: Current memory usage is 0.002501MB; Peak was 0.003077MB <---
[1,0]<stdout>:after torch.reshape: Current memory usage is 0.003669MB; Peak was 0.004101MB <---
[1,0]<stdout>:torch.reshape takes 2.068399999988202e-05 seconds.
[1,1]<stdout>:BEFORE RESHAPE: Current memory usage is 0.002501MB; Peak was 0.003105MB <---
[1,1]<stdout>:after torch.reshape: Current memory usage is 0.003669MB; Peak was 0.004101MB <---
[1,1]<stdout>:torch.reshape takes 2.2049000000023966e-05 seconds.
[1,1]<stdout>:after ht.reshape: Current memory usage is 0.372806MB; Peak was 0.383006MB <---
[1,1]<stdout>:ht.reshape takes 0.020710520999999815 seconds.
[1,0]<stdout>:after ht.reshape: Current memory usage is 0.372689MB; Peak was 0.382889MB <---
[1,0]<stdout>:ht.reshape takes 0.02076237900000022 seconds.
```
Results on `enhancement/distributed_reshape_same_split`, 2 processes:
```
[1,0]<stdout>:BEFORE RESHAPE: Current memory usage is 0.002501MB; Peak was 0.003077MB <---
[1,0]<stdout>:after torch.reshape: Current memory usage is 0.003669MB; Peak was 0.004101MB <---
[1,0]<stdout>:torch.reshape takes 1.6194000000080422e-05 seconds.
[1,1]<stdout>:BEFORE RESHAPE: Current memory usage is 0.002501MB; Peak was 0.003105MB <---
[1,1]<stdout>:after torch.reshape: Current memory usage is 0.003669MB; Peak was 0.004101MB <---
[1,1]<stdout>:torch.reshape takes 1.3567999999963831e-05 seconds.
[1,0]<stdout>:after ht.reshape: Current memory usage is 0.010736MB; Peak was 0.012752MB <---
[1,0]<stdout>:ht.reshape takes 0.015495102000000038 seconds.
[1,1]<stdout>:after ht.reshape: Current memory usage is 0.010736MB; Peak was 0.01278MB <---
[1,1]<stdout>:ht.reshape takes 0.01551089800000005 seconds.
```
Issue/s addressed: #874
Changes proposed:
- see above
Type of change
- New feature (non-breaking change which adds functionality)
Due Diligence
- [x] All split configurations tested
- [x] Multiple dtypes tested in relevant functions
- [x] Documentation updated (if needed)
- [x] Updated changelog.md under the title "Pending Additions"
Does this change modify the behaviour of other functions? If so, which?
No.
The failing tests may be resolved by #857; that PR would need to be merged to be certain.
Codecov Report
Merging #873 (0f4fd60) into master (293d873) will decrease coverage by 7.63%. The diff coverage is 60.00%.
```diff
@@            Coverage Diff             @@
##           master     #873      +/-   ##
==========================================
- Coverage   95.50%   87.87%   -7.64%
==========================================
  Files          64       64
  Lines        9579     9588       +9
==========================================
- Hits         9148     8425     -723
- Misses        431     1163     +732
```
| Flag | Coverage Δ | |
|---|---|---|
| gpu | 87.87% <60.00%> (-6.77%) | :arrow_down: |
| unit | ? | |

Flags with carried forward coverage won't be shown.
| Impacted Files | Coverage Δ | |
|---|---|---|
| heat/core/manipulations.py | 92.51% <60.00%> (-6.44%) | :arrow_down: |
| heat/optim/dp_optimizer.py | 13.59% <0.00%> (-82.49%) | :arrow_down: |
| heat/optim/utils.py | 38.15% <0.00%> (-61.85%) | :arrow_down: |
| heat/nn/data_parallel.py | 75.17% <0.00%> (-19.32%) | :arrow_down: |
| heat/spatial/distance.py | 80.90% <0.00%> (-15.08%) | :arrow_down: |
| heat/core/relational.py | 91.04% <0.00%> (-8.96%) | :arrow_down: |
| heat/core/linalg/qr.py | 91.25% <0.00%> (-8.75%) | :arrow_down: |
| heat/utils/data/partial_dataset.py | 87.17% <0.00%> (-7.18%) | :arrow_down: |
| heat/cluster/spectral.py | 88.57% <0.00%> (-5.72%) | :arrow_down: |
| ... and 12 more | | |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Superseded by #1125.