heat
heat copied to clipboard
Fixed code that verifies lshape while using is_split
Description
The reuse of gshape array as the receiving buffer made the code a bit too difficult to understand. So, created a new array neighbour_shape
to hold, well...., the neighbour's shape.
The use of MPI.SUM
, to find if any process has encountered a mismatch in local shapes with its neighbour, leads to errors because of integer limitations. When two or more processes find a mismatch, the huge negative values are added together and I guess because of integer overflow, they disappear into the unknown and finally, we are left with a positive value. So, it does not create any errors when it should. This can be easily solved by using MPI.MIN
instead.
One more thing that I don't understand is, why gshape[i]
is allowed to be one less than lshape[i]
in line 405. I know there must be a reason for it. I just don't understand why.
Code snippet:
import heat as ht
import torch
rank = ht.communication.MPI_WORLD.rank
dim = rank+1
larray = [[0]*dim]*dim
print("Rank", rank, larray)
d = ht.array(larray, is_split=0)
# literally any operation that involves communication
# between processes will crash
# like this simple print statement
print(d)
Due Diligence
- [ ] All split configurations tested
- [ ] Multiple dtypes tested in relevant functions
- [ ] Documentation updated (if needed)
- [ ] Updated changelog.md under the title "Pending Additions"
Does this change modify the behaviour of other functions? If so, which?
no
skip ci
👇 Click on the image for a new way to code review
-
Make big changes easier — review code in small groups of related files
-
Know where to start — see the whole change at a glance
-
Take a code tour — explore the change with an interactive tour
-
Make comments and review — all fully sync’ed with github
Legend
Changes moved to a different branch. Refer https://github.com/helmholtz-analytics/heat/pull/1034