heat icon indicating copy to clipboard operation
heat copied to clipboard

Fixed code that verifies lshape while using is_split

Open Mystic-Slice opened this issue 2 years ago • 1 comments

Description

The reuse of gshape array as the receiving buffer made the code a bit too difficult to understand. So, created a new array neighbour_shape to hold, well...., the neighbour's shape.

The use of MPI.SUM, to find if any process has encountered a mismatch in local shapes with its neighbour, leads to errors because of integer limitations. When two or more processes find a mismatch, the huge negative values are added together and I guess because of integer overflow, they disappear into the unknown and finally, we are left with a positive value. So, it does not create any errors when it should. This can be easily solved by using MPI.MIN instead.

One more thing that I don't understand is, why gshape[i] is allowed to be one less than lshape[i] in line 405. I know there must be a reason for it. I just don't understand why.

Code snippet:

import heat as ht
import torch

rank = ht.communication.MPI_WORLD.rank
dim = rank+1

larray = [[0]*dim]*dim
print("Rank", rank, larray)

d = ht.array(larray, is_split=0)

# literally any operation that involves communication 
# between processes will crash
# like this simple print statement
print(d)

Due Diligence

  • [ ] All split configurations tested
  • [ ] Multiple dtypes tested in relevant functions
  • [ ] Documentation updated (if needed)
  • [ ] Updated changelog.md under the title "Pending Additions"

Does this change modify the behaviour of other functions? If so, which?

no

skip ci

Mystic-Slice avatar Sep 22 '22 17:09 Mystic-Slice

👇 Click on the image for a new way to code review
  • Make big changes easier — review code in small groups of related files

  • Know where to start — see the whole change at a glance

  • Take a code tour — explore the change with an interactive tour

  • Make comments and review — all fully sync’ed with github

    Try it now!

Review these changes using an interactive CodeSee Map

Legend

CodeSee Map Legend

ghost avatar Sep 22 '22 17:09 ghost

Changes moved to a different branch. Refer https://github.com/helmholtz-analytics/heat/pull/1034

Mystic-Slice avatar Oct 21 '22 05:10 Mystic-Slice