
Implement tensor redistribution for binary operations

Open Markus-Goetz opened this issue 5 years ago • 12 comments

After PR #165 is merged, implement binary operators for two tensors with set, but different, split axes (e.g. split=0 and split=1).

Think of a rule to decide which tensor should be resplit. Proposal (might require modification):

  1. attempt to redistribute the tensor with fewer elements
  2. should both have the same number of elements, redistribute the tensor that is split along a minor axis, i.e. split >= 1.
  3. should both have the same number of elements and both be split along a minor axis, redistribute the one with the larger minor axis
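The three rules above can be sketched as a small decision function. This is a hypothetical helper for illustration, not Heat's actual implementation; `size` stands for the total number of elements and `split` for the split axis (0 = major, >= 1 = minor) of each operand.

```python
def choose_resplit(size_a, split_a, size_b, split_b):
    """Return 'a' or 'b': the operand to redistribute under rules 1-3.

    A sketch of the proposed heuristic, not Heat's actual code.
    """
    # Rule 1: redistribute the tensor with fewer elements.
    if size_a != size_b:
        return "a" if size_a < size_b else "b"
    # Rule 2: with equal sizes, prefer redistributing an operand
    # that is split along a minor axis (split >= 1).
    a_minor, b_minor = split_a >= 1, split_b >= 1
    if a_minor != b_minor:
        return "a" if a_minor else "b"
    # Rule 3: if both are split along minor axes, redistribute the one
    # with the larger minor axis.
    if a_minor and b_minor:
        return "a" if split_a >= split_b else "b"
    # Both split along axis 0: either choice works.
    return "a"
```

If both operands are split along the same axis, no redistribution is needed at all; that case would be handled before this helper is consulted.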

EDITED July 31st 2023 by @ClaudiaComito with a status update on binary operations (heat.core._operations.__binary_op()); see the identical status comment of July 31st 2023 below.

Markus-Goetz avatar Mar 12 '19 20:03 Markus-Goetz

@Markus-Goetz I will try out these rules. However, what happens in 2) if neither tensor is split along a minor axis?

Cdebus avatar Apr 10 '19 06:04 Cdebus

In this case you may simply add them locally, as already implemented and working.

Markus-Goetz avatar Apr 10 '19 07:04 Markus-Goetz

No, I mean if they both have split >= 1, but along different axes... or is that unrealistic?

Cdebus avatar Apr 10 '19 07:04 Cdebus

That would be rule 3): if both have a minor axis, then the one with the larger minor axis will be redistributed.

Markus-Goetz avatar Apr 10 '19 09:04 Markus-Goetz

A couple more things came to mind that need to be considered:

  • __binary_op should ensure that the communicators (comms) of the two operands match. We cannot perform the operation if the data is distributed across different comms. Exception: if the split of one of the operands is None, its local data can always be used.

  • Ensure that the operand types are properly promoted. See also types.promote_types.
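The two checks above can be sketched in pure Python. Both helpers are hypothetical stand-ins: `promote_types` uses a simplified, assumed type ordering (not Heat's actual promotion table, for which see types.promote_types), and `check_comms` only models the rule that a non-distributed operand (split is None) is always usable.

```python
# Simplified, hypothetical promotion order; Heat's real table is richer
# (see types.promote_types).
_ORDER = ["bool", "uint8", "int8", "int16", "int32", "int64", "float32", "float64"]

def promote_types(t1, t2):
    """Return the later of the two types in the simplified ordering."""
    return _ORDER[max(_ORDER.index(t1), _ORDER.index(t2))]

def check_comms(comm_a, comm_b, split_a, split_b):
    """Operands must share a communicator, unless one of them is not
    distributed (split is None): local data can always be used."""
    if split_a is None or split_b is None:
        return True
    return comm_a == comm_b
```

In the real implementation the comparison would be between MPI communicator objects rather than plain values, and a mismatch would raise an error instead of returning False.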

Markus-Goetz avatar Apr 16 '19 08:04 Markus-Goetz

Why was that closed? The issue is not fixed; PR #268 was bumped due to issues with MPI communication and resplit.

Cdebus avatar Jul 25 '19 11:07 Cdebus

My apologies. I think I saw that the PR was closed (at the time it was a month after the bump, and I think that you had closed the PR). I am not sure, but I believe I thought that the PR had been merged and that this was a stray issue which was not closed at the time of merging. Sorry @Cdebus

coquelin77 avatar Jul 25 '19 11:07 coquelin77

No worries ^^ I was just surprised, because fixing __binary_op was my original intention, but it didn't work because resplit relies on allgather, and that's how I ended up with that mess.

Cdebus avatar Jul 25 '19 11:07 Cdebus

Status? @coquelin77 @Markus-Goetz, can we break this into smaller issues?

ClaudiaComito avatar Apr 01 '22 08:04 ClaudiaComito

Alright, here's the current status of binary operations heat.core._operations.__binary_op():

  • [x] ensure that operand types are promoted correctly
  • [x] ensure that binary ops are possible between operands split along the same axis but with misaligned lshape_maps (e.g. as a result of indexing)
  • [ ] ensure that operands are on the same MPI communicator
  • [ ] ensure that binary ops are possible between operands split along different axes

Reviewed within #1109
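The second checked item above (operands split along the same axis but with misaligned lshape_maps) amounts to redistributing both operands to a common chunking before the local op. A pure-Python sketch under assumed names; real Heat code would move the data via MPI rather than just compute the diff:

```python
def target_map(global_size, nprocs):
    """Even block distribution along the split axis: the first
    global_size % nprocs ranks get one extra element."""
    base, extra = divmod(global_size, nprocs)
    return [base + (1 if r < extra else 0) for r in range(nprocs)]

def needs_redistribution(lshape_map, target):
    """Ranks whose local chunk size differs from the target chunking
    and therefore must send or receive data."""
    return [r for r, (have, want) in enumerate(zip(lshape_map, target)) if have != want]
```

For example, an operand left with local sizes [4, 2, 2, 2] along the split axis after indexing would need data moved off ranks 0 and 1 to match the even target [3, 3, 2, 2] on four processes.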

ClaudiaComito avatar Jul 31 '23 16:07 ClaudiaComito

Still active and ongoing work.

(Reviewed within #1109 )

mrfh92 avatar Aug 21 '23 10:08 mrfh92