Implement tensor redistribution for binary operations
After PR #165 is merged, implement binary operators for two tensors with set, but different, split axes (e.g. split=0 and split=1).
Think of a rule to decide which tensor should be resplit. Proposal (might require modification; a sketch follows after this list):
- attempt to redistribute the tensor with fewer elements
- should both have the same number of elements, redistribute the tensor that is split along a minor axis, i.e. split >= 1
- should both have the same number of elements and both have a minor axis, redistribute the one with the larger minor axis
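A minimal sketch of how these rules might compose, assuming DNDarray-like operands that expose .split (the partition axis, or None) and .shape (the global shape); the helper pick_operand_to_resplit and its return convention are hypothetical, not heat API:

```python
from math import prod

def pick_operand_to_resplit(a, b):
    """Hypothetical helper: return the operand to redistribute, or None."""
    # Rule 1: prefer redistributing the tensor with fewer elements.
    na, nb = prod(a.shape), prod(b.shape)
    if na != nb:
        return a if na < nb else b
    # Rule 2: with equal sizes, redistribute the operand split along a
    # minor axis (split >= 1) if exactly one of them is.
    a_minor = a.split is not None and a.split >= 1
    b_minor = b.split is not None and b.split >= 1
    if a_minor != b_minor:
        return a if a_minor else b
    # Rule 3: if both are split along a minor axis, redistribute the one
    # with the larger minor axis.
    if a_minor and b_minor:
        if a.split == b.split:
            return None  # already aligned, nothing to redistribute
        return a if a.split > b.split else b
    # Neither operand is split along a minor axis: no redistribution is
    # picked here (see the discussion below for this case).
    return None
```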
@Markus-Goetz I will try out these rules. However, what happens in 2) if neither of the tensors is split along a minor axis?
In this case you may simply add them locally as already implemented and working
No, I mean if they both have a split > 1, but along different axes... or is that unrealistic?
That would be rule 3): if both have a minor axis, then the one with the larger minor axis will be redistributed
A couple more things came to my mind that have to be considered (both are sketched in code below):
- `__binary_op` should ensure that the comms of the two operands match. We cannot perform the operation if the data is distributed across different comms. Exception: the split of one of the operands is `None`; local data can always be used.
- Ensure that the operand types are properly promoted. See also `types.promote_types`.
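A hedged sketch of both checks, assuming DNDarray-like operands with .comm, .split, and .dtype attributes; validate_operands is a hypothetical helper and the comm equality test is illustrative, while ht.promote_types is heat's existing promotion routine:

```python
import heat as ht

def validate_operands(t1, t2):
    """Hypothetical pre-check mirroring the two points above."""
    # The comms must match unless one operand is not distributed
    # (split is None); purely local data can always be combined.
    if t1.split is not None and t2.split is not None and t1.comm != t2.comm:
        raise NotImplementedError(
            "operands are distributed across different communicators"
        )
    # Promote the operand dtypes to a common result type.
    return ht.promote_types(t1.dtype, t2.dtype)
```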
Why was that closed? The issue is not fixed; PR #268 was bumped due to issues with MPI communication and resplit.
My apologies. I think I saw that the PR was closed (at the time it was a month after the bump, and I think that you had closed the PR). I am not sure, but I think I assumed that the PR had been merged and that this was a stale issue which was not closed at the time of merging. Sorry @Cdebus
No worries ^^ I was just surprised, because fixing `_binary_op` was my original intention, but it didn't work because resplit is based on allgather, and that's how I ended up with that mess
Status? @coquelin77 @Markus-Goetz can we break this into smaller issues?
Alright, here's the current status of binary operations `heat.core._operations.__binary_op()`:
- [x] ensure that operand types are promoted correctly
- [x] ensure that binary ops are possible between operands split along the same axis but with misaligned `lshape_maps` (e.g. as a result of indexing)
- [ ] ensure that operands are on the same MPI communicator
- [ ] ensure that binary ops are possible between operands split along different axes (see the sketch below)

Reviewed within #1109
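For the last unchecked item, a minimal illustration of the manual workaround available today, using heat's public ht.resplit to align the operands before the binary op (the automatic path inside __binary_op is the open task):

```python
import heat as ht

a = ht.ones((4, 4), split=0)  # distributed along rows
b = ht.ones((4, 4), split=1)  # distributed along columns

# The automatic redistribution discussed in this issue would handle
# a + b directly; until then, one operand can be resplit explicitly.
b = ht.resplit(b, 0)
c = a + b  # both operands now share split=0
```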
Branch 167-Implement_tensor_redistribution_for_binary_operations created!
Still active and ongoing work.
(Reviewed within #1109)