
Datatype tiling for large communication

Open Markus-Goetz opened this issue 5 years ago • 4 comments

Description

Heat uses various wrapped MPI calls to transmit data between processes (e.g. in resplit()). If the buffer of such a transmission is too large, i.e. its size exceeds the int32 value range, MPI aborts with an error.

This could perhaps be fixed using the tiling implementation from the QR branch.

To Reproduce

Steps to reproduce the behavior:

  1. Which module/class/function is affected? communications.py
  2. Under which circumstances does the bug appear? Several, e.g.:
     # INT32_MAX = 2**31 - 1, processors = number of MPI processes
     a = ht.zeros(((INT32_MAX + 1) * processors, processors), split=0).resplit(1)
  3. What is the exact error message/erroneous behavior? It depends on the MPI implementation.
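A quick count shows why the reproducer exceeds the limit regardless of the number of processes. The sketch assumes (this is not taken from Heat's source) that resplit(1) exchanges one block per pair of processes, namely the sender's local rows restricted to the receiver's new columns. resplit_block_elements is a hypothetical helper used only for this arithmetic.

```python
INT32_MAX = 2**31 - 1  # largest C int; MPI count arguments are C ints


def resplit_block_elements(n_rows: int, n_cols: int, procs: int) -> int:
    """Elements one process sends to one peer when a (n_rows, n_cols) array
    is resplit from split=0 to split=1, assuming an even distribution and a
    pairwise all-to-all block exchange (an assumption, not Heat's code)."""
    local_rows = n_rows // procs   # rows each process holds with split=0
    target_cols = n_cols // procs  # columns each process owns with split=1
    return local_rows * target_cols


procs = 4  # any process count gives the same per-message size here
block = resplit_block_elements((INT32_MAX + 1) * procs, procs, procs)
print(block, block > INT32_MAX)  # prints: 2147483648 True
```

Every block holds INT32_MAX + 1 elements, one more than a single MPI count can describe.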

Expected behavior

No MPI error.

Version Info

Any

Markus-Goetz, Jan 20 '20 12:01

Small update on this one: PR #520 has a new tiling class. Theoretically, it could be modified to cope with this issue by sending only partial tiles, although that may require a fair amount of changes.
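A minimal sketch of the partial-tile idea, assuming a tile is a 2-D row-major block of shape (rows, cols): cut it into row slabs small enough that each slab's element count fits into one MPI count. The function name partial_tile_rows and the row-slab strategy are hypothetical and not the API of the tiling class from #520.

```python
INT32_MAX = 2**31 - 1  # MPI count arguments are C ints


def partial_tile_rows(rows: int, cols: int, max_count: int = INT32_MAX):
    """Return (row_start, row_stop) slabs of a (rows, cols) tile such that
    each slab holds at most max_count elements."""
    if cols > max_count:
        # Even a single row is too large; columns would have to be split too.
        raise ValueError("a single row exceeds max_count")
    if rows == 0 or cols == 0:
        return []  # empty tile, nothing to send
    rows_per_slab = max_count // cols  # at least 1 because cols <= max_count
    return [(start, min(start + rows_per_slab, rows))
            for start in range(0, rows, rows_per_slab)]


# The per-peer block from the reproducer: 2**31 rows, 1 column.
print(partial_tile_rows(2**31, 1))
```

If the tile itself is stored contiguously in row-major order, each full-width row slab is a contiguous piece of memory and can be sent as an ordinary buffer.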

coquelin77, Apr 02 '20 09:04

In principle relevant, although not of the highest priority, because this problem can usually be avoided by increasing the number of processes. (Reviewed within #1109.)

My question @Markus-Goetz: should this issue address the wrappers for the MPI operations (i.e. heat.comm.Send() performs several mpi4py.MPI.Comm.Send() calls if the data to send is too large), or should we rather adapt the usage of heat.comm.Send() in those algorithms where potentially large data are sent? The first idea sounds more elegant; however, with respect to #383, the second option may allow better refactoring of algorithms, including overlap of communication and computation.
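A minimal sketch of the first option, written against plain mpi4py and NumPy rather than Heat's actual wrapper. chunk_bounds, send_chunked and recv_chunked are hypothetical names, and the sketch assumes that both sides already know the total element count and the datatype of the buffer.

```python
import numpy as np

INT32_MAX = 2**31 - 1  # MPI count arguments are C ints


def chunk_bounds(n: int, max_count: int = INT32_MAX):
    """Split n elements into consecutive (start, stop) ranges of <= max_count."""
    return [(s, min(s + max_count, n)) for s in range(0, n, max_count)]


def send_chunked(comm, buf, dest, tag=0, max_count=INT32_MAX):
    """Send an array as several comm.Send calls, none exceeding max_count."""
    flat = np.ascontiguousarray(buf).reshape(-1)  # 1-D contiguous view or copy
    for start, stop in chunk_bounds(flat.size, max_count):
        comm.Send(flat[start:stop], dest=dest, tag=tag)


def recv_chunked(comm, buf, source, tag=0, max_count=INT32_MAX):
    """Receive into a preallocated C-contiguous array chunk by chunk."""
    if not buf.flags.c_contiguous:
        # reshape would return a copy, and received data would be lost
        raise ValueError("receive buffer must be C-contiguous")
    flat = buf.reshape(-1)  # view onto buf
    for start, stop in chunk_bounds(flat.size, max_count):
        comm.Recv(flat[start:stop], source=source, tag=tag)
```

With mpi4py, comm would be e.g. MPI.COMM_WORLD. Because MPI does not let messages with the same source, destination, tag and communicator overtake one another, the chunks arrive in order. The second option would move this loop into the individual algorithms instead, where the chunks could also be overlapped with computation.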

mrfh92, Aug 17 '23 18:08