dash
dash copied to clipboard
DART calls influencing each other [MPICH 3.2]
I have some problems with DART calls. Here is a minimum example that results in the last DART call to be non-blocking on Unit 1, but blocking on Unit 0 which makes the program to never terminate:
#include <libdash.h>
int main(int argc, char* argv[]) {
dash::init(&argc, &argv);
std::vector<int> send1;
std::vector<std::size_t> send1_count;
std::vector<std::size_t> send1_displs;
std::vector<int> recv1;
std::vector<std::size_t> recv1_count;
std::vector<std::size_t> recv1_displs;
if(dash::myid() == 0) {
send1.resize(5);
send1_count = { 0, 20 };
send1_displs = { 0, 0 };
recv1 = { };
recv1_count = { 0, 0 };
recv1_displs = { 0, 0 };
} else {
send1 = { };
send1_count = { 0, 0 };
send1_displs = { 0, 0 };
recv1.resize(5);
recv1_count = { 20, 0 };
recv1_displs = { 0, 20 };
}
dart_alltoallv(send1.data(), send1_count.data(), send1_displs.data(),
DART_TYPE_BYTE, recv1.data(), recv1_count.data(), recv1_displs.data(),
dash::Team::All().dart_id());
int send2 = 5;
std::vector<std::size_t> recv2(2);
dart_allgather(&send2, recv2.data(), sizeof(std::size_t), DART_TYPE_BYTE,
dash::Team::All().dart_id());
std::vector<int> send3;
std::vector<std::size_t> send3_count;
std::vector<std::size_t> send3_displs;
std::vector<int> recv3;
std::vector<std::size_t> recv3_count;
std::vector<std::size_t> recv3_displs;
if(dash::myid() == 0) {
send3 = { 2, 2, 4, 3, 4, 1, 2, 0 };
send3_count = { 8, 0 };
send3_displs = { 0, 8 };
recv3.resize(9);
recv3_count = { 8, 1 };
recv3_displs = { 0, 8 };
} else {
send3 = { 6, 6, 7, 9, 5, 9, 6, 8 };
send3_count = { 1, 7 };
send3_displs = { 0, 1 };
recv3.resize(7);
recv3_count = { 0, 7 };
recv3_displs = { 0, 0 };
}
dart_alltoallv(send3.data(), send3_count.data(), send3_displs.data(),
DART_TYPE_INT, recv3.data(), recv3_count.data(), recv3_displs.data(),
dash::Team::All().dart_id());
dash::finalize();
return 0;
}
Each of the calls run perfectly fine on their own, but in combination I am getting this weird behaviour. Commenting out the second DART call also results in the following error for the third DART call:
Message from rank 1 to tag 10 truncated; 20 bytes received but buffer size is 4
This seems clearly wrong, as unit 1 should only be sending 4 bytes according to the code.
I am no MPI expert, so maybe my code is buggy but it seems odd that each of the calls run successfully on their own and only the combination creates these problems.
MPI version: MPICH 3.2
Mhh, that code looks fishy in multiple places (for example: why do send1_count
and send3_count
have different values on both processes?). Have you tried doing the same thing in MPI without DART in between? (the collectives in DART are mostly just forwarding things to the MPI collectives)
I suggest you look at the MPI standard and if you have a correct MPI example that fails if ported to DART I'm happy to help :)
I have actually just created a minimum example from the values my real code generates. send1_count and send3_count have different values because the processes have to exchange different amoutns of data (hence the alltoall*v*). I will try doing the same thing in MPI, but if I get the same results, can I assume my MPI version is broken?
I apologize, it was late last night. You're right, for ~allgatherv~ alltoallv
it makes sense. However, I cannot find the implementation of dart_alltoallv
. Is that implemented in a private branch?
Also, are you running on a 32-bit or 64-bit machine? In case of the latter, the allgatherv is illegal because sizeof(int) != sizeof(std::size_t)
. This might not be the reason for why the processes hang, though.
I just checked the commit log and it seems I have implemented both dart_alltoall
and dart_alltoallv
myself on my branch. That is a bit embarrassing as I don't even remember it at all...
Anyway, I just had a quick look at the code and found a possible memory leak in dart_alltoallv
, but i don't think it is related to my problem at all. Couldn't find any other problems right now, but will check it more in-depth later.
Do you think you could look at it too? It's located in the feat-graph
branch.
As for the alltoallv
being illegal: The processes exchange int
s and only the counts and displacements are std::size_t
. dart_alltoallv
converts those size_t arrays to int arrays internally, so that should not be a problem. I think when I created the code, I took dart_allgatherv
as an example.
I meant this line:
int send2 = 5;
std::vector<std::size_t> recv2(2);
dart_allgather(&send2, recv2.data(), sizeof(std::size_t), DART_TYPE_BYTE,
dash::Team::All().dart_id());
send2
is an int (4 Byte) but you are treating it as size_t (potentially 8 Byte).
Anyway, I looked at the implementation of dart_alltoallv
and couldn't find anything immediately suspicious. I get an error about truncated messages from Open MPI 3.0.0, even without the allgather in between. I converted your example to MPI and the error does not manifest there. It might be worth checking the values that come out to see whether the alltoallv behave as expected.
Ah, sorry, yes. That was a mistake. The original code handles this correctly though.
The values the first dart_alltoallv
call returns should be correct. I just added some random data to the examples to make sure and they are indeed correct.
if (sendbuf == recvbuf || NULL == sendbuf) {
sendbuf = MPI_IN_PLACE;
}
Seems like these lines are causing the problem. However, I don't get how this affects the second dart_alltoallv
call...
because the first dart_alltoallv
call is in_place on unit 1, but not on unit 0, the call is blocking on unit 0 and then couples with the second dart_alltoallv
call on unit 1.
What is the reason for the NULL == sendbuf
? Can i safely remove it?
MPI 3.1 standard, p. 169:
The “in place” option for intracommunicators is specified by passing MPI_IN_PLACE to the argument sendbuf at all processes.
This is an optimization that is sensible and we should keep that option. Unfortunately, the developer who added this neat feature forgot to document it. Patches are welcome ;) (please open a pull request if you want to add the documentation) A debug message that in-place is being used might be helpful as well.
In your case, I suspect that the empty std::vector
returns NULL
in the call to data()
, which then triggers the use of MPI_IN_PLACE
, just not on all processes. Things go south from there for MPI... Good catch!
Thanks! My understanding is that in_place is used when sendbuf == recvbuf
, so I think it should be safe to just remove the NULL == sendbuf
part. This should probably be done for dart_allgatherv
in the main branch as well.
Please check whether anyone actually calls collectives with NULL == sendbuf
before you remove this feature. I'm fine with changing the convention to sendbuf == recvbuf
triggering the use of MPI_IN_PLACE
. It just has to be documented :)
I am actually not sure how to do that because if it is used depends on the data that is passed to the functions. And that is kind of dependent on the user.
For *v* collectives, in_place only makes sense, if both sendbuf AND recvbuf are NULL and that is already covered with the sendbuf== recvbuf
condition
For v collectives, in_place only makes sense, if both sendbuf AND recvbuf are NULL and that is already covered with the sendbuf== recvbuf condition
Again, the MPI standard states:
The “in place” option for intracommunicators is specified by passing MPI_IN_PLACE to the argument sendbuf at all processes. In such a case, sendcount and sendtype are ignored. The data to be sent is taken from the recvbuf and replaced by the received data. Data sent and received must have the same type map as specified by recvcount and recvtype.
So, even for v
variants it makes sense to use it. I think if the user provides sendbuf == recvbuf
it is clear what his intention is: send data from sendbuf
and replace it by the received data. It just has to be made clear in the DART interface that if the user wants to trigger this behavior he has to do it on all units. Passing the same buffer for sendbuf
and recvbuf
without replacing sendbuf
with MPI_IN_PLACE
will likely lead to wrong results.
Yes, in_place also makes sense here. Just not the sendbuf==NULL
part which is why I don't think that checking whether this is used somewhere is actually needed.
How would you go about making things clear in the interface? Isn't it enough to state it in the doc comments of the function?
Yes, in_place also makes sense here. Just not the
sendbuf==NULL
part which is why I don't think that checking whether this is used somewhere is actually needed.
You're right. And passing sendbuf==NULL
can be perfectly reasonable if there is nothing to send.
How would you go about making things clear in the interface? Isn't it enough to state it in the doc comments of the function?
Yes, it should be added to the documentation. Something along the lines of
The user may pass the same buffer for \c sendbuf and \c recvbuf on all units, in which case data will be sent from and received into \c recvbuf. The contents of \c sendcnt and \c senddispls will be ignored in that case.
That's for dart_alltoallv
, it should be adapted for the other relevant collectives.