MPI_COMM_SET_INFO Is Not Collective (But It Should Be)
The MPI Standard says that MPI_COMM_SET_INFO is a collective call, but we don't implement it that way. If hints are set during communicator creation, we can rely on the built-in collectives involved in creating a communicator, but the call stack for MPI_COMM_SET_INFO doesn't include one.
Most likely, the solution here is to add an allreduce so each process knows the values of the other processes' hints, given that they could differ and it might be helpful to know them.
My interpretation is that the standard simply points out the collective semantics: every process needs to call the same function. That says nothing about whether the implementation must communicate. Since some hints need synchronization, semantically the call has to be collective. But if none of the hints in an implementation actually requires synchronization, I don't think it is obligated to do any communication.
For example, MPI_Init is semantically a collective call. But if the process manager sets up all the information in advance, then the implementation can just initialize locally and proceed, right?
@hzhou Your interpretation of the standard is correct. But I think @wesbland's point is that the implementation should be correct even if some processes give a different value than others, and that might not be true for all hints.
Yeah, we are aware of that. It is currently left as a FIXME todo item since none of the existing hints requires the same value on every process. Once we add such hints, we'll have to fix it.
From the Slack conversation, I think what @wesbland is proposing is to add something like an MPI_Barrier to ensure collective behavior.
We continued discussing this on Slack, but I want to capture here that this should probably be an allreduce for a few reasons, most notably because there are some hints that you really want everyone to agree on, or you could potentially lose the benefit of implementing them (e.g. mpi_assert_no_any_source).
Probably. What "op" should we use?
There are probably two ways it could be done (a sketch of the second option follows this list):

- Use an allgather to collect every process's set of info keys and decide from that how to use the optimizations.
- Use an allreduce (sum) to total up how many processes set each info key, and assume that if the total isn't the same as the size of the communicator, you just won't use the optimization anywhere (instead of trying to keep track of which pairs of processes can do it). For this, you'd have to convert the info strings to a list of known info keys that MPICH will honor, including keys that the user doesn't set (i.e., convert the strings to an integer array).
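A minimal sketch of the allreduce (sum) option, assuming the info strings have already been converted to a 0/1 integer array; NUM_KNOWN_HINTS and the function name here are hypothetical, not actual MPICH code:

#include <mpi.h>

#define NUM_KNOWN_HINTS 8   /* hypothetical count of hint keys MPICH honors */

/* Sum each 0/1 hint entry across the communicator; enable an optimization
 * only when every process set the corresponding key. */
static void check_hints_sum(MPI_Comm comm,
                            const int local_hints[NUM_KNOWN_HINTS],
                            int enabled[NUM_KNOWN_HINTS])
{
    int totals[NUM_KNOWN_HINTS];
    int comm_size;
    MPI_Comm_size(comm, &comm_size);
    MPI_Allreduce(local_hints, totals, NUM_KNOWN_HINTS, MPI_INT, MPI_SUM, comm);
    for (int i = 0; i < NUM_KNOWN_HINTS; i++)
        enabled[i] = (totals[i] == comm_size);
}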
I think the allgather option is probably the most flexible one. However, I would hate to run one collective per hint key. So I am thinking we can run a single allgather that gathers the entire hints array from all ranks, then go through each key to check for consistency. That means even when the user sets only a single key, we'll check the consistency of all keys regardless. I think practically that will work.
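Roughly like this sketch, assuming the hints are stored as a fixed-size integer array on the communicator (NUM_KNOWN_HINTS and the function name are hypothetical):

#include <mpi.h>
#include <stdlib.h>

#define NUM_KNOWN_HINTS 8   /* hypothetical */

/* Gather every rank's full hints array in one collective, then check
 * each key for consistency against rank 0's value. */
static int check_hints_allgather(MPI_Comm comm,
                                 const int local_hints[NUM_KNOWN_HINTS],
                                 int consistent[NUM_KNOWN_HINTS])
{
    int comm_size;
    MPI_Comm_size(comm, &comm_size);
    int *all = malloc((size_t) comm_size * NUM_KNOWN_HINTS * sizeof(int));
    if (!all)
        return MPI_ERR_NO_MEM;
    MPI_Allgather(local_hints, NUM_KNOWN_HINTS, MPI_INT,
                  all, NUM_KNOWN_HINTS, MPI_INT, comm);
    for (int i = 0; i < NUM_KNOWN_HINTS; i++) {
        consistent[i] = 1;
        for (int r = 1; r < comm_size; r++) {
            if (all[r * NUM_KNOWN_HINTS + i] != all[i]) {
                consistent[i] = 0;
                break;
            }
        }
    }
    free(all);
    return MPI_SUCCESS;
}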
Yep. That's what I meant. I agree that it's more flexible, though I don't know if it makes a practical difference since we probably will only use optimizations when everyone sets them.
What optimizations do you have in mind?
I think this would apply to any optimizations you'd do for any of the assertions currently in the standard (no_any_source, no_any_tag, allow_overtaking). If both sides don't agree to the protocol, you can't add optimizations.
While you could turn on optimizations on a process by process basis, it'd probably be a lot more practical to just do it if all processes turn them on. It also probably makes no difference in practice since I'm sure all applications would set the hint for all processes on the communicator or not bother at all.
I see. What I was envisioning is to enable those optimizations via a runtime switch:
if (comm->hints[MPIR_COMM_HINT_NO_ANY_SOURCE]) {
    /* [optimization branch] */
}
The idea is that the consistency check is orthogonal to applying the optimizations. By the time we apply an optimization, we can assume the consistency check has already passed.
That may or may not be possible. For instance, if you did something like allow_overtaking to use relaxed ordering requirements, you'd need both the sender and receiver to agree on whether that's ok. So just using the locally cached value wouldn't be enough (unless you only cache the value as true if everyone provides the same value).
That means the key allow_overtaking has a consistency requirement. So we will check consistency at MPII_Comm_set_hints time so that the hint can never be inconsistent (or we error at setting time). This assumes the "allgather" check will be implemented, replacing the current FIXME.
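One way that could look, reusing the consistent[] result from the allgather sketch above; the hint_needs_agreement table and the choice of error class are my own assumptions, not actual MPICH code:

/* Hypothetical per-key flag: nonzero if the hint must match on all ranks. */
static const int hint_needs_agreement[NUM_KNOWN_HINTS] = {
    /* e.g. 1 for allow_overtaking */
};

/* Reject mismatches at MPII_Comm_set_hints time for keys that must agree. */
static int validate_hints(const int consistent[NUM_KNOWN_HINTS])
{
    for (int i = 0; i < NUM_KNOWN_HINTS; i++) {
        if (hint_needs_agreement[i] && !consistent[i])
            return MPI_ERR_INFO_VALUE;  /* error at setting time */
    }
    return MPI_SUCCESS;
}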
I don't know how allreduce(sum) will help. We should probably add an internal op called is_equal.
if (sum == comm.size)
    is_equal = true;
That'd only work for simple boolean values.
Adding an internal is_equal op sounds like a good idea.
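A sketch of such an op, built with the public MPI_Op_create for illustration (an in-tree version would use MPICH's internal reduction machinery instead; the pair layout is my own). Unlike the sum trick, this works for arbitrary integer values, not just booleans:

#include <mpi.h>

/* Reduce (value, still_equal) int pairs: still_equal stays 1 only if
 * every contributing value is identical. */
static void is_equal_fn(void *invec, void *inoutvec, int *len, MPI_Datatype *dt)
{
    const int *in = invec;
    int *inout = inoutvec;
    (void) dt;  /* always MPI_2INT here */
    for (int i = 0; i < *len; i++) {
        inout[2 * i + 1] = in[2 * i + 1] && inout[2 * i + 1]
            && (in[2 * i] == inout[2 * i]);
        inout[2 * i] = in[2 * i];   /* keep one representative value */
    }
}

/* Returns 1 iff every process in comm passed the same value. */
static int hint_is_equal(MPI_Comm comm, int value)
{
    MPI_Op op;
    int buf[2] = { value, 1 };
    MPI_Op_create(is_equal_fn, 1 /* commutative */, &op);
    MPI_Allreduce(MPI_IN_PLACE, buf, 1, MPI_2INT, op, comm);
    MPI_Op_free(&op);
    return buf[1];
}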