ompi icon indicating copy to clipboard operation
ompi copied to clipboard

dead coll tuned alltoall mca parameters

Open burlen opened this issue 1 year ago • 8 comments

Background information

I'm tuning OpenMPI's alltoall on our cluster.

What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)

v5.0.3

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

Compiled from sources, on tag v5.0.3

If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.

33e93469e1e1f69904ff3e3827394719aa6b3671 3rd-party/openpmix (v5.0.2) 3a70fac9a21700b31c4a9f9958afa207a627f0fa 3rd-party/prrte (v3.0.5) dfff67569fb72dbf8d73a1dcf74d091dad93f71b config/oac (heads/main)

Please describe the system on which you are running

  • Operating system/version: ubuntu 22.04 lts
  • Computer hardware: Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz
  • Network type: infiniband

Details of the problem

The following mca parameters relating to alltoall in coll tuned component are reported by ompi_info but are not used in the code.

coll_tuned_alltoall_small_msg
coll_tuned_alltoall_intermediate_msg
coll_tuned_alltoall_algorithm_segmentsize
coll_tuned_alltoall_algorithm_tree_fanout
coll_tuned_alltoall_algorithm_chain_fanout
coll_tuned_alltoall_large_msg                                                                                                        
coll_tuned_alltoall_min_procs

These may be remnants of a previous implementation, but currently setting these does nothing. Their presence complicates tuning. They should either be removed or at least the description string reported by ompi_info modified so that it's clear that these do nothing.

burlen avatar May 15 '24 20:05 burlen

Can't remove in v5.0.x but can add a message, and those that are not used, to be removed in main There is a PR coming in where a parameter or two is used.

janjust avatar May 16 '24 16:05 janjust

https://github.com/open-mpi/ompi/pull/12453/files Will use the fanout parameter(there are 2 fanouts).

wenduwan avatar May 16 '24 16:05 wenduwan

@wenduwan would it make sense to use a different mca parameter name to avoid confusion with the original use of the mca fan out parameters? while these features are not very well documented, I assume it is documented somewhere and could thus be confusing with your new use. Assuming the two implementations are different

burlen avatar May 28 '24 15:05 burlen

@burlen I'm actually fuzzy on the difference between fanout vs radix. Do you mind providing clarity on the terminology?

wenduwan avatar May 28 '24 17:05 wenduwan

@burlen I'm actually fuzzy on the difference between fanout vs radix. Do you mind providing clarity on the terminology?

@wenduwan The Bruck algorithm completes in logarithmic number of rounds. Radix refers to the base of the logarithm. This is the terminology the author of the PR you linked uses for function argument names, and is used in the paper I linked in the review. It is really not about radix vs fanout. The problem is fanout is already in use for a different purpose, which makes overloading it as done in the PR potentially confusing.

burlen avatar May 30 '24 01:05 burlen

@burlen I see.

From an engineering perspective, we also need to consider code maintainability. This issue arises from the introduction of additional parameters, which I believe had good reasons at some point - similar to this discussion.

For fan-out vs radix, do you see a case where both can be used together? Is it sufficient to simply document what the parameter means in different algorithms rather than creating an entirely new parameter?

wenduwan avatar May 30 '24 14:05 wenduwan

For fan-out vs radix, do you see a case where both can be used together?

Since faninout is not even currently used in any of the alltoall implementations, I don't see how it could be

Is it sufficient to simply document what the parameter means in different algorithms rather than creating an entirely new parameter?

That would certainly be helpful. In my opinion it would be better to have the parameter name directly relate to the algorithm. for both maintainability and usability. the name itself can be part of the documentation (helps both users and maintainers). Such naming could even reduce the burden of writing documentation because you could cite the paper for more information

It's kind of a minor thing, I guess I'm a bit hung up on it (apologies!), because I'm finding these tuning features to be somewhat poorly documented and as a result rather difficult to use.

burlen avatar May 31 '24 16:05 burlen

@burlen Thanks for the followup. In this case I suggest we postpone the decision to introduce a new parameter, and enrich the documentation for now. It is easier to add features than deprecate them afterwards.

wenduwan avatar Jun 06 '24 14:06 wenduwan