ompi
Coll/han: Improve algorithm management through MCA parameters and configuration file
Allow topological level to be named in configuration file
Try to read the topological level as a string, then as an id, in the configuration file.
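The "string first, then id" lookup described above could look roughly like the sketch below. This is only an illustration, not the code from this patch: `parse_topo_level`, the enum, and the level names other than `global_communicator` are assumptions made for the example.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical level ids and names. Only "global_communicator" appears in
 * the example file; the other two names are assumptions. */
enum topo_level {
    LEVEL_INTRA_NODE = 0,
    LEVEL_INTER_NODE = 1,
    LEVEL_GLOBAL_COMMUNICATOR = 2,
    LEVEL_COUNT = 3
};

static const char *const level_names[LEVEL_COUNT] = {
    "intra_node", "inter_node", "global_communicator"
};

/* Return the level id for a token, or -1 if the token is neither a known
 * level name nor a valid numeric id. */
static int parse_topo_level(const char *token)
{
    char *end = NULL;
    long id;

    /* First attempt: treat the token as a level name. */
    for (int i = 0; i < LEVEL_COUNT; i++) {
        if (0 == strcmp(token, level_names[i])) {
            return i;
        }
    }

    /* Second attempt: treat the token as a numeric id. */
    id = strtol(token, &end, 10);
    if (end == token || '\0' != *end || id < 0 || id >= LEVEL_COUNT) {
        return -1;
    }
    return (int) id;
}
```

With this ordering, existing files that use numeric ids keep working, while new files can use the more readable names.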
Improve algorithm management and choice
Unify the algorithm choice mechanism. The translation table from algorithm name to function pointer is defined in ompi/mca/coll/han/coll_han_algos.c as mca_base_var_enum_value_t.
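As a rough illustration of a name-to-function-pointer translation table, here is a minimal self-contained sketch. It does not reproduce the contents of coll_han_algos.c: the `algo_entry_t` type, the `find_algo` helper, and the placeholder functions are all hypothetical, and the real table is built on mca_base_var_enum_value_t rather than on this stand-in struct.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical algorithm signature, for illustration only. */
typedef int (*algo_fn_t)(void);

/* Stand-in table entry: an id, a user-visible name, and the function
 * implementing the algorithm. */
typedef struct {
    int id;
    const char *name;
    algo_fn_t fn;
} algo_entry_t;

/* Placeholder implementations that only return their own id. */
static int algo_intra(void)  { return 1; }
static int algo_simple(void) { return 2; }

static const algo_entry_t allreduce_algos[] = {
    { 1, "intra",  algo_intra  },
    { 2, "simple", algo_simple },
    { 0, NULL,     NULL        }   /* sentinel */
};

/* Translate an algorithm name into its function pointer.
 * Returns NULL when the name is unknown. */
static algo_fn_t find_algo(const char *name)
{
    for (const algo_entry_t *e = allreduce_algos; NULL != e->name; e++) {
        if (0 == strcmp(name, e->name)) {
            return e->fn;
        }
    }
    return NULL;
}
```

Keeping names and function pointers in one table means the MCA parameter and the configuration file can share the same lookup.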
Allow algorithm selection (optional) in configuration file
Algorithm choice can be made directly in the configuration file for the han component (see configuration file example).
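To make the optional `@algorithm` suffix concrete, the sketch below parses one message size rule such as `8000 han @simple`. It is a simplified illustration, not the parser from this patch: `msg_rule_t` and `parse_msg_rule` are hypothetical names, and stripping a trailing `#` comment is an assumption based on the example file.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory form of one message size rule. */
typedef struct {
    unsigned long msg_size;  /* lower bound of the message size range */
    char component[32];      /* e.g. "han" or "tuned" */
    char algorithm[32];      /* empty string: use the component default */
} msg_rule_t;

/* Parse "<size> <component> [@<algorithm>] [# comment]".
 * Returns 0 on success, -1 if size or component is missing. */
static int parse_msg_rule(const char *line, msg_rule_t *rule)
{
    char buf[128];
    char *hash;
    int n;

    /* Work on a bounded copy with any trailing comment cut off. */
    strncpy(buf, line, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';
    hash = strchr(buf, '#');
    if (NULL != hash) {
        *hash = '\0';
    }

    /* The algorithm is optional, so preset it to "default". */
    rule->algorithm[0] = '\0';
    n = sscanf(buf, "%lu %31s @%31s",
               &rule->msg_size, rule->component, rule->algorithm);
    return (n >= 2) ? 0 : -1;
}
```

A rule without `@` leaves `algorithm` empty, which matches the "use default algorithm" case in the example file.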
Simplify algorithm choice through MCA parameters
Algorithms are selected by name through an enum.
Configuration file example
1 # Number of collectives described in this file
allreduce # Set of rules for allreduce collectives
1 # How many topological levels are described in this file
global_communicator # Topological level
1 # Number of configurations
1 # Configuration size (communicator size on this level)
4 # Number of message size rules
0 han @intra # For message sizes from 0 to 999, use the intra algorithm of the han component
1000 han # From 1000 to 7999, use the default algorithm of the han component
8000 han @simple # From 8000 to 19999, use the simple algorithm of the han component
20000 tuned # Fall back on tuned for message sizes of 20000 and above
Note: Han can only be used on the global_communicator level.
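The way the message size rules in the example partition the size range can be sketched as follows: each rule gives a lower bound, and the last rule whose bound does not exceed the message size wins. This is an illustration under that reading of the example, not the lookup code from this patch; `size_rule_t`, `select_rule`, and `example_rules` are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical rule: lower bound, component, optional algorithm
 * (NULL means the component default). */
typedef struct {
    size_t min_size;
    const char *component;
    const char *algorithm;
} size_rule_t;

/* The four rules from the example file, sorted by increasing lower bound. */
static const size_rule_t example_rules[] = {
    {     0, "han",   "intra"  },
    {  1000, "han",   NULL     },
    {  8000, "han",   "simple" },
    { 20000, "tuned", NULL     }
};

/* Return the last rule whose lower bound is <= msg_size, or NULL if no
 * rule applies. Assumes the rules are sorted by min_size. */
static const size_rule_t *select_rule(const size_rule_t *rules, size_t n,
                                      size_t msg_size)
{
    const size_rule_t *chosen = NULL;

    for (size_t i = 0; i < n; i++) {
        if (rules[i].min_size > msg_size) {
            break;
        }
        chosen = &rules[i];
    }
    return chosen;
}
```

For instance, `select_rule(example_rules, 4, 19999)` picks the `han @simple` rule, while a size of 20000 picks `tuned`.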
Set of MCA parameters to read a han configuration file:
# Han must be selected to be used
export OMPI_MCA_coll_han_priority=100
# Activate file reading
export OMPI_MCA_coll_han_use_dynamic_file_rules=true
# Set file path
export OMPI_MCA_coll_han_dynamic_rules_filename=path/to/configuration_file
Can one of the admins verify this patch?
ok to test
ok to test
bot:ibm:retest
@FlorentGermain-Bull Would you be able to rebase your branch on main somewhere after the commit "7dbfbeea - build: Use open-mpi/oac for oac submodule"? We're having an issue with the IBM CI when it tries to test a Pull Request that doesn't include that commit.
@FlorentGermain-Bull And be sure to see https://www.mail-archive.com/[email protected]/msg21421.html
bot:ibm:retest
FYI it looks like all changes proposed in #10456 are also included here
It worked! Thanks. I've heard that Mellanox is working on their CI, so no action is needed on your part for that.
bot:aws:retest
bot:aws:retest
@FlorentGermain-Bull can you rebase this on top of current main if it is still something you want to get in? Thanks
@bosilca please review so we can get this into v5.
@FlorentGermain-Bull Are you planning to bring this back to 5.0.x?
Sorry for the late reply, I'm on it.