tuned decision for single node communicators
@bosilca, I did some work related to this PR on intra-node bcast tuning. I extended the algorithm selection process up to a communicator size of 192 (instance type: hpc7a.96xlarge), and I see some significant differences in the algorithm selection for different combinations of communicator and message sizes. Here is what I see as the optimal algorithm selection logic:
if (communicator_size < 4) {
    if (total_dsize < 65536) {
        alg = 1;
    } else {
        alg = 9;
    }
} else if (communicator_size < 8) {
    if (total_dsize < 8192) {
        alg = 5;
    } else if (total_dsize < 131072) {
        alg = 2;
    } else {
        alg = 7;
    }
} else if (communicator_size < 16) {
    if (total_dsize < 131072) {
        alg = 5;
    } else {
        alg = 8;
    }
} else if (communicator_size < 32) {
    if (total_dsize < 64) {
        alg = 7;
    } else if (total_dsize < 65536) {
        alg = 6;
    } else {
        alg = 8;
    }
} else if (communicator_size < 64) {
    if (total_dsize < 8192) {
        alg = 6;
    } else if (total_dsize < 262144) {
        alg = 5;
    } else {
        alg = 8;
    }
} else if (communicator_size < 128) {
    if (total_dsize < 8192) {
        alg = 6;
    } else if (total_dsize < 32768) {
        alg = 7;
    } else if (total_dsize < 131072) {
        alg = 5;
    } else {
        alg = 8;
    }
} else if (communicator_size < 192) {
    if (total_dsize < 8192) {
        alg = 6;
    } else if (total_dsize < 32768) {
        alg = 7;
    } else {
        alg = 8;
    }
} else {
    if (total_dsize < 131072) {
        alg = 6;
    } else {
        alg = 8;
    }
}
I will manually test both this decision tree and yours with the bcast collective, and post the observed differences here.
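To make the side-by-side comparison easier, the same thresholds could also be written as a lookup table, so two decision trees can be diffed row by row. This is only a minimal sketch: the names `bcast_rule_t`, `bcast_rules`, and `bcast_select_alg` are hypothetical and are not part of coll/tuned, and the algorithm IDs are copied unchanged from the tree above. A bound of 0 is used here as an assumed sentinel meaning "no upper bound".

```c
#include <stddef.h>

/* Hypothetical rule row: the first row whose bounds both match wins.
 * A bound of 0 means "no upper bound" (sentinel chosen for this sketch). */
typedef struct {
    int    comm_bound;   /* matches when communicator_size < comm_bound */
    size_t dsize_bound;  /* matches when total_dsize < dsize_bound */
    int    alg;          /* algorithm ID, copied from the decision tree above */
} bcast_rule_t;

static const bcast_rule_t bcast_rules[] = {
    {   4,  65536, 1 }, {   4,      0, 9 },
    {   8,   8192, 5 }, {   8, 131072, 2 }, {   8,      0, 7 },
    {  16, 131072, 5 }, {  16,      0, 8 },
    {  32,     64, 7 }, {  32,  65536, 6 }, {  32,      0, 8 },
    {  64,   8192, 6 }, {  64, 262144, 5 }, {  64,      0, 8 },
    { 128,   8192, 6 }, { 128,  32768, 7 }, { 128, 131072, 5 }, { 128, 0, 8 },
    { 192,   8192, 6 }, { 192,  32768, 7 }, { 192,      0, 8 },
    {   0, 131072, 6 }, {   0,      0, 8 },
};

/* Walk the table in order and return the algorithm of the first matching row.
 * Each communicator band ends with an unbounded message-size row, so the scan
 * never falls through into a later band. */
int bcast_select_alg(int communicator_size, size_t total_dsize)
{
    size_t n = sizeof(bcast_rules) / sizeof(bcast_rules[0]);
    for (size_t i = 0; i < n; i++) {
        const bcast_rule_t *r = &bcast_rules[i];
        int comm_ok = (r->comm_bound == 0) || (communicator_size < r->comm_bound);
        int size_ok = (r->dsize_bound == 0) || (total_dsize < r->dsize_bound);
        if (comm_ok && size_ok) {
            return r->alg;
        }
    }
    return 0; /* not reached: the last row matches everything */
}
```

For example, `bcast_select_alg(100, 40000)` falls in the `communicator_size < 128` band and returns 5, matching the nested `if` version.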