ompi

tuned decision for single node communicators

Open bosilca opened this issue 3 years ago • 1 comments

bosilca avatar Jan 27 '22 20:01 bosilca

@bosilca, I did some work related to this PR on the intra-node bcast tuning. I extended the algorithm selection process up to a communicator size of 192 (instance type: hpc7a.96xlarge), and I see some significant differences in the algorithm selection for different combinations of communicator and message sizes. Here's what I see as the optimal algorithm selection logic:

if (communicator_size < 4) {
    if (total_dsize < 65536) {
        alg = 1;
    } else {
        alg = 9;
    }
} else if (communicator_size < 8) {
    if (total_dsize < 8192) {
        alg = 5;
    } else if (total_dsize < 131072) {
        alg = 2;
    } else {
        alg = 7;
    }
} else if (communicator_size < 16) {
    if (total_dsize < 131072) {
        alg = 5;
    } else { 
        alg = 8;
    }
} else if (communicator_size < 32) {
    if (total_dsize < 64) {
        alg = 7;
    } else if (total_dsize < 65536) {
        alg = 6;
    } else {
        alg = 8;
    }
} else if (communicator_size < 64) {
    if (total_dsize < 8192) {
        alg = 6;
    } else if (total_dsize < 262144) {
        alg = 5;
    } else { 
        alg = 8;
    }
} else if (communicator_size < 128) {
    if (total_dsize < 8192) {
        alg = 6;
    } else if (total_dsize < 32768) {
        alg = 7;
    } else if (total_dsize < 131072) {
        alg = 5;
    } else {
        alg = 8;
    }
} else if (communicator_size < 192) {
    if (total_dsize < 8192) {
        alg = 6;
    } else if (total_dsize < 32768) {
        alg = 7;
    } else {
        alg = 8;
    }
} else {
    if (total_dsize < 131072) {
        alg = 6;
    } else {
        alg = 8;
    }
}
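
For side-by-side comparison with another decision tree, the same thresholds can also be written as a lookup table, which is easier to diff and to unit-test at the boundaries. This is only a sketch: the names `bcast_rule_t`, `bcast_rules`, and `select_bcast_alg` are hypothetical and not part of the tuned component, and the `alg` values are copied verbatim from the tree above.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical rule row: it matches when
 * communicator_size < max_comm_size AND total_dsize < max_dsize. */
typedef struct {
    int    max_comm_size; /* exclusive upper bound; INT_MAX means "any size" */
    size_t max_dsize;     /* exclusive upper bound; SIZE_MAX means "any size" */
    int    alg;           /* algorithm id, copied from the tree above */
} bcast_rule_t;

/* Rows are ordered; the first matching row wins. Every communicator band
 * ends with a SIZE_MAX row, so a band never falls through to the next one. */
static const bcast_rule_t bcast_rules[] = {
    {       4,    65536, 1 }, {       4, SIZE_MAX, 9 },
    {       8,     8192, 5 }, {       8,   131072, 2 }, {   8, SIZE_MAX, 7 },
    {      16,   131072, 5 }, {      16, SIZE_MAX, 8 },
    {      32,       64, 7 }, {      32,    65536, 6 }, {  32, SIZE_MAX, 8 },
    {      64,     8192, 6 }, {      64,   262144, 5 }, {  64, SIZE_MAX, 8 },
    {     128,     8192, 6 }, {     128,    32768, 7 }, { 128,   131072, 5 },
    {     128, SIZE_MAX, 8 },
    {     192,     8192, 6 }, {     192,    32768, 7 }, { 192, SIZE_MAX, 8 },
    { INT_MAX,   131072, 6 }, { INT_MAX, SIZE_MAX, 8 },
};

/* Return the algorithm id for the given communicator and message size. */
static int select_bcast_alg(int communicator_size, size_t total_dsize)
{
    const size_t n = sizeof(bcast_rules) / sizeof(bcast_rules[0]);
    for (size_t i = 0; i < n; i++) {
        const bcast_rule_t *r = &bcast_rules[i];
        if (communicator_size < r->max_comm_size &&
            total_dsize < r->max_dsize) {
            return r->alg;
        }
    }
    /* Only reached for the degenerate INT_MAX / SIZE_MAX inputs;
     * fall back to the last row, like the final else branch. */
    return bcast_rules[n - 1].alg;
}
```

For example, `select_bcast_alg(16, 64)` lands in the `communicator_size < 32` band with `total_dsize` between 64 and 65536, so it returns 6, matching the nested `if` version.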

I will manually test both this decision tree and yours with the bcast collective, and post the observed differences here.

vidsouza avatar Feb 07 '24 21:02 vidsouza