Results 6 comments of Amirreza Rastegari

@yosefe for review and consideration. Many thanks in advance.

Hi @shasson5 Thank you for looking into this; we highly appreciate it. We see a performance degradation (i.e. low scaling efficiency) when running MPI jobs each node of the system...

Hi @shasson5 Thank you so much for looking into this issue. This is a quad-socket system with each socket having 96 cores (x86 cores). Each socket has 1 IB...

Hi @shasson5 Thank you very much for your help. Here is the NUMA config: ``` numactl -H available: 16 nodes (0-15) node 0 cpus: 0 1 2 3 4 5...

Hi @arun-chandran-edarath Thank you very much for the suggestion. Our workloads mostly fully populate the entire node (in this case 384 ranks per node) so this didn't help us (i.e....

Hi @shasson5 Thanks for looking into this issue. I think the default behavior for UCX_MAX_RNDV_RAILS should be to either use multiple virtual lanes within the same adapter or multiple physical...