Johannes Blaschke

Results 113 comments of Johannes Blaschke

@JamesWrigley @jonas-schulze I'll be working on this this week. Just getting back up to speed after long travel...

@jsquyres thanks for getting back to me

> 1. What interface / IP address do you want Open MPI to use?

I am not familiar with the logic that OpenMPI's...

> Your description of the problem lists 6 private IP addresses and 1 public IP address

Oh one more thing: of the interfaces listed in my example: OpenMPI definitely should...

> Later in the text, you specifically mention that there are _4_ NICs and _4_ GPUs; I infer that this means that there are 4 NUMA domains, too.

Oops, my...

> I think you should check into how the Linux kernel handles IP traffic from multiple interfaces that are all on the same subnet.

It's been a little while since...

> Assuming you're using NVIDIA NICs and GPUs, shouldn't you be using UCX? UCX will handle all the interface selection and pinning, etc. It will also handle all the RDMA and...

> Also, this begs the larger question: why are you using TCP?

Ah! Fair point. We are straying very far from the original point of this issue though. But for...

> However, I believe OMPI does support Slingshot operations - in fact, last I checked OMPI v5 is running on Perlmutter.

Correct! But that requires OpenMPI to be built with...

> Correct. However, you have to use PMIx with OMPI v5 (no pmi2 support any more).

Good point, another reason to bump the container's OMPI version up to v5

>...