G-Ragghianti

31 comments by G-Ragghianti

Yes, I noticed that TensileLibrary.dat wasn't provided since version 5.2, but the code in rocblas seems to still reference it as a fallback if there is a failure to read...

Yes, thanks for the clarification. Kadir had also produced some output from `strace` which showed the files it was attempting to open. Maybe this would also be useful for you?...
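For reference, here is a rough sketch of how that kind of `strace` output could be reduced to just the files that failed to open. The helper name `failed_opens`, the log name `trace.log`, and the program name `my_app` are placeholders rather than anything from the original report, and the filter assumes the usual `openat(...) = -1 ENOENT` line format.

```shell
# Print the path of every open/openat call in an strace log that failed
# with ENOENT (file not found). Assumes one syscall per line, with the
# path as the first double-quoted string on the line.
failed_opens() {
    grep -E 'open(at)?\(' "$1" \
        | grep 'ENOENT' \
        | sed -E 's/^[^"]*"([^"]*)".*/\1/'
}

# Possible usage (my_app is a placeholder for the program under test):
#   strace -f -e trace=open,openat -o trace.log ./my_app
#   failed_opens trace.log
```

Running it on the log should show which library files were looked for and not found, which is probably the quickest way to see the fallback lookups.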

It looks like this doesn't work with [email protected]: https://github.com/spack/spack/issues/43021 This just happened to be the version that was selected in my most recent attempt to install papi+rocm

Thanks for looking into this. A whitelist of network interfaces is less ideal than a blacklist since we have machines in our cluster with various names of interfaces. However, testing...

Yes, the error comes from OMPI in that case. I've been trying different combinations of OMPI_MCA_btl_tcp_if_exclude, OMPI_MCA_btl_tcp_if_include, and UCX_NET_DEVICES, mostly with confusing results. The only combination that works without...
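As an illustration of the exclude-list (blacklist) style for the Open MPI TCP BTL, something like the following could be tried. The interface names `lo` and `docker0` are assumptions for this sketch, not a confirmed fix for the problem discussed here. As far as I know, setting `btl_tcp_if_exclude` replaces the default exclude list, so loopback has to be listed again, and the include and exclude parameters are mutually exclusive.

```shell
# Interfaces the TCP BTL should skip (hypothetical names; adjust per cluster).
exclude_ifaces="lo,docker0"

# Exclude-list style: everything not named here remains eligible.
export OMPI_MCA_btl_tcp_if_exclude="$exclude_ifaces"

# The include and exclude parameters should not be set together.
unset OMPI_MCA_btl_tcp_if_include

echo "btl_tcp_if_exclude=$OMPI_MCA_btl_tcp_if_exclude"
```

Note this only affects Open MPI's TCP BTL. It does not change which devices UCX selects.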

But when I use just UCX_NET_DEVICES=ibp193s0f0 I get the following error from OMPI?
```
$ UCX_NET_DEVICES=ibp193s0f0 srun -n2 -w histamine0,histamine1 osu_bcast
# OSU MPI Broadcast Latency Test v7.2
# Datatype:...
```

Thanks @abouteiller, but the two machines that I'm testing this on have ipoib configured (and verified), and UCX is still trying to use the docker0 interface if I don't give...

OK, it does work differently when using mpirun/mpiexec as the launcher instead of srun.

So far, it looks like using UCX_NET_DEVICES=mlx5_0:1 is a successful workaround for the case where infiniband is connected, but the problem still remains in the case where we want machines...
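A minimal sketch of applying the `mlx5_0:1` workaround only where the InfiniBand port is actually up. The function name `select_ucx_device` is hypothetical, and the check assumes the standard sysfs port state file, which reads something like `4: ACTIVE` when the link is up. It does nothing for the remaining case beyond leaving UCX to its default device selection.

```shell
# Set UCX_NET_DEVICES=mlx5_0:1 only if the port reports ACTIVE.
# The state file path is a parameter so the logic can be checked
# on a machine without InfiniBand hardware.
select_ucx_device() {
    state_file="${1:-/sys/class/infiniband/mlx5_0/ports/1/state}"
    if grep -q 'ACTIVE' "$state_file" 2>/dev/null; then
        export UCX_NET_DEVICES="mlx5_0:1"
    else
        # No active InfiniBand port: fall back to UCX's own selection.
        unset UCX_NET_DEVICES
    fi
}

# Possible usage in a job script, before launching the MPI program:
#   select_ucx_device
```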