João Batista Fernandes
I was running the same code with `--enable-debug` (on the Open MPI installation) and it generated an infinite loop with this output: `mpirun -n 2 -mca osc rdma -mca btl_openib_allow_ib 1...
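For context, the kind of program this exercises is a one-sided (RMA) test serviced by the osc framework. Below is a minimal sketch of such a test, assuming a plain fence-synchronized `MPI_Put`; it is an illustration of the pattern, not my actual test code:

```c
/* Minimal one-sided test of the kind that exercises the osc framework.
 * A sketch only -- not the actual test code from this thread. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int buf = -1;
    MPI_Win win;
    /* Expose one int per process through an RMA window. */
    MPI_Win_create(&buf, sizeof(int), sizeof(int),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);
    if (rank == 0) {
        /* Rank 0 writes into every other process's window. */
        for (int peer = 1; peer < size; ++peer)
            MPI_Put(&rank, 1, MPI_INT, peer, 0, 1, MPI_INT, win);
    }
    MPI_Win_fence(0, win);

    printf("rank %d: buf = %d\n", rank, buf);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```

Something like this, built with `mpicc` and launched with the same `mpirun -n 2 -mca osc rdma ...` line as above, goes through the osc component selection being discussed here.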
@devreal I mentioned it there, but not here. PR #10413 solves my problems. For now, I'm using a locally modified copy of the code while waiting for the change to be incorporated into some...
I saw the problem in the issue at https://github.com/open-mpi/ompi/issues/9580 . Initially my problem was a different one, but from this comment, https://github.com/open-mpi/ompi/issues/9580#issuecomment-954090542, I noticed that my issue is with osc/ucx. If it is...
@devreal Thanks for the quick answer. I don't think so. I got no output from the command `ompi_info | grep ucx`.
In both cases, there was nothing different in the output. The commands and outputs follow. `salloc -N2 --hint=compute_bound --exclusive mpirun --mca osc rdma --mca osc_verbose 100 test.o` > salloc: Granted job...
Now we got some output, here: `salloc --nodes=2 --hint=compute_bound --exclusive mpirun --mca osc_base_verbose 100 test.o` > salloc: Granted job allocation 1915069 > [r1i2n9:03710] mca: base: components_register: registering framework osc components...
I've made a local installation of Open MPI built with **ucx**: `../configure --prefix=/home/jbfernandes/.local/ --with-ucx=/home/jbfernandes/.local/ --disable-man-pages` The error when I force RDMA disappeared. However, I still do not get the correct behavior of...
I think I need to explain my issue better. I'm working in a heterogeneous environment, where each node has a different processing speed. To simulate this, I wrote the line...
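Since the actual line is elided above, here is a hypothetical version of what such a simulation can look like: each rank spins on an amount of dummy work proportional to its rank, so higher ranks behave like slower nodes. The iteration count and the `iters` variable are illustrative only:

```c
/* Hypothetical way to emulate heterogeneous node speeds:
 * each rank does dummy work proportional to its rank.
 * Illustration only -- not the actual line from the test code. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double t0 = MPI_Wtime();
    volatile double x = 0.0;
    /* Slower "nodes" (higher ranks) spin longer. */
    long iters = 100000000L * (rank + 1);
    for (long i = 0; i < iters; ++i)
        x += 1e-9;

    printf("rank %d finished its work in %.2f s\n",
           rank, MPI_Wtime() - t0);

    MPI_Finalize();
    return 0;
}
```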
> I believe this is what you expect, right? Yes, it is exactly that. I get this result only when I use shared memory. > In your initial post, you...
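One pattern where the shared memory path and the network path commonly diverge is passive-target synchronization that depends on asynchronous progress: a rank polls its own window for a flag that another rank sets remotely. A sketch under the assumption that the test does something of this kind (not the actual code):

```c
/* Passive-target flag polling, where progress differences between the
 * shared memory path and the network path often become visible.
 * A sketch under an assumed pattern -- not the actual test code. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int flag = 0;
    MPI_Win win;
    MPI_Win_create(&flag, sizeof(int), sizeof(int),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_lock_all(0, win);
    if (rank == 0) {
        int one = 1;
        /* Remotely set the flag on rank 1 and flush it out. */
        MPI_Put(&one, 1, MPI_INT, 1, 0, 1, MPI_INT, win);
        MPI_Win_flush(1, win);
    } else if (rank == 1) {
        int seen = 0;
        /* Poll the local window until the remote update lands;
         * MPI_Win_sync reconciles the public and private copies. */
        while (!seen) {
            MPI_Win_sync(win);
            seen = flag;
        }
        printf("rank 1 saw the flag\n");
    }
    MPI_Win_unlock_all(win);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```

Run with `mpirun -n 2`, this kind of loop can complete promptly over shared memory yet stall over the network if the osc component in use does not make progress asynchronously.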
@yosefe I already tried to run `salloc --exclusive --hint=compute_bound -N2 mpirun -np 2 --mca pml ucx -x UCX_NET_DEVICES=mlx4_0:1 ./test.o ` but the wrong behavior does not change.