bosilca
Everything seems to be in place, but the CUDA accelerator component is not loaded. Please post the output of `ldd /opt/openmpi/lib/openmpi/mca_accelerator_cuda.so` or `readelf -d /opt/openmpi/lib/openmpi/mca_accelerator_cuda.so`.
Everything looks normal. Let's make sure launching an app does not screw up the environment: `mpirun -np 1 ldd /opt/openmpi/lib/openmpi/mca_accelerator_cuda.so`.
That might be a reason: not PATH but LD_LIBRARY_PATH. Most of the components are built statically into `libmpi.so`, with a few exceptions, and CUDA-based components are part of these...
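To compare the two `ldd` invocations above, it can help to reduce each output to the libraries the loader failed to resolve. The following is a minimal sketch, not part of Open MPI: `missing_libs` is a hypothetical helper name, and it assumes the usual glibc `ldd` format, where an unresolved dependency is reported as `<name> => not found`.

```shell
# Print the name of every shared-library dependency that the dynamic
# loader could not resolve, given `ldd` output on standard input.
# Assumption: glibc-style ldd, which marks such entries "=> not found".
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Hypothetical usage with the component path from the comments above:
#   ldd /opt/openmpi/lib/openmpi/mca_accelerator_cuda.so | missing_libs
#   mpirun -np 1 ldd /opt/openmpi/lib/openmpi/mca_accelerator_cuda.so | missing_libs
```

If the first command prints nothing but the second lists a library, the environment seen by the launched process (for example its `LD_LIBRARY_PATH`) likely differs from the interactive shell.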
There seems to already be an `OPAL_THREAD_UNLOCK` in `mca_base_source_init` before calling `mca_base_source_register` so the mutex should already be unlocked. There is a valid reason why pvars do not need to...
I don't see a lock in `mca_base_source_get_by_name`, but that might be due to the last push. Anyway, there is an unlock left in `mca_base_source_get_by_name` and that's definitely not good. Moreover,...
These results look suspicious. OSU doesn't do overlap; it basically posts the non-blocking operation and waits for it. For small messages I could understand a 10 to 20 percent performance degradation, but...
I saw your question about the smcuda BTL and its peculiar support for threads (aka. having its own thread waiting on a fifo to be signalled by peers when messages...
> I am definitely in favor of asynchronous progress, maybe I misunderstood your PR ; putting explicit APIs as was in the MPICH APIs in is different from making asynchronous...
`MPI_Comm_split_type` creates disjoint communicators, which means the sum of the sizes of the sub-communicators should be 8 (the size of your `MPI_COMM_WORLD`). Please provide the output of `hwloc-ls` to see...
Funny, that's kind of what I suspected based on the published documentation of the Genoa architecture, but with the topology things would have been clearer. OMPI's current `split by...