bosilca comments

Results: 318 comments of bosilca

CUDA-aware MPI build on fresh Ubuntu 24.04 LTS, `MPIX_Query_cuda_support()` returns zero

Everything seems to be in place, but the CUDA accelerator component is not loaded. `ldd /opt/openmpi/lib/openmpi/mca_accelerator_cuda.so` or `readelf -d /opt/openmpi/lib/openmpi/mca_accelerator_cuda.so` please.

CUDA-aware MPI build on fresh Ubuntu 24.04 LTS, `MPIX_Query_cuda_support()` returns zero

Everything looks normal. Let's make sure launching an app does not screw up the environment: `mpirun -np 1 ldd /opt/openmpi/lib/openmpi/mca_accelerator_cuda.so`

CUDA-aware MPI build on fresh Ubuntu 24.04 LTS, `MPIX_Query_cuda_support()` returns zero

That might be a reason: not PATH but LD_LIBRARY_PATH. Most of the components are built statically into libmpi.so, with a few exceptions, and the CUDA-based components are among these...
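
A minimal sketch of how the `ldd` checks above could be compared, assuming the usual `ldd` output format in which an unresolved dependency is printed as `name => not found`. The helper name `unresolved_libs` is hypothetical and not part of Open MPI; it only filters whatever `ldd` prints.

```shell
# Hypothetical helper: read ldd output on stdin and print the names of
# the shared libraries that the dynamic loader could not resolve.
# Prints nothing when every dependency was found.
unresolved_libs() {
    awk '/=> not found/ { print $1 }'
}

# Possible usage (paths taken from the comments above):
#   ldd /opt/openmpi/lib/openmpi/mca_accelerator_cuda.so | unresolved_libs
#   mpirun -np 1 ldd /opt/openmpi/lib/openmpi/mca_accelerator_cuda.so | unresolved_libs
# If only the second command lists libraries, the launched environment
# probably differs from the interactive shell, e.g. in LD_LIBRARY_PATH.
```

If the two runs disagree, one option to try is forwarding the variable explicitly with `mpirun -x LD_LIBRARY_PATH`; whether that is the actual cause here is an assumption.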

MPI_T Events

There seems to already be an `OPAL_THREAD_UNLOCK` in `mca_base_source_init` before calling `mca_base_source_register` so the mutex should already be unlocked. There is a valid reason why pvars do not need to...

MPI_T Events

I don't see a lock in `mca_base_source_get_by_name` but that might be due to the last push. Anyway, there is an unlock left in `mca_base_source_get_by_name` and that's definitely no good. Moreover,...

RFC: Provide equivalence of MPICH_ASYNC_PROGRESS

These results look suspicious. OSU doesn't do overlap; it basically posts the non-blocking operation and waits for it. For small messages I could understand a 10 to 20 percent performance degradation, but...

RFC: Provide equivalence of MPICH_ASYNC_PROGRESS

I saw your question about the smcuda BTL and its peculiar support for threads (aka. having its own thread waiting on a fifo to be signalled by peers when messages...

RFC: Provide equivalence of MPICH_ASYNC_PROGRESS

> I am definitely in favor of asynchronous progress; maybe I misunderstood your PR. Putting explicit APIs in, as in MPICH, is different from making asynchronous...

MPI_Comm_split_type not creating disjoint subgroups for certain cases

`MPI_Comm_split_type` creates disjoint communicators, which means the sum of the sizes of the sub-communicators should be 8 (the size of your `MPI_COMM_WORLD`). Please provide the output of `hwloc-ls` to see...

MPI_Comm_split_type not creating disjoint subgroups for certain cases

Funny, that's kind of what I suspected based on the published documentation of the Genoa architecture, but with the topo things would have been clearer. OMPI current `split by...