Jed Brown

531 comments by Jed Brown

I've been using ROCm-3.7 successfully for a few days, but unfortunately, this UCX error is still present with fresh rebuilds of ucx, ompi, and the osu benchmark. ``` $ $OMPI_DIR/bin/mpiexec...

I'm still seeing this issue with ROCm-4.0 and a fresh build of today's ucx (1d22f7486ef4202da30ee811a95ad394b862b9a1) and ompi (8ff2277b7e48b899341f69a9f3f9c9ee7cecf476). ``` $ $OMPI_DIR/bin/mpiexec -n 2 --mca btl '^openib' mpi/pt2pt/osu_bw -d rocm D...

Same issue is also present with today's MPICH 'main' (dac05cf7f9ec1a59e2d917f3da80fc943f378872) ``` $ $MPICH_DIR/bin/mpiexec -n 2 mpi/pt2pt/osu_bw -d rocm D D # OSU MPI-ROCM Bandwidth Test v5.3.2 # Send Buffer on...

Thanks, is the support going upstream? For various reasons, I'm not going to switch distributions, but we're currently on Linux 5.10 and follow the usual upgrades.

There is. It didn't work with mpich when I first tried it (a year or two ago), but I'll try again.

Thanks. In a fresh build of mpich-4.1a1 with this ucx patch, I get SEGV. This is on an AMD laptop with ROCm tools installed, and I guess it's either ucx...

Here's a gdb trace. I'd have to rebuild with better debugging symbols to make it nicer. ``` #0 0x00007ffff52dee3b in uct_base_iface_t_init () from /opt/mpich/lib/libuct.so.0 #1 0x00007ffff4f44a16 in ?? () from...

I'm not following why it still swallows. I added `--enable-g=dbg` and added `-O -g` to `MPICHLIB_CFLAGS`. (`-O0` conflicts with fortify, and thus fails.) ``` (gdb) bt #0 0x00007ffff5316653 in uct_base_iface_t_init...

I'm also interested in this mode of use. I'll note that a best practice for MPI libraries is for object constructors to all take an `MPI_Comm`. Implicitly using `MPI_COMM_WORLD` is...