Hui Zhou comments

Results: 695 comments of Hui Zhou

Draft: Session: Do not rely on built-in comms, decouple from world model

> I need help of a CH4 expert to get this PR working for CH4. For ParaStation MPI we had to make some modifications in our PSP device layer so...

Draft: Session: Do not rely on built-in comms, decouple from world model

Interest? Yes. It requires significant work and possibly a redesign for ch4 to work. But I think we can add some conditionals to allow merging this code if ParaStation MPI...

ch4/rma/gpu: bypass yaksa when copying contiguous GPU buffers

test:mpich/ch4/gpu/ofi ✔️ except two cuda memory allocation errors.

ch4/rma/gpu: bypass yaksa when copying contiguous GPU buffers

@abrooks98 Were the gpu/ofi tests clean?

ch4/rma/gpu: bypass yaksa when copying contiguous GPU buffers

Recording a sample of the test failures: ![image](https://github.com/pmodels/mpich/assets/1496702/31db28c3-07c2-4517-8fc0-d3f85b93bfc7) I don't think they are related to this PR, so I will merge it and figure out the test failures separately.

hydra: Unable to use `tmpi` with `mpich` on `OSX`

Here is the diagnosis: `tmpi` relaunches hydra using the `-pmi-port` option -- someone needs to explain to me how tmpi works -- so `hydra` will set a `PMI_PORT` string for launched process...
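
For illustration only, a minimal Python sketch of how a launched process might read such a port string. It assumes the value has the form `host:port`, which the comment above does not confirm; the helper name `parse_pmi_port` is hypothetical and not part of hydra or MPICH.

```python
import os


def parse_pmi_port(value):
    """Split an assumed 'host:port' string into (host, port)."""
    # rpartition splits on the last colon, so the port is always the tail.
    host, sep, port = value.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected 'host:port', got {value!r}")
    return host, int(port)


if __name__ == "__main__":
    raw = os.environ.get("PMI_PORT")
    if raw is None:
        print("PMI_PORT is not set")
    else:
        print(parse_pmi_port(raw))
```

Printing the parsed pair from inside a launched process is a quick way to see which host name the process was told to connect to.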

hydra: Unable to use `tmpi` with `mpich` on `OSX`

If we assume `localhost` is universal and always works, we can just use that. This is for MPI processes to connect to `hydra_pmi_proxy`, which should always work on localhost.

hydra: Unable to use `tmpi` with `mpich` on `OSX`

@s-sajid-ali The failure is because hydra listens on the hostname returned from `gethostname`, but on Mac, the hostname is just a name and not necessarily attached to a usable network....
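
A minimal sketch of the check this diagnosis suggests, written in Python rather than hydra's C: resolve the name returned by `gethostname` and fall back to `localhost` if resolution fails. The function `pick_listen_host` and its `resolve` parameter are hypothetical names for illustration, not hydra code, and a name that resolves is still not guaranteed to be reachable.

```python
import socket


def pick_listen_host(hostname=None, resolve=socket.getaddrinfo):
    """Return hostname if it resolves, otherwise fall back to 'localhost'."""
    if hostname is None:
        hostname = socket.gethostname()
    try:
        # getaddrinfo raises socket.gaierror (an OSError) on lookup failure.
        resolve(hostname, None)
        return hostname
    except OSError:
        return "localhost"


if __name__ == "__main__":
    print("gethostname():", socket.gethostname())
    print("listen host  :", pick_listen_host())
```

Running this on the affected Mac shows whether the machine's own hostname resolves at all, which is the first thing to rule out.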

hydra: Unable to use `tmpi` with `mpich` on `OSX`

@s-sajid-ali Could you try another interface, e.g. `en0`, depending on what your `ifconfig` shows?

hydra: Unable to use `tmpi` with `mpich` on `OSX`

The mpi process cannot connect to the proxy server (`hydra_pmi_proxy`) on either address --

```
Unable to connect to 127.0.0.1 on 49909
Unable to connect to 10.0.0.37 on 51414
```
...
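
To reproduce this kind of failure outside MPI, a small TCP probe can help. The sketch below is an assumption about how one might test it, not part of hydra; `can_connect` is a hypothetical helper, and the addresses and ports are the ones quoted in the error messages above.

```python
import socket


def can_connect(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    for host, port in [("127.0.0.1", 49909), ("10.0.0.37", 51414)]:
        state = "ok" if can_connect(host, port) else "unable to connect"
        print(f"{host}:{port} -> {state}")
```

The probe only reports something useful while the proxy is still listening, so it has to run while the job is being launched.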