Wang Zhang

35 comments by Wang Zhang

From my experience with Slurm built with PMIx, users do not need a 'launcher pod' for each submitted job. This seems like a clear benefit for mpi-operator. I've been searching for a minimal...

Thank you so much for your prompt help, Ralph. I watched your [presentation video](https://www.youtube.com/watch?v=9u4xmXpQU_U) and believe there are two scenarios for using PMIx with Kubernetes: 1. each container is...

Let me give an update on the progress so far. First, sorry for the late update; I was working on the python-sdk-for-mpijob feature. I went through most of the material mentioned by Ralph...

I believe ranks are assigned after `mpirun` is called, so mpi-operator has no way to predict which pod will become rank 1. For a generic solution, you...
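To illustrate why the rank is only known at run time, here is a minimal Python sketch, assuming Open MPI or a PMIx-based launcher, in which each process reads the rank that the launcher exported into its environment after starting it. `OMPI_COMM_WORLD_RANK` is set by Open MPI's `mpirun` and `PMIX_RANK` is set for PMIx clients; the helper name `runtime_rank` is hypothetical, and other MPI implementations may use different variables.

```python
import os
from typing import Mapping, Optional

# Variables a launcher exports into each process *after* starting it.
# OMPI_COMM_WORLD_RANK comes from Open MPI's mpirun; PMIX_RANK is set for
# PMIx clients. Which one is present depends on the MPI stack in use.
RANK_VARIABLES = ("OMPI_COMM_WORLD_RANK", "PMIX_RANK")


def runtime_rank(env: Mapping[str, str] = os.environ) -> Optional[int]:
    """Return this process's MPI rank, or None if no launcher has set one.

    The value only exists once the launcher has started the process, which
    is why a controller cannot know in advance which pod hosts rank 1.
    """
    for name in RANK_VARIABLES:
        value = env.get(name)
        if value is not None:
            return int(value)
    return None


if __name__ == "__main__":
    print(f"rank={runtime_rank()}")
```

Note also that a worker pod with more than one slot runs several MPI processes, so a rank belongs to a process rather than to a pod.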

A simple approach for `discover_hosts.sh` would be to run a server in mpi-operator that exposes the status of the corresponding MPIJob and lets pods query that status from the server. However,...
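A minimal sketch of that idea in Python, using only the standard library. Everything here is hypothetical: the `/jobs/<namespace>/<name>` endpoint, the shape of `JOB_STATUS`, and the `host:slots` output format are assumptions for illustration, not part of mpi-operator. The operator side serves a JSON snapshot of worker status, and the pod side turns it into a host list the way a `discover_hosts.sh` replacement might.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.request import urlopen

# Hypothetical snapshot of MPIJob worker status, keyed by "<namespace>/<name>".
# A real operator would keep this current from its informers.
JOB_STATUS = {
    "default/pi": {
        "workers": [
            {"host": "pi-worker-0", "slots": 2, "phase": "Running"},
            {"host": "pi-worker-1", "slots": 2, "phase": "Pending"},
        ]
    }
}


class JobStatusHandler(BaseHTTPRequestHandler):
    """Serves GET /jobs/<namespace>/<name> as JSON (hypothetical endpoint)."""

    def do_GET(self):
        prefix = "/jobs/"
        key = self.path[len(prefix):] if self.path.startswith(prefix) else None
        status = JOB_STATUS.get(key) if key else None
        if status is None:
            self.send_error(404, "unknown MPIJob")
            return
        body = json.dumps(status).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging in this sketch


def render_hosts(status):
    """Format running workers as 'host:slots' lines (assumed output format)."""
    return "\n".join(
        f"{w['host']}:{w['slots']}"
        for w in status["workers"]
        if w["phase"] == "Running"
    )


def discover_hosts(base_url, namespace, name):
    """What a discover_hosts.sh replacement inside a pod might do."""
    with urlopen(f"{base_url}/jobs/{namespace}/{name}") as resp:
        return render_hosts(json.load(resp))


def start_server(port=0):
    """Start the status server on localhost in a background thread.

    Port 0 asks the OS for a free port; read it back from server.server_port.
    """
    server = ThreadingHTTPServer(("127.0.0.1", port), JobStatusHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

For a local try-out, calling `server = start_server()` and then `discover_hosts(f"http://127.0.0.1:{server.server_port}", "default", "pi")` returns only the running worker, `pi-worker-0:2`; call `server.shutdown()` when done.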

If you want mpi-operator to watch MPIJobs across multiple namespaces, you can either omit the launch argument `--namespace`, in which case it falls back to the [default mode](https://github.com/kubeflow/mpi-operator/blob/master/cmd/mpi-operator.v1/app/options/options.go#L57), which...

I have occasionally run into this issue before. @heyfey, would you mind providing the logs of the MPI controller? If you need further help, please reach me by email.

It seems the v1 deploy file sets the monitored namespace to `mpi-operator`: https://github.com/kubeflow/mpi-operator/blob/master/deploy/v1/mpi-operator.yaml#L198 While this configuration looks fine, it is not consistent with the examples, which are created under the default...

Yes, a note explaining how to configure the mpi-operator deployment would be much better.

/assign @zw0610 I will add this feature to the v1 controller.