mpi-operator

slotsPerWorker attribute and number of workers per node

Open kpouget opened this issue 5 years ago • 4 comments

Hello, I am confused by the meaning of the slotsPerWorker attribute:

type MPIJobSpec struct {

	// Specifies the number of slots per worker used in hostfile.
	// Defaults to 1.
	// +optional
	SlotsPerWorker *int32 `json:"slotsPerWorker,omitempty"`
	// ...
}

Does "worker" refer to worker pods or worker nodes?

I would like to deploy 1 MPI job per worker node, but that does not seem to be what the MPI operator does: multiple worker pods end up on one worker node (the MPI worker pods ...-worker-0 and ...-worker-1 are both on the node worker05).

Is this a bug, or is there a way to deploy one pod per node?

kpouget • Sep 21 '20 12:09

Hello @kpouget ,

slotsPerWorker comes from MPI's notion of slots; you can read about it here.

"Worker" means worker pod; you configure the workers in the YAML file. You can use the pod anti-affinity feature to schedule them on separate nodes.

carmark • Sep 24 '20 08:09

Hello @carmark,

> The Worker means worker pod, you can setup it in yaml file.

I'm not sure I agree; this is not what I observe:

[Screenshot: mpi slots and pods]

I request 4 worker replicas and 2 slots per worker, but I get 4 worker pods.

edit: there is a mix of solver/mesher in the screenshot but both run with the same settings

kpouget • Sep 25 '20 12:09

@kpouget The number of replicas is the number of workers. You will get 4 worker pods if you request 4 worker replicas.

The slots setting does not affect the number of worker pods; it is only written to the MPI hostfile (/etc/mpi/hostfile).

carmark • Sep 27 '20 01:09

It seems that this was resolved. If you have any questions, feel free to open new issues. /close

tenzen-y • Nov 15 '23 18:11
