mpi-operator
MPI example not working with volcano scheduler
This example (https://github.com/kubeflow/mpi-operator/blob/master/examples/v1alpha2/tensorflow-benchmarks.yaml) stays pending forever when using the volcano scheduler.
Seems related to this issue: https://github.com/volcano-sh/volcano/issues/461
Issue-Label Bot is automatically applying the labels:
Label | Probability |
---|---|
area/operator | 0.51 |
kind/bug | 0.89 |
@yuyue9284 Which Volcano version are you using? And could you post your mpi-operator deployment?
Volcano v1.0.0 changed the PodGroup CRD API group to volcano.sh, so you may need to use the v1 version of MPIJob, which supports that PodGroup version. You also need to set the --gang-scheduling flag, defined here.
Hi @carmark, this is my deployment for mpi-operator:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mpi-operator
  namespace: mpi-operator
  labels:
    app: mpi-operator
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mpi-operator
  template:
    metadata:
      labels:
        app: mpi-operator
    spec:
      serviceAccountName: mpi-operator
      containers:
      - name: mpi-operator
        image: mpioperator/mpi-operator:latest
        args: [
          "-alsologtostderr",
          "--kubectl-delivery-image",
          "mpioperator/kubectl-delivery:latest",
          "-gang-scheduling",
          "volcano"
        ]
        imagePullPolicy: Always
I'm using mpi-operator v1alpha2 with volcano v0.3, thanks.
This happens because the launcher is missing resources. Please check the code in Volcano: a small amount of resources will be treated as insufficient.
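For anyone hitting the same thing, here is a rough sketch of what setting resources on the launcher might look like in a v1alpha2 MPIJob. The image name and the cpu/memory values are assumptions for illustration only, not values confirmed anywhere in this thread, and the Worker section is left out. Adjust them to your cluster.

```yaml
apiVersion: kubeflow.org/v1alpha2
kind: MPIJob
metadata:
  name: tensorflow-benchmarks
spec:
  slotsPerWorker: 1
  mpiReplicaSpecs:
    Launcher:
      replicas: 1
      template:
        spec:
          containers:
          - name: tensorflow-benchmarks
            # Assumed image name, taken to match the linked example.
            image: mpioperator/tensorflow-benchmarks:latest
            # Explicit resources on the launcher; the values below are
            # illustrative assumptions, not recommended settings.
            resources:
              requests:
                cpu: "1"
                memory: 1Gi
              limits:
                cpu: "1"
                memory: 1Gi
```

Whether this alone unblocks scheduling depends on the Volcano version and on how it counts the launcher's resources, so treat it as a starting point for testing rather than a confirmed fix.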