Spark operator does not use user-defined Volcano queue
What happened:
I followed this doc to integrate the spark operator with Volcano: https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/docs/volcano-integration.md
I added options to make the spark application use the Volcano queue 'test', which had been created beforehand:

```yaml
batchSchedulerOptions:
  queue: "test"
```

but the test queue was not used.

What you expected to happen:
I expected the spark application to use queue 'test'.
How to reproduce it (as minimally and precisely as possible):
- install volcano:

```sh
git clone https://github.com/volcano-sh/volcano.git
cd volcano
k create ns volcano
helm install my-volcano helm/chart/volcano --namespace volcano -f helm/chart/volcano/values.yaml
```
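A quick sanity check before going further (pod names depend on what the chart created):

```sh
# Volcano's controller/scheduler/webhook pods should be Running
kubectl get pods -n volcano
# The Volcano CRDs (queues, podgroups, ...) should be registered
kubectl get crd | grep volcano.sh
```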
- install spark operator:

```sh
k create ns spark-operator
helm install spark-operator spark-operator/spark-operator --namespace spark-operator --set sparkJobNamespace=default --set webhook.enable=true --set enableBatchScheduler=true --set metrics.enable=true --set image.tag=v1beta2-1.2.3-3.1.1
```
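(The `helm install` above assumes the spark-operator chart repo was already added; if not, it would be something like the following, with the URL as documented in the spark-on-k8s-operator repo at the time:)

```sh
# Assumed prerequisite: register the chart repository the install command pulls from
helm repo add spark-operator https://googlecloudplatform.github.io/spark-on-k8s-operator
helm repo update
```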
- create a test queue: https://volcano.sh/en/docs/tutorials/#step-1 (a minimal manifest is sketched below)
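For reference, the Queue from that tutorial step looks roughly like this (the weight/capability values here are illustrative; follow the linked tutorial for the exact spec):

```yaml
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: test
spec:
  weight: 1           # relative share of cluster resources (illustrative)
  reclaimable: false  # illustrative; see the tutorial for the exact spec
  capability:
    cpu: 2            # illustrative cap
```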
- create a spark application that uses the test queue: `kubectl apply -f myvolcano.yaml`, where myvolcano.yaml is:
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
name: spark-pi
namespace: default
spec:
type: Scala
mode: cluster
image: "gcr.io/spark-operator/spark:v3.1.1"
imagePullPolicy: Always
mainClass: org.apache.spark.examples.SparkPi
mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar"
sparkVersion: "3.1.1"
batchScheduler: "volcano" #Note: the batch scheduler name must be specified with `volcano`
batchSchedulerOptions:
queue: "test"
restartPolicy:
type: Never
volumes:
- name: "test-volume"
hostPath:
path: "/tmp"
type: Directory
driver:
cores: 1
coreLimit: "1200m"
memory: "512m"
labels:
version: 3.1.1
serviceAccount: spark-operator-spark
volumeMounts:
- name: "test-volume"
mountPath: "/tmp"
executor:
cores: 1
instances: 1
memory: "512m"
labels:
version: 3.1.1
volumeMounts:
- name: "test-volume"
mountPath: "/tmp"
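At this point it can also help to confirm the driver pod was actually handed to Volcano (an illustrative check; assumes the driver pod is named spark-pi-driver, as in the podgroup output below):

```sh
# When the integration works, the operator sets the driver pod's
# schedulerName to volcano instead of the default scheduler.
kubectl get pod spark-pi-driver -n default -o jsonpath='{.spec.schedulerName}'
```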
- check which queue/podgroup the job uses:

```sh
kubectl get podgroup podgroup-80c5b19e-8f4a-4bcd-b459-7fa6d3e22189 -o yaml
```
```yaml
apiVersion: scheduling.volcano.sh/v1beta1
kind: PodGroup
metadata:
  creationTimestamp: "2021-05-09T14:13:56Z"
  generation: 8
  managedFields:
  - apiVersion: scheduling.volcano.sh/v1beta1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:ownerReferences:
          .: {}
          k:{"uid":"80c5b19e-8f4a-4bcd-b459-7fa6d3e22189"}:
            .: {}
            f:apiVersion: {}
            f:controller: {}
            f:kind: {}
            f:name: {}
            f:uid: {}
      f:spec:
        .: {}
        f:minMember: {}
      f:status: {}
    manager: vc-controller-manager
    operation: Update
    time: "2021-05-09T14:13:56Z"
  - apiVersion: scheduling.volcano.sh/v1beta1
    fieldsType: FieldsV1
    fieldsV1:
      f:status:
        f:conditions: {}
        f:phase: {}
    manager: vc-scheduler
    operation: Update
    time: "2021-05-09T14:14:28Z"
  name: podgroup-80c5b19e-8f4a-4bcd-b459-7fa6d3e22189
  namespace: default
  ownerReferences:
  - apiVersion: v1
    controller: true
    kind: Pod
    name: spark-pi-driver
    uid: 80c5b19e-8f4a-4bcd-b459-7fa6d3e22189
  resourceVersion: "1295426"
  uid: 8bc13dd6-0073-42c1-a916-2e07b2be2a17
spec:
  minMember: 1
status:
  conditions:
  - lastTransitionTime: "2021-05-09T14:14:27Z"
    reason: tasks in gang are ready to be scheduled
    status: "True"
    transitionID: 24a8864f-7615-4eeb-8c33-d101a289d287
    type: Scheduled
  - lastTransitionTime: "2021-05-09T14:14:29Z"
    message: '1/0 tasks in gang unschedulable: pod group is not ready, 1 minAvailable.'
    reason: NotEnoughResources
    status: "True"
    transitionID: 5b627c85-68ef-424f-a9c5-0aa205ad0154
    type: Unschedulable
  phase: Inqueue
```
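Note that `spec` above has no `queue` field at all. A quick way to list podgroups together with their queue assignment (a sketch; assumes the Volcano CRDs are installed):

```sh
# An empty QUEUE column means no queue was set on the podgroup spec
kubectl get podgroups.scheduling.volcano.sh -n default \
  -o custom-columns=NAME:.metadata.name,QUEUE:.spec.queue,PHASE:.status.phase
```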
Anything else we need to know?:
Environment:
- Volcano Version: latest
- Kubernetes version (use kubectl version): minikube v1.17.1
- Cloud provider or hardware configuration:
- OS (e.g. from /etc/os-release):
- Kernel (e.g. uname -a): macOS Darwin 18.7.0 Darwin Kernel Version 18.7.0
- Install tools:
- Others:
/assign @Thor-wl
@nolimitkun
Hi, spark-operator will create a new podgroup for you (L120-L151); the podgroup name looks like "spark-%s-pg", and the operator will clean up the podgroup after the task finishes (L164-L171).
So, if you want to check whether the spark-operator Volcano integration is working, you should check the `spec.queue` of that podgroup; in your example, the podgroup should be named spark-spark-pi-pg. Remember, it will be deleted by spark-operator after your task finishes.
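In other words, while the job is still running you could check the queue on the operator-created podgroup, e.g.:

```sh
# Check before the job finishes -- the operator deletes this podgroup afterwards.
# Expected output for this example: test
kubectl get podgroup spark-spark-pi-pg -n default -o jsonpath='{.spec.queue}'
```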
@Thor-wl Please reproduce the issue according to the information.
Hello 👋 Looks like there was no activity on this issue for last 90 days. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity for 60 days, this issue will be closed (we can always reopen an issue if we need!).
Closing for now as there was no activity for last 60 days after marked as stale, let us know if you need this to be reopened! 🤗