
Spark operator does not use user-defined Volcano queue

Open nolimitkun opened this issue 4 years ago • 8 comments

What happened: I followed this doc to integrate the Spark operator with Volcano: https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/docs/volcano-integration.md

I added options to make the Spark application use the Volcano queue 'test', which had been created beforehand:

batchSchedulerOptions: 
    queue: "test"

but the test queue was not used.

What you expected to happen: I expected the Spark application to use queue 'test'.

How to reproduce it (as minimally and precisely as possible):

  1. Install Volcano:
git clone https://github.com/volcano-sh/volcano.git
cd volcano
k create ns volcano
helm install my-volcano helm/chart/volcano --namespace volcano -f helm/chart/volcano/values.yaml
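Optionally, verify the Volcano components came up before moving on (pod names can vary by chart version):
kubectl get pods -n volcano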
  2. Install the Spark operator:
k create ns spark-operator
helm install spark-operator  spark-operator/spark-operator --namespace spark-operator --set sparkJobNamespace=default  --set webhook.enable=true --set enableBatchScheduler=true  --set metrics.enable=true   --set image.tag=v1beta2-1.2.3-3.1.1
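To confirm the operator started and its mutating webhook was registered, something like this should work (the grep pattern is just a guess at the webhook name):
kubectl get pods -n spark-operator
kubectl get mutatingwebhookconfigurations | grep -i spark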
  3. Create a test queue: https://volcano.sh/en/docs/tutorials/#step-1
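For reference, the queue from that tutorial step looks roughly like this (the weight/capability values are the tutorial's, not requirements):
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: test
spec:
  weight: 1
  reclaimable: false
  capability:
    cpu: 2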

  4. Create a Spark application using the test queue:
kubectl apply -f myvolcano.yaml

apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: "gcr.io/spark-operator/spark:v3.1.1"
  imagePullPolicy: Always
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar"
  sparkVersion: "3.1.1"
  batchScheduler: "volcano"   #Note: the batch scheduler name must be specified with `volcano`
  batchSchedulerOptions: 
    queue: "test"
  restartPolicy:
    type: Never
  volumes:
    - name: "test-volume"
      hostPath:
        path: "/tmp"
        type: Directory
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "512m"        
    labels:
      version: 3.1.1
    serviceAccount: spark-operator-spark
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
  executor:
    cores: 1
    instances: 1
    memory: "512m"    
    labels:
      version: 3.1.1
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
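Once applied, the application status can be watched while it runs:
kubectl get sparkapplication spark-pi -n default -w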
  5. Check which queue/podgroup the job uses:
kubectl get  podgroup podgroup-80c5b19e-8f4a-4bcd-b459-7fa6d3e22189 -o yaml
apiVersion: scheduling.volcano.sh/v1beta1
kind: PodGroup
metadata:
  creationTimestamp: "2021-05-09T14:13:56Z"
  generation: 8
  managedFields:
  - apiVersion: scheduling.volcano.sh/v1beta1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:ownerReferences:
          .: {}
          k:{"uid":"80c5b19e-8f4a-4bcd-b459-7fa6d3e22189"}:
            .: {}
            f:apiVersion: {}
            f:controller: {}
            f:kind: {}
            f:name: {}
            f:uid: {}
      f:spec:
        .: {}
        f:minMember: {}
      f:status: {}
    manager: vc-controller-manager
    operation: Update
    time: "2021-05-09T14:13:56Z"
  - apiVersion: scheduling.volcano.sh/v1beta1
    fieldsType: FieldsV1
    fieldsV1:
      f:status:
        f:conditions: {}
        f:phase: {}
    manager: vc-scheduler
    operation: Update
    time: "2021-05-09T14:14:28Z"
  name: podgroup-80c5b19e-8f4a-4bcd-b459-7fa6d3e22189
  namespace: default
  ownerReferences:
  - apiVersion: v1
    controller: true
    kind: Pod
    name: spark-pi-driver
    uid: 80c5b19e-8f4a-4bcd-b459-7fa6d3e22189
  resourceVersion: "1295426"
  uid: 8bc13dd6-0073-42c1-a916-2e07b2be2a17
spec:
  minMember: 1
status:
  conditions:
  - lastTransitionTime: "2021-05-09T14:14:27Z"
    reason: tasks in gang are ready to be scheduled
    status: "True"
    transitionID: 24a8864f-7615-4eeb-8c33-d101a289d287
    type: Scheduled
  - lastTransitionTime: "2021-05-09T14:14:29Z"
    message: '1/0 tasks in gang unschedulable: pod group is not ready, 1 minAvailable.'
    reason: NotEnoughResources
    status: "True"
    transitionID: 5b627c85-68ef-424f-a9c5-0aa205ad0154
    type: Unschedulable
  phase: Inqueue
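
Note that the podgroup above carries no spec.queue at all. For a quick overview of every podgroup and the queue each one points at (custom-columns is standard kubectl):
kubectl get podgroup -n default -o custom-columns=NAME:.metadata.name,QUEUE:.spec.queue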

Anything else we need to know?:

Environment:

  • Volcano Version: latest
  • Kubernetes version (use kubectl version): minikube v1.17.1
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a): macOS, Darwin Kernel Version 18.7.0
  • Install tools:
  • Others:

nolimitkun avatar May 11 '21 12:05 nolimitkun

/assign @Thor-wl

Thor-wl avatar May 12 '21 03:05 Thor-wl

@nolimitkun
Hi, spark-operator will create a new podgroup for you (L120-L151); the podgroup name is like "spark-%s-pg", and it will clean up the podgroup after the task finishes (L164-L171). So if you want to check whether the spark-operator Volcano integration is correct, you should check your podgroup's spec.queue. In your example, the podgroup should be spark-spark-pi-pg. Remember, it will be deleted by spark-operator after your task finishes.
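
For example, while the driver is still running, the queue the operator set can be read directly (jsonpath is standard kubectl):
kubectl get podgroup spark-spark-pi-pg -n default -o jsonpath='{.spec.queue}'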

lx1036 avatar Jun 28 '21 07:06 lx1036

@Thor-wl Please reproduce the issue according to the information.

william-wang avatar Jun 29 '21 01:06 william-wang

Hello 👋 Looks like there was no activity on this issue for last 90 days. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity for 60 days, this issue will be closed (we can always reopen an issue if we need!).

stale[bot] avatar Sep 27 '21 05:09 stale[bot]

Closing for now as there was no activity for last 60 days after marked as stale, let us know if you need this to be reopened! 🤗

stale[bot] avatar Nov 26 '21 05:11 stale[bot]

Hello 👋 Looks like there was no activity on this issue for last 90 days. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity for 60 days, this issue will be closed (we can always reopen an issue if we need!).

stale[bot] avatar Feb 24 '22 12:02 stale[bot]

Closing for now as there was no activity for last 60 days after marked as stale, let us know if you need this to be reopened! 🤗

stale[bot] avatar Apr 27 '22 08:04 stale[bot]

Hello 👋 Looks like there was no activity on this issue for last 90 days. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity for 60 days, this issue will be closed (we can always reopen an issue if we need!).

stale[bot] avatar Jul 30 '22 18:07 stale[bot]

Closing for now as there was no activity for last 60 days after marked as stale, let us know if you need this to be reopened! 🤗

stale[bot] avatar Oct 01 '22 00:10 stale[bot]