
SparkApplication is scheduled by k8s default-scheduler

Open · iddelacruz opened this issue 4 years ago · 9 comments

Hi all.

I installed the operator and Volcano. Both were installed with Helm 3, and the installation was successful. Now I'm trying to deploy a Spark application taken from your examples.

  • k8s server version 1.20
  • Volcano version 1.1.2

I'm deploying it, but when I run:

kubectl describe pod [pod_name]

this is the result:

 Type     Reason       Age                From               Message
  ----     ------       ----               ----               -------
  Normal   Scheduled    21m                default-scheduler  Successfully assigned ...

The Spark application was scheduled by the default-scheduler.
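A quick way to confirm which scheduler a pod was bound by is to read the field directly from the pod spec (the pod name below is a placeholder):

kubectl get pod [pod_name] -o jsonpath='{.spec.schedulerName}'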

This is the file:


apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-pi-job
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: "spark:test"
  imagePullPolicy: IfNotPresent
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.12-3.0.0.jar"
  sparkVersion: "3.0.1"
  batchScheduler: "volcano"
  batchSchedulerOptions:
    queue: "default"
    priorityClassName: "Normal"
  restartPolicy:
    type: OnFailure
    onFailureRetries: 3
    onFailureRetryInterval: 10
    onSubmissionFailureRetries: 5
    onSubmissionFailureRetryInterval: 20
  volumes:
    - name: "test-volume"
      hostPath:
        path: "/tmp"
        type: Directory
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "512m"        
    labels:
      version: 3.0.1
    serviceAccount: spark
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
  executor:
    cores: 1
    instances: 3
    memory: "512m"    
    labels:
      version: 3.0.1
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"

Later I deployed a plain k8s pod, and it was scheduled correctly by Volcano. Maybe I'm missing some annotations or some properties in the SparkApplication resource type.

Can you help me, please? What am I doing wrong?

iddelacruz avatar Feb 04 '21 09:02 iddelacruz

Volcano needs to be installed before you install the Spark operator. You can also check whether you have configured the Spark operator correctly in this doc.
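To confirm the flags actually reached the operator, you can inspect its deployment's container args; the deployment name and namespace below are assumptions based on the install command later in this thread:

kubectl -n spark-operator get deployment spark-operator \
  -o jsonpath='{.spec.template.spec.containers[0].args}'

You should see -enable-batch-scheduler=true and -enable-webhook=true in the output.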

diskun00 avatar Feb 11 '21 02:02 diskun00

I encountered the same problem. Volcano was installed before the Spark operator. I tested the sample SparkApplication in https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/docs/volcano-integration.md. The pod schedulerName for spark-pi-driver is default-scheduler instead of volcano.

shouhong avatar Feb 22 '21 10:02 shouhong

Hi, the problem was the properties I used when installing the operator. The command in the Volcano and Spark operator integration manual is obsolete; the operator installation should be:

helm install spark-operator spark-operator/spark-operator --namespace spark-operator --set batchScheduler.enable=true --set webhook.enable=true

And that's it.
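If the release already exists, you can check which values were actually applied (same release name and namespace as above) before reinstalling or upgrading:

helm get values spark-operator -n spark-operator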

iddelacruz avatar Mar 22 '21 08:03 iddelacruz

spec:
  containers:
    - args:
        - -logtostderr
        - -enable-webhook=true
        - -enable-batch-scheduler=true

I have set batchScheduler.enable to true, but Volcano still doesn't work. Why?
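One thing worth verifying in this situation (a suggestion, not something confirmed in this thread): with both flags set, the operator's mutating webhook still has to be registered in the cluster for pods to be patched, so check that a configuration for it exists:

kubectl get mutatingwebhookconfigurations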

leo7490 avatar Jul 27 '22 08:07 leo7490

I ran into the same problem. Volcano was installed before the Spark operator. I tested the sample SparkApplication in https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/docs/volcano-integration.md. The pod schedulerName for spark-pi-driver is default-scheduler instead of volcano.

Have you solved this problem?

leo7490 avatar Jul 27 '22 08:07 leo7490

For me it's OK now, thanks.

iddelacruz avatar Oct 28 '22 19:10 iddelacruz

For anyone having issues: pay attention to how you install the Helm chart.

The Volcano documentation's syntax is wrong; it does not enable the batch scheduler or the webhook in the Spark operator. The Google documentation's syntax is also wrong, probably due to a change in the Helm values structure that was never updated across the many documentation pages on GitHub.

When you read the Helm chart values you'll see that both batchScheduler and webhook have a member under them called enable that needs to be set to true.
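For reference, the same two settings expressed as a values file (a sketch; verify the key names against your chart version):

# values.yaml
batchScheduler:
  enable: true
webhook:
  enable: true

helm install spark-operator spark-operator/spark-operator --namespace spark-operator -f values.yaml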

lmouhib avatar May 12 '23 14:05 lmouhib