Volcano Queue Issues.
Hi Team,
I have created a queue and PodGroup, and added `--set batchScheduler.enable=true` during the Spark operator deployment through Helm. However, when I test a Spark operator job, it uses the default queue and a PodGroup is generated with a random name. Please let me know how to use a custom queue and PodGroup with the Spark operator.
Generated PodGroup names, for example:
podgroup-152df854-0b76-43d2-b4d3-9229ba1915e9
podgroup-6be9ba51-1579-4555-bc4c-efa76ac87d2d
- Which version of the Spark operator supports Volcano? With 3.3.2 I am not able to spin up the driver. With 3.5.0 the driver starts, but the issue with using a custom queue and PodGroup remains.
@Yikun
Please check this
I am not very familiar with the Spark operator; you can see [1] as a reference. cc @william-wang, who knows more.
See also Apache Spark's native Volcano support, which can specify the PodGroup: https://spark.apache.org/docs/latest/running-on-kubernetes.html#using-volcano-as-customized-scheduler-for-spark-on-kubernetes
[1] https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/docs/volcano-integration.md
@bubbybharath It's reasonable to support specifying a custom queue. Would you like to tell us more about your scenario and use case?
Hi,
I have attached the document here. My requirement is to create a custom queue and PodGroup and use them to spin up the driver pod. I tried creating the PodGroup within the namespace, but a PodGroup name is randomly generated with the default queue. I then tried creating the PodGroup in a MinIO bucket as a PodGroup template, but it is still not being picked up.
- How do I check whether my PodGroup is being used for a particular Spark job? The moment I see a driver spun up, I also see a PodGroup being created with a dynamic name. I want to use the custom queue along with the PodGroup.
- Which Spark operator version is compatible with Volcano? I tried 3.3.2 but it is not working; with 3.5.0 it is working. Kindly assist me.
@william-wang , please assist me
As far as I know, you can try setting the `SparkApplicationSpec.BatchSchedulerOptions.Queue` field of the SparkApplication to specify your custom queue name.
We have added the below settings in the SparkApplication, but no luck: the PodGroup still uses the default queue. I have defined the PodGroup in the Spark job namespace; we have separate namespaces for the Spark operator and the Spark jobs. When we comment out `batchScheduler: volcano` and disable the batch scheduler in the Spark operator deployment, the same YAML file works.
```yaml
sparkVersion: "3.5.0"
batchScheduler: "volcano"
batchSchedulerOptions:
  queue: "test-queue"
```
Hi Team,
I am also trying to test Volcano through the spark-submit command and am getting the below error.
```
Warning  SparkApplicationFailed  2m30s  spark-operator  SparkApplication sparkway-benchmark-testing failed: failed to run spark-submit for SparkApplication scp-corp-datahub-spark-ns/sparkway-benchmark-testing:
23/11/14 17:00:23 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file
Exception in thread "main" java.lang.ClassNotFoundException: org.apache.spark.deploy.k8s.features.VolcanoFeatureStep
```
I have added the below configuration properties:

```
spark.kubernetes.scheduler.volcano.podGroupTemplateFile=s3a://scp-corp-etax/podgroups/test_podgroup.yaml
spark.kubernetes.driver.pod.featureSteps=org.apache.spark.deploy.k8s.features.VolcanoFeatureStep
spark.kubernetes.executor.pod.featureSteps=org.apache.spark.deploy.k8s.features.VolcanoFeatureStep
```
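For what it's worth, the file pointed to by `spark.kubernetes.scheduler.volcano.podGroupTemplateFile` is expected to be a PodGroup manifest whose `spec` is merged into the PodGroup that Spark generates for the driver. A minimal template pinning the queue might look like this (the queue name is an assumption, matching the earlier example; `minMember` is illustrative):

```yaml
# test_podgroup.yaml - PodGroup template; Spark fills in the name and ownership
apiVersion: scheduling.volcano.sh/v1beta1
kind: PodGroup
spec:
  queue: test-queue
  minMember: 1
```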
I have noticed a similar issue with a spark-submit job. It sends the job to the default queue no matter what. It looks like it's not reading `spark.kubernetes.scheduler.volcano.podGroupTemplateFile` at all, because even when a random file name is specified (a file that doesn't exist), no error is seen.
@william-wang @Thor-wl I see there is a similar thread from 2021: https://github.com/volcano-sh/volcano/issues/1462. Does Volcano support a custom queue for the Spark operator? I am seeing the same problem: despite specifying a custom queue with the Spark operator, jobs keep running in the default queue.
Hi, I am currently trying to use Spark (3.5.0) with Volcano but could not make my jobs reach custom queues, neither using the Spark operator nor the spark-submit command. Has someone made it work already? With the Spark operator I don't face any error, but Volcano simply ignores the custom queue requirement, whereas with spark-submit I also face the ClassNotFoundException about `VolcanoFeatureStep`... This doesn't seem like very mature stuff...
Hi, please refer to https://spark.apache.org/docs/latest/running-on-kubernetes.html#using-volcano-as-customized-scheduler-for-spark-on-kubernetes to submit your Spark job; currently, setting a custom PodGroup and queue is supported via spark-submit. And feel free to paste your Spark config and error log if you have any problems.
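Following the linked doc, a spark-submit invocation that routes the driver and executors through Volcano typically combines the scheduler name, the two feature steps, and the PodGroup template. A sketch (the API server address, image, paths, and main class are placeholders, not values from this thread):

```shell
# Sketch of a Volcano-aware spark-submit; image, paths, and class are placeholders
bin/spark-submit \
  --master k8s://https://<api-server>:6443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.container.image=<volcano-enabled-spark-image> \
  --conf spark.kubernetes.scheduler.name=volcano \
  --conf spark.kubernetes.scheduler.volcano.podGroupTemplateFile=/path/to/podgroup-template.yaml \
  --conf spark.kubernetes.driver.pod.featureSteps=org.apache.spark.deploy.k8s.features.VolcanoFeatureStep \
  --conf spark.kubernetes.executor.pod.featureSteps=org.apache.spark.deploy.k8s.features.VolcanoFeatureStep \
  --class <main-class> \
  local:///path/to/app.jar
```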
Thanks for your help. I finally made it work. I hadn't figured out that I had to rebuild the Spark image with Volcano support. Now this seems OK.
/close
@Monokaix: Closing this issue.