
Volcano Queue Issues.

Open bubbybharath opened this issue 2 years ago • 14 comments

Hi Team,

I have created a queue and PodGroup, and added `--set batchScheduler.enable=true` when deploying the Spark operator through Helm. However, when I test a Spark operator job, it uses the default queue, and a PodGroup is generated with a random name. Please let me know how to use a custom queue and PodGroup with the Spark operator.

podgroup-152df854-0b76-43d2-b4d3-9229ba1915e9 podgroup-6be9ba51-1579-4555-bc4c-efa76ac87d2d

  1. Which version of the Spark operator supports Volcano? With 3.3.2 I am not able to spin up the driver; when I tried Spark operator 3.5.0 it looks good. However, the issue with using a custom queue and PodGroup remains.

bubbybharath avatar Oct 23 '23 16:10 bubbybharath

@Yikun

Please check this

hwdef avatar Oct 24 '23 02:10 hwdef

I am not very familiar with the Spark operator; you can see [1] as a reference. cc @william-wang, who knows more.

See also Apache Spark's native Volcano support, which can specify the PodGroup: https://spark.apache.org/docs/latest/running-on-kubernetes.html#using-volcano-as-customized-scheduler-for-spark-on-kubernetes

[1] https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/docs/volcano-integration.md

Yikun avatar Oct 24 '23 03:10 Yikun

@bubbybharath It's reasonable to support specifying a custom queue. Would you like to tell us more about your scenario and use case?

william-wang avatar Oct 28 '23 03:10 william-wang

Hi,

I have attached the document here. My requirement is to create a custom queue and PodGroup and use them to spin up the driver pod. I tried creating the PodGroup within the namespace, but a PodGroup name is randomly generated with the default queue. I then started creating the PodGroup in a MinIO bucket as a PodGroup template, but it is still not being picked up.

  1. How do I check whether my PodGroup is being used for a particular Spark job? The moment I see a driver spin up, I also see a PodGroup being created with a dynamic name. I want to use the custom queue along with the PodGroup.
  2. Which Spark operator version is compatible with Volcano? 3.3.2 is not working for me, but 3.5.0 is. Kindly assist me.
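One way to answer the first question is to inspect the PodGroups Volcano actually created and the annotation it stamps on the driver pod. The sketch below assumes a namespace `spark-jobs` and a driver pod name `my-spark-driver`; substitute your own:

```shell
# List PodGroups in the Spark job namespace and show which queue each one targets.
kubectl get podgroups.scheduling.volcano.sh -n spark-jobs \
  -o custom-columns=NAME:.metadata.name,QUEUE:.spec.queue

# The driver pod's group-name annotation shows which PodGroup it was admitted under.
kubectl get pod my-spark-driver -n spark-jobs \
  -o jsonpath='{.metadata.annotations.scheduling\.k8s\.io/group-name}'
```

If the QUEUE column shows `default` rather than your custom queue, the scheduler never saw your queue setting.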


bubbybharath avatar Oct 28 '23 12:10 bubbybharath

VolcanoIssues.docx

I have documented the steps where I am unable to use the custom queue and PodGroup.

bubbybharath avatar Oct 30 '23 12:10 bubbybharath

@william-wang , please assist me

bubbybharath avatar Oct 30 '23 15:10 bubbybharath

As far as I know, you can try setting the `SparkApplicationSpec.Spec.BatchSchedulerOptions.Queue` field of the SparkApplication to specify your custom queue name.
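In YAML, that field would look roughly like the following sketch; the application name, namespace, and queue name are placeholders, and this assumes the operator was deployed with `batchScheduler.enable=true`:

```yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: my-spark-app        # placeholder application name
  namespace: spark-jobs     # placeholder namespace
spec:
  sparkVersion: "3.5.0"
  batchScheduler: volcano   # hand scheduling over to Volcano
  batchSchedulerOptions:
    queue: test-queue       # your pre-created Volcano queue
```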

Monokaix avatar Nov 03 '23 08:11 Monokaix

We have added the snippet below to the SparkApplication, but no luck: the PodGroup still uses the default queue. I have defined the PodGroup in the Spark job namespace. We have separate namespaces for the Spark operator and the Spark jobs. When we comment out `batchScheduler: volcano` and disable the batch scheduler in the Spark operator deployment, the same YAML file works.

```yaml
sparkVersion: "3.5.0"
batchScheduler: "volcano"
batchSchedulerOptions:
  queue: "test-queue"
```

bubbybharath avatar Nov 06 '23 12:11 bubbybharath

Hi Team,

I am also trying to test Volcano through the spark-submit command and am getting the error below.

```
Warning  SparkApplicationFailed  2m30s  spark-operator  SparkApplication sparkway-benchmark-testing failed: failed to run spark-submit for SparkApplication scp-corp-datahub-spark-ns/sparkway-benchmark-testing:
23/11/14 17:00:23 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file
Exception in thread "main" java.lang.ClassNotFoundException: org.apache.spark.deploy.k8s.features.VolcanoFeatureStep
```

I have added the configuration properties listed below.

```properties
spark.kubernetes.scheduler.volcano.podGroupTemplateFile=s3a://scp-corp-etax/podgroups/test_podgroup.yaml
spark.kubernetes.driver.pod.featureSteps=org.apache.spark.deploy.k8s.features.VolcanoFeatureStep
spark.kubernetes.executor.pod.featureSteps=org.apache.spark.deploy.k8s.features.VolcanoFeatureStep
```
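For reference, a full spark-submit invocation with these options might look like the sketch below; the API server URL, image name, class, jar path, and template path are all placeholders, and it assumes the container image was built with Volcano support (otherwise the `ClassNotFoundException` above is expected):

```shell
./bin/spark-submit \
  --master k8s://https://my-k8s-apiserver:6443 \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.kubernetes.container.image=my-registry/spark:3.5.0-volcano \
  --conf spark.kubernetes.scheduler.name=volcano \
  --conf spark.kubernetes.scheduler.volcano.podGroupTemplateFile=/path/to/podgroup-template.yaml \
  --conf spark.kubernetes.driver.pod.featureSteps=org.apache.spark.deploy.k8s.features.VolcanoFeatureStep \
  --conf spark.kubernetes.executor.pod.featureSteps=org.apache.spark.deploy.k8s.features.VolcanoFeatureStep \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.5.0.jar
```

Note `spark.kubernetes.scheduler.name=volcano` in addition to the feature steps, so the pods are actually handed to the Volcano scheduler.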

bubbybharath avatar Nov 14 '23 17:11 bubbybharath

I have noticed a similar issue with a spark-submit job. It sends the job to the default queue no matter what. It looks like it is not reading `spark.kubernetes.scheduler.volcano.podGroupTemplateFile` at all, because even when a random file name is specified (a file that doesn't exist), no error is raised.

apinchuk1 avatar Nov 18 '23 00:11 apinchuk1

@william-wang @Thor-wl I see there is a similar thread from 2021: https://github.com/volcano-sh/volcano/issues/1462. Does Volcano support a custom queue for the Spark operator? I am seeing the same problem with the Spark operator: despite specifying a custom queue, jobs keep running in the default queue.

apinchuk1 avatar Jan 03 '24 17:01 apinchuk1

Hi, I am currently trying to use Spark (3.5.0) with Volcano but could not make my jobs reach custom queues, using either the Spark operator or the spark-submit command. Has anyone made it work already? With the Spark operator I don't see any error, but Volcano simply ignores the custom queue requirement, whereas with spark-submit I also hit the ClassNotFoundException for "VolcanoFeatureStep"... It doesn't seem very mature.

schauaib avatar Jan 18 '24 10:01 schauaib

> Hi, currently trying to use spark (3.5.0) with volcano but could not make my jobs reach custom queues, neither using spark-operator nor spark-submit command. [...]

Hi, please refer to https://spark.apache.org/docs/latest/running-on-kubernetes.html#using-volcano-as-customized-scheduler-for-spark-on-kubernetes to submit your Spark job; currently, setting a custom PodGroup and queue is supported by spark-submit. And feel free to paste your Spark config and error log if you have any problems.
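The PodGroup template referenced by `spark.kubernetes.scheduler.volcano.podGroupTemplateFile` might look like this sketch; the queue name and resource values are placeholders:

```yaml
apiVersion: scheduling.volcano.sh/v1beta1
kind: PodGroup
spec:
  queue: test-queue   # the Volcano queue the job should run in
  minMember: 1        # gang-scheduling: at least the driver must fit
  minResources:       # minimum resources reserved before admitting the group
    cpu: "2"
    memory: "2Gi"
```

Note there is no `metadata.name`: Spark generates the PodGroup name itself, which is why the PodGroups in the cluster carry random suffixes; the template only controls the spec (queue, minMember, and so on).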

Monokaix avatar Jan 29 '24 03:01 Monokaix

Thanks for your help. I finally made it work. I hadn't figured out that I had to rebuild the Spark image to include Volcano support. Now it seems OK.
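For anyone hitting the same ClassNotFoundException: the Volcano feature step is only compiled in when Spark is built with the `volcano` Maven profile, so the distribution and container image need to be rebuilt. A sketch from a Spark source checkout, with `my-registry` as a placeholder registry:

```shell
# Build a Spark distribution with Kubernetes and Volcano support enabled.
./dev/make-distribution.sh --name volcano --tgz -Pkubernetes -Pvolcano

# Build and push a container image from the resulting distribution.
./bin/docker-image-tool.sh -r my-registry -t 3.5.0-volcano build
./bin/docker-image-tool.sh -r my-registry -t 3.5.0-volcano push
```

That image is then what `spark.kubernetes.container.image` should point at.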

schauaib avatar Jan 29 '24 07:01 schauaib

/close

Monokaix avatar Jul 17 '24 02:07 Monokaix

@Monokaix: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

volcano-sh-bot avatar Jul 17 '24 02:07 volcano-sh-bot