
[spark-rapids] generate spark-rapids/spark-rapids.sh from template

Open · cjac opened this issue 1 year ago · 62 comments

This is a re-implementation of the script using templates, created while refactoring common code under gpu/, dask/, rapids/, spark-rapids/, horovod/, mlvm/, and many of the other initialization actions.

For the templates used to generate the new spark-rapids.sh and mig.sh, PTAL at

https://github.com/GoogleCloudDataproc/initialization-actions/pull/1282/files#diff-9887ca799a0fd5f78f754eb371e1f771c3e80d5abc06ee6d17d54fd2ad2962a3

cjac avatar Dec 25 '24 21:12 cjac

/gcbrun

cjac avatar Dec 25 '24 21:12 cjac

/gcbrun

cjac avatar Dec 25 '24 22:12 cjac

/gcbrun

cjac avatar Dec 26 '24 02:12 cjac

/gcbrun

cjac avatar Dec 26 '24 03:12 cjac

/gcbrun

cjac avatar Dec 26 '24 04:12 cjac

/gcbrun

cjac avatar Dec 26 '24 05:12 cjac

/gcbrun

cjac avatar Dec 26 '24 21:12 cjac

/gcbrun

cjac avatar Dec 27 '24 02:12 cjac

/gcbrun

cjac avatar Dec 27 '24 04:12 cjac

/gcbrun

cjac avatar Dec 27 '24 05:12 cjac

/gcbrun

cjac avatar Dec 27 '24 07:12 cjac

/gcbrun

cjac avatar Dec 27 '24 09:12 cjac

Okay, that brings the new code up to the standard of the previous code. Now let's start enabling more tests...

cjac avatar Dec 27 '24 09:12 cjac

/gcbrun

cjac avatar Dec 27 '24 09:12 cjac

/gcbrun

cjac avatar Dec 27 '24 09:12 cjac

/gcbrun

cjac avatar Dec 27 '24 10:12 cjac

/gcbrun

cjac avatar Dec 27 '24 10:12 cjac

Okay, so now we have some rocky8 and rocky9 coverage. I don't think we have to disable the 2.0 images. Let's verify...

cjac avatar Dec 27 '24 22:12 cjac

/gcbrun

cjac avatar Dec 28 '24 01:12 cjac

Okay, now it:

  • meets the expectations of the previous implementation
  • does not disable rocky tests

Let's try skipping still fewer tests.

cjac avatar Dec 28 '24 01:12 cjac

/gcbrun

cjac avatar Dec 28 '24 01:12 cjac

/gcbrun

cjac avatar Dec 28 '24 03:12 cjac

I think we've got single-node clusters working fine on rocky again.

cjac avatar Dec 28 '24 05:12 cjac

/gcbrun

cjac avatar Dec 28 '24 05:12 cjac

Can we skip no tests at all?

cjac avatar Dec 28 '24 06:12 cjac

/gcbrun

cjac avatar Dec 28 '24 06:12 cjac

/gcbrun

cjac avatar Dec 28 '24 07:12 cjac

/gcbrun

cjac avatar Dec 28 '24 08:12 cjac

/gcbrun

cjac avatar Dec 28 '24 09:12 cjac

Failing on 2.1-rocky8:

gcloud compute ssh test-rapids-single-2-1-20241228-090957-ax78-m --zone=us-central1-f --command="echo :quit | spark-shell          --conf spark.executor.resource.gpu.amount=1          --conf spark.task.resource.gpu.amount=0.1          --conf spark.dynamicAllocation.enabled=false -i verify_xgboost_spark_rapids.scala"
...
2024-12-28T09:16:40.181635368Z 24/12/28 09:16:38 ERROR SparkContext: Error initializing SparkContext.
2024-12-28T09:16:40.181644485Z org.apache.spark.SparkException: Application application_1735377343997_0001 failed 2 times due to AM Container for appattempt_1735377343997_0001_000002 exited with  exitCode: -1
2024-12-28T09:16:40.181652729Z Failing this attempt.Diagnostics: [2024-12-28 09:16:37.138]ResourceHandlerChain.preStart() failed!
2024-12-28T09:16:40.181661528Z [2024-12-28 09:16:37.138]ResourceHandlerChain.preStart() failed!
2024-12-28T09:16:40.181669571Z For more detailed output, check the application tracking page: http://test-rapids-single-2-1-20241228-090957-ax78-m:8188/applicationhistory/app/application_1735377343997_0001 Then click on links to logs of each attempt.
...
2024-12-28T09:16:40.182246987Z 24/12/28 09:16:38 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to send shutdown message before the AM has registered!
2024-12-28T09:16:40.182254360Z 24/12/28 09:16:38 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
2024-12-28T09:16:40.182262098Z 24/12/28 09:16:38 WARN MetricsSystem: Stopping a MetricsSystem that is not running
2024-12-28T09:16:40.182269324Z 24/12/28 09:16:38 ERROR Main: Failed to initialize Spark session.

2024-12-28T09:16:40.172463304Z INFO: From Testing //:test_spark_rapids (shard 2 of 3):
2024-12-28T09:16:40.181205242Z ==================== Test output for //:test_spark_rapids (shard 2 of 3):
2024-12-28T09:16:40.181259588Z Running tests under Python 3.10.12: /usr/bin/python3
2024-12-28T09:16:40.181271199Z [ RUN      ] SparkRapidsTestCase.test_spark_rapids('SINGLE', ['m'], 'type=nvidia-tesla-t4')
2024-12-28T09:16:40.181279535Z [  FAILED  ] SparkRapidsTestCase.test_spark_rapids('SINGLE', ['m'], 'type=nvidia-tesla-t4')
2024-12-28T09:16:40.181289052Z ======================================================================
2024-12-28T09:16:40.181297700Z FAIL: test_spark_rapids('SINGLE', ['m'], 'type=nvidia-tesla-t4') (__main__.SparkRapidsTestCase)
2024-12-28T09:16:40.181305356Z test_spark_rapids('SINGLE', ['m'], 'type=nvidia-tesla-t4') (__main__.SparkRapidsTestCase)
2024-12-28T09:16:40.181313086Z test_spark_rapids('SINGLE', ['m'], 'type=nvidia-tesla-t4')
2024-12-28T09:16:40.181321220Z ----------------------------------------------------------------------
2024-12-28T09:16:40.181329465Z Traceback (most recent call last):
2024-12-28T09:16:40.181339430Z   File "/home/ia-tests/.cache/bazel/_bazel_ia-tests/83b1ae36bb04ea5432b9efccee83c25f/execroot/_main/bazel-out/k8-fastbuild/bin/test_spark_rapids.runfiles/io_abseil_py/absl/testing/parameterized.py", line 265, in bound_param_test
2024-12-28T09:16:40.181350608Z     test_method(self, *testcase_params)
2024-12-28T09:16:40.181358406Z   File "/home/ia-tests/.cache/bazel/_bazel_ia-tests/83b1ae36bb04ea5432b9efccee83c25f/execroot/_main/bazel-out/k8-fastbuild/bin/test_spark_rapids.runfiles/_main/spark-rapids/test_spark_rapids.py", line 80, in test_spark_rapids
2024-12-28T09:16:40.181367792Z     self.verify_spark_job()
2024-12-28T09:16:40.181375727Z   File "/home/ia-tests/.cache/bazel/_bazel_ia-tests/83b1ae36bb04ea5432b9efccee83c25f/execroot/_main/bazel-out/k8-fastbuild/bin/test_spark_rapids.runfiles/_main/spark-rapids/test_spark_rapids.py", line 34, in verify_spark_job
2024-12-28T09:16:40.181386151Z     self.assert_instance_command(
2024-12-28T09:16:40.181393983Z   File "/home/ia-tests/.cache/bazel/_bazel_ia-tests/83b1ae36bb04ea5432b9efccee83c25f/execroot/_main/bazel-out/k8-fastbuild/bin/test_spark_rapids.runfiles/_main/integration_tests/dataproc_test_case.py", line 290, in assert_instance_command
2024-12-28T09:16:40.181401367Z     ret_code, stdout, stderr = self.assert_command(
2024-12-28T09:16:40.181408995Z   File "/home/ia-tests/.cache/bazel/_bazel_ia-tests/83b1ae36bb04ea5432b9efccee83c25f/execroot/_main/bazel-out/k8-fastbuild/bin/test_spark_rapids.runfiles/_main/integration_tests/dataproc_test_case.py", line 342, in assert_command
2024-12-28T09:16:40.181417121Z     self.assertEqual(
2024-12-28T09:16:40.181424369Z AssertionError: 1 != 0 : Failed to execute command:
2024-12-28T09:16:40.181451032Z gcloud compute ssh test-rapids-single-2-1-20241228-090957-ax78-m --zone=us-central1-f --command="echo :quit | spark-shell          --conf spark.executor.resource.gpu.amount=1          --conf spark.task.resource.gpu.amount=0.1          --conf spark.dynamicAllocation.enabled=false -i verify_xgboost_spark_rapids.scala"
2024-12-28T09:16:40.181488519Z STDOUT:

2024-12-28T09:16:40.181495772Z 
2024-12-28T09:16:40.181503201Z STDERR:
2024-12-28T09:16:40.181511070Z Setting default log level to "WARN".
2024-12-28T09:16:40.181518588Z To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
2024-12-28T09:16:40.181527998Z 24/12/28 09:16:26 WARN ResourceUtils: The configuration of cores (exec = 24 task = 2, runnable tasks = 12) will result in wasted resources due to resource gpu limiting the number of runnable tasks per executor to: 10. Please adjust your configuration.
2024-12-28T09:16:40.181535401Z 24/12/28 09:16:26 INFO SparkEnv: Registering MapOutputTracker
2024-12-28T09:16:40.181542402Z 24/12/28 09:16:26 INFO SparkEnv: Registering BlockManagerMaster
2024-12-28T09:16:40.181549576Z 24/12/28 09:16:27 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
2024-12-28T09:16:40.181556968Z 24/12/28 09:16:27 INFO SparkEnv: Registering OutputCommitCoordinator
2024-12-28T09:16:40.181564719Z 24/12/28 09:16:28 WARN RapidsPluginUtils: RAPIDS Accelerator 23.08.2 using cudf 23.08.0.
2024-12-28T09:16:40.181573748Z 24/12/28 09:16:28 WARN RapidsPluginUtils: spark.rapids.sql.multiThreadedRead.numThreads is set to 24.
2024-12-28T09:16:40.181583185Z 24/12/28 09:16:28 WARN RapidsPluginUtils: The current setting of spark.task.resource.gpu.amount (0.1) is not ideal to get the best performance from the RAPIDS Accelerator plugin. It's recommended to be 1/{executor core count} unless you have a special use case.
2024-12-28T09:16:40.181592461Z 24/12/28 09:16:28 WARN RapidsPluginUtils: RAPIDS Accelerator is enabled, to disable GPU support set `spark.rapids.sql.enabled` to false.
2024-12-28T09:16:40.181602487Z 24/12/28 09:16:28 WARN RapidsPluginUtils: spark.rapids.sql.explain is set to `NOT_ON_GPU`. Set it to 'NONE' to suppress the diagnostics logging about the query placement on the GPU.
2024-12-28T09:16:40.181610414Z 24/12/28 09:16:38 ERROR YarnClientSchedulerBackend: The YARN application has already ended! It might have been killed or the Application Master may have failed to start. Check the YARN application logs for more details.
2024-12-28T09:16:40.181635368Z 24/12/28 09:16:38 ERROR SparkContext: Error initializing SparkContext.
2024-12-28T09:16:40.181644485Z org.apache.spark.SparkException: Application application_1735377343997_0001 failed 2 times due to AM Container for appattempt_1735377343997_0001_000002 exited with  exitCode: -1
2024-12-28T09:16:40.181652729Z Failing this attempt.Diagnostics: [2024-12-28 09:16:37.138]ResourceHandlerChain.preStart() failed!
2024-12-28T09:16:40.181661528Z [2024-12-28 09:16:37.138]ResourceHandlerChain.preStart() failed!
2024-12-28T09:16:40.181669571Z For more detailed output, check the application tracking page: http://test-rapids-single-2-1-20241228-090957-ax78-m:8188/applicationhistory/app/application_1735377343997_0001 Then click on links to logs of each attempt.
2024-12-28T09:16:40.181677394Z . Failing the application.
2024-12-28T09:16:40.181685827Z 	at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:98) ~[spark-yarn_2.12-3.3.2.jar:3.3.2]
2024-12-28T09:16:40.181693770Z 	at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:65) ~[spark-yarn_2.12-3.3.2.jar:3.3.2]
2024-12-28T09:16:40.181701925Z 	at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:234) ~[spark-core_2.12-3.3.2.jar:3.3.2]
2024-12-28T09:16:40.181719600Z 	at org.apache.spark.SparkContext.<init>(SparkContext.scala:627) ~[spark-core_2.12-3.3.2.jar:3.3.2]
2024-12-28T09:16:40.181727298Z 	at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2786) ~[spark-core_2.12-3.3.2.jar:3.3.2]
2024-12-28T09:16:40.181734874Z 	at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:953) ~[spark-sql_2.12-3.3.2.jar:3.3.2]
2024-12-28T09:16:40.181742350Z 	at scala.Option.getOrElse(Option.scala:189) ~[scala-library-2.12.18.jar:?]
2024-12-28T09:16:40.181749901Z 	at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:947) ~[spark-sql_2.12-3.3.2.jar:3.3.2]
2024-12-28T09:16:40.181757102Z 	at org.apache.spark.repl.Main$.createSparkSession(Main.scala:106) ~[spark-repl_2.12-3.3.2.jar:3.3.2]
2024-12-28T09:16:40.181764381Z 	at $line3.$read$$iw$$iw.<init>(<console>:15) ~[scala-library-2.12.18.jar:?]
2024-12-28T09:16:40.181772357Z 	at $line3.$read$$iw.<init>(<console>:42) ~[scala-library-2.12.18.jar:?]
2024-12-28T09:16:40.181779738Z 	at $line3.$read.<init>(<console>:44) ~[scala-library-2.12.18.jar:?]
2024-12-28T09:16:40.181787200Z 	at $line3.$read$.<init>(<console>:48) ~[scala-library-2.12.18.jar:?]
2024-12-28T09:16:40.181794677Z 	at $line3.$read$.<clinit>(<console>) ~[scala-library-2.12.18.jar:?]
2024-12-28T09:16:40.181802224Z 	at $line3.$eval$.$print$lzycompute(<console>:7) ~[scala-library-2.12.18.jar:?]
2024-12-28T09:16:40.181809839Z 	at $line3.$eval$.$print(<console>:6) ~[scala-library-2.12.18.jar:?]
2024-12-28T09:16:40.181817260Z 	at $line3.$eval.$print(<console>) ~[scala-library-2.12.18.jar:?]
2024-12-28T09:16:40.181824582Z 	at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
2024-12-28T09:16:40.181832141Z 	at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:?]
2024-12-28T09:16:40.181839610Z 	at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
2024-12-28T09:16:40.181847607Z 	at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
2024-12-28T09:16:40.181855210Z 	at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:747) ~[scala-compiler-2.12.18.jar:?]
2024-12-28T09:16:40.181862897Z 	at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1020) ~[scala-compiler-2.12.18.jar:?]
2024-12-28T09:16:40.181893089Z 	at scala.tools.nsc.interpreter.IMain.$anonfun$interpret$1(IMain.scala:568) ~[scala-compiler-2.12.18.jar:?]
2024-12-28T09:16:40.181912057Z 	at scala.reflect.internal.util.ScalaClassLoader.asContext(ScalaClassLoader.scala:36) ~[scala-reflect-2.12.18.jar:?]
2024-12-28T09:16:40.181920096Z 	at scala.reflect.internal.util.ScalaClassLoader.asContext$(ScalaClassLoader.scala:116) ~[scala-reflect-2.12.18.jar:?]
2024-12-28T09:16:40.181927892Z 	at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:41) ~[scala-reflect-2.12.18.jar:?]
2024-12-28T09:16:40.181935752Z 	at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:567) ~[scala-compiler-2.12.18.jar:?]
2024-12-28T09:16:40.181943725Z 	at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:594) ~[scala-compiler-2.12.18.jar:?]
2024-12-28T09:16:40.181951477Z 	at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:564) ~[scala-compiler-2.12.18.jar:?]
2024-12-28T09:16:40.181958961Z 	at scala.tools.nsc.interpreter.IMain.$anonfun$quietRun$1(IMain.scala:216) ~[scala-compiler-2.12.18.jar:?]
2024-12-28T09:16:40.181966424Z 	at scala.tools.nsc.interpreter.IMain.beQuietDuring(IMain.scala:206) ~[scala-compiler-2.12.18.jar:?]
2024-12-28T09:16:40.181973890Z 	at scala.tools.nsc.interpreter.IMain.quietRun(IMain.scala:216) ~[scala-compiler-2.12.18.jar:?]
2024-12-28T09:16:40.181981267Z 	at org.apache.spark.repl.SparkILoop.$anonfun$initializeSpark$2(SparkILoop.scala:83) ~[spark-repl_2.12-3.3.2.jar:3.3.2]
2024-12-28T09:16:40.182010983Z 	at scala.collection.immutable.List.foreach(List.scala:431) ~[scala-library-2.12.18.jar:?]
2024-12-28T09:16:40.182019645Z 	at org.apache.spark.repl.SparkILoop.$anonfun$initializeSpark$1(SparkILoop.scala:83) ~[spark-repl_2.12-3.3.2.jar:3.3.2]
2024-12-28T09:16:40.182027121Z 	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) ~[scala-library-2.12.18.jar:?]
2024-12-28T09:16:40.182034037Z 	at scala.tools.nsc.interpreter.ILoop.savingReplayStack(ILoop.scala:97) ~[scala-compiler-2.12.18.jar:?]
2024-12-28T09:16:40.182041417Z 	at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:83) ~[spark-repl_2.12-3.3.2.jar:3.3.2]
2024-12-28T09:16:40.182048680Z 	at org.apache.spark.repl.SparkILoop.$anonfun$process$4(SparkILoop.scala:165) ~[spark-repl_2.12-3.3.2.jar:3.3.2]
2024-12-28T09:16:40.182056018Z 	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) ~[scala-library-2.12.18.jar:?]
2024-12-28T09:16:40.182063359Z 	at scala.tools.nsc.interpreter.ILoop.$anonfun$mumly$1(ILoop.scala:166) ~[scala-compiler-2.12.18.jar:?]
2024-12-28T09:16:40.182071177Z 	at scala.tools.nsc.interpreter.IMain.beQuietDuring(IMain.scala:206) ~[scala-compiler-2.12.18.jar:?]
2024-12-28T09:16:40.182078744Z 	at scala.tools.nsc.interpreter.ILoop.mumly(ILoop.scala:163) ~[scala-compiler-2.12.18.jar:?]
2024-12-28T09:16:40.182086459Z 	at org.apache.spark.repl.SparkILoop.loopPostInit$1(SparkILoop.scala:153) ~[spark-repl_2.12-3.3.2.jar:3.3.2]
2024-12-28T09:16:40.182094268Z 	at org.apache.spark.repl.SparkILoop.$anonfun$process$10(SparkILoop.scala:221) ~[spark-repl_2.12-3.3.2.jar:3.3.2]
2024-12-28T09:16:40.182101761Z 	at org.apache.spark.repl.SparkILoop.withSuppressedSettings$1(SparkILoop.scala:189) ~[spark-repl_2.12-3.3.2.jar:3.3.2]
2024-12-28T09:16:40.182109137Z 	at org.apache.spark.repl.SparkILoop.startup$1(SparkILoop.scala:201) ~[spark-repl_2.12-3.3.2.jar:3.3.2]
2024-12-28T09:16:40.182116475Z 	at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:236) ~[spark-repl_2.12-3.3.2.jar:3.3.2]
2024-12-28T09:16:40.182123995Z 	at org.apache.spark.repl.Main$.doMain(Main.scala:78) ~[spark-repl_2.12-3.3.2.jar:3.3.2]
2024-12-28T09:16:40.182132121Z 	at org.apache.spark.repl.Main$.main(Main.scala:58) ~[spark-repl_2.12-3.3.2.jar:3.3.2]
2024-12-28T09:16:40.182139564Z 	at org.apache.spark.repl.Main.main(Main.scala) ~[spark-repl_2.12-3.3.2.jar:3.3.2]
2024-12-28T09:16:40.182147126Z 	at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
2024-12-28T09:16:40.182154501Z 	at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:?]
2024-12-28T09:16:40.182161666Z 	at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
2024-12-28T09:16:40.182169382Z 	at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
2024-12-28T09:16:40.182176797Z 	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) ~[spark-core_2.12-3.3.2.jar:3.3.2]
2024-12-28T09:16:40.182184379Z 	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:973) ~[spark-core_2.12-3.3.2.jar:3.3.2]
2024-12-28T09:16:40.182192149Z 	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) ~[spark-core_2.12-3.3.2.jar:3.3.2]
2024-12-28T09:16:40.182200566Z 	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) ~[spark-core_2.12-3.3.2.jar:3.3.2]
2024-12-28T09:16:40.182208136Z 	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) ~[spark-core_2.12-3.3.2.jar:3.3.2]
2024-12-28T09:16:40.182216103Z 	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1061) ~[spark-core_2.12-3.3.2.jar:3.3.2]
2024-12-28T09:16:40.182232195Z 	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1070) ~[spark-core_2.12-3.3.2.jar:3.3.2]
2024-12-28T09:16:40.182239670Z 	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) ~[spark-core_2.12-3.3.2.jar:3.3.2]
2024-12-28T09:16:40.182246987Z 24/12/28 09:16:38 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to send shutdown message before the AM has registered!
2024-12-28T09:16:40.182254360Z 24/12/28 09:16:38 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
2024-12-28T09:16:40.182262098Z 24/12/28 09:16:38 WARN MetricsSystem: Stopping a MetricsSystem that is not running
2024-12-28T09:16:40.182269324Z 24/12/28 09:16:38 ERROR Main: Failed to initialize Spark session.
2024-12-28T09:16:40.182276819Z org.apache.spark.SparkException: Application application_1735377343997_0001 failed 2 times due to AM Container for appattempt_1735377343997_0001_000002 exited with  exitCode: -1
2024-12-28T09:16:40.182284436Z Failing this attempt.Diagnostics: [2024-12-28 09:16:37.138]ResourceHandlerChain.preStart() failed!
2024-12-28T09:16:40.182291579Z [2024-12-28 09:16:37.138]ResourceHandlerChain.preStart() failed!
2024-12-28T09:16:40.182299093Z For more detailed output, check the application tracking page: http://test-rapids-single-2-1-20241228-090957-ax78-m:8188/applicationhistory/app/application_1735377343997_0001 Then click on links to logs of each attempt.
2024-12-28T09:16:40.182307291Z . Failing the application.
2024-12-28T09:16:40.182327537Z 	at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:98) ~[spark-yarn_2.12-3.3.2.jar:3.3.2]
2024-12-28T09:16:40.182335961Z 	at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:65) ~[spark-yarn_2.12-3.3.2.jar:3.3.2]
2024-12-28T09:16:40.182343439Z 	at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:234) ~[spark-core_2.12-3.3.2.jar:3.3.2]
2024-12-28T09:16:40.182350843Z 	at org.apache.spark.SparkContext.<init>(SparkContext.scala:627) ~[spark-core_2.12-3.3.2.jar:3.3.2]
2024-12-28T09:16:40.182358329Z 	at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2786) ~[spark-core_2.12-3.3.2.jar:3.3.2]
2024-12-28T09:16:40.182365122Z 	at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:953) ~[spark-sql_2.12-3.3.2.jar:3.3.2]
2024-12-28T09:16:40.182372615Z 	at scala.Option.getOrElse(Option.scala:189) ~[scala-library-2.12.18.jar:?]
2024-12-28T09:16:40.182379463Z 	at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:947) ~[spark-sql_2.12-3.3.2.jar:3.3.2]
2024-12-28T09:16:40.182387053Z 	at org.apache.spark.repl.Main$.createSparkSession(Main.scala:106) ~[spark-repl_2.12-3.3.2.jar:3.3.2]
2024-12-28T09:16:40.182395386Z 	at $line3.$read$$iw$$iw.<init>(<console>:15) ~[scala-library-2.12.18.jar:?]
2024-12-28T09:16:40.182403151Z 	at $line3.$read$$iw.<init>(<console>:42) ~[scala-library-2.12.18.jar:?]
2024-12-28T09:16:40.182410790Z 	at $line3.$read.<init>(<console>:44) ~[scala-library-2.12.18.jar:?]
2024-12-28T09:16:40.182418594Z 	at $line3.$read$.<init>(<console>:48) ~[scala-library-2.12.18.jar:?]
2024-12-28T09:16:40.182426110Z 	at $line3.$read$.<clinit>(<console>) ~[scala-library-2.12.18.jar:?]
2024-12-28T09:16:40.182433602Z 	at $line3.$eval$.$print$lzycompute(<console>:7) ~[scala-library-2.12.18.jar:?]
2024-12-28T09:16:40.182441090Z 	at $line3.$eval$.$print(<console>:6) ~[scala-library-2.12.18.jar:?]
2024-12-28T09:16:40.182448898Z 	at $line3.$eval.$print(<console>) ~[scala-library-2.12.18.jar:?]
2024-12-28T09:16:40.182456547Z 	at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
2024-12-28T09:16:40.182464922Z 	at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:?]
2024-12-28T09:16:40.182480678Z 	at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
2024-12-28T09:16:40.182488370Z 	at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
2024-12-28T09:16:40.182495675Z 	at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:747) ~[scala-compiler-2.12.18.jar:?]
2024-12-28T09:16:40.182502973Z 	at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1020) ~[scala-compiler-2.12.18.jar:?]
2024-12-28T09:16:40.182510262Z 	at scala.tools.nsc.interpreter.IMain.$anonfun$interpret$1(IMain.scala:568) ~[scala-compiler-2.12.18.jar:?]
2024-12-28T09:16:40.182517992Z 	at scala.reflect.internal.util.ScalaClassLoader.asContext(ScalaClassLoader.scala:36) ~[scala-reflect-2.12.18.jar:?]
2024-12-28T09:16:40.182525406Z 	at scala.reflect.internal.util.ScalaClassLoader.asContext$(ScalaClassLoader.scala:116) ~[scala-reflect-2.12.18.jar:?]
2024-12-28T09:16:40.182532855Z 	at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:41) ~[scala-reflect-2.12.18.jar:?]
2024-12-28T09:16:40.182540263Z 	at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:567) ~[scala-compiler-2.12.18.jar:?]
2024-12-28T09:16:40.182547943Z 	at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:594) ~[scala-compiler-2.12.18.jar:?]
2024-12-28T09:16:40.182555378Z 	at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:564) ~[scala-compiler-2.12.18.jar:?]
2024-12-28T09:16:40.182562238Z 	at scala.tools.nsc.interpreter.IMain.$anonfun$quietRun$1(IMain.scala:216) ~[scala-compiler-2.12.18.jar:?]
2024-12-28T09:16:40.182569535Z 	at scala.tools.nsc.interpreter.IMain.beQuietDuring(IMain.scala:206) ~[scala-compiler-2.12.18.jar:?]
2024-12-28T09:16:40.182577148Z 	at scala.tools.nsc.interpreter.IMain.quietRun(IMain.scala:216) ~[scala-compiler-2.12.18.jar:?]
2024-12-28T09:16:40.182585222Z 	at org.apache.spark.repl.SparkILoop.$anonfun$initializeSpark$2(SparkILoop.scala:83) ~[spark-repl_2.12-3.3.2.jar:3.3.2]
2024-12-28T09:16:40.182592570Z 	at scala.collection.immutable.List.foreach(List.scala:431) ~[scala-library-2.12.18.jar:?]
2024-12-28T09:16:40.182599865Z 	at org.apache.spark.repl.SparkILoop.$anonfun$initializeSpark$1(SparkILoop.scala:83) ~[spark-repl_2.12-3.3.2.jar:3.3.2]
2024-12-28T09:16:40.182607433Z 	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) ~[scala-library-2.12.18.jar:?]
2024-12-28T09:16:40.182614502Z 	at scala.tools.nsc.interpreter.ILoop.savingReplayStack(ILoop.scala:97) ~[scala-compiler-2.12.18.jar:?]
2024-12-28T09:16:40.182621685Z 	at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:83) ~[spark-repl_2.12-3.3.2.jar:3.3.2]
2024-12-28T09:16:40.182641605Z 	at org.apache.spark.repl.SparkILoop.$anonfun$process$4(SparkILoop.scala:165) ~[spark-repl_2.12-3.3.2.jar:3.3.2]
2024-12-28T09:16:40.182669811Z 	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) ~[scala-library-2.12.18.jar:?]
2024-12-28T09:16:40.182682305Z 	at scala.tools.nsc.interpreter.ILoop.$anonfun$mumly$1(ILoop.scala:166) ~[scala-compiler-2.12.18.jar:?]
2024-12-28T09:16:40.182689109Z 	at scala.tools.nsc.interpreter.IMain.beQuietDuring(IMain.scala:206) ~[scala-compiler-2.12.18.jar:?]
2024-12-28T09:16:40.182696021Z 	at scala.tools.nsc.interpreter.ILoop.mumly(ILoop.scala:163) ~[scala-compiler-2.12.18.jar:?]
2024-12-28T09:16:40.182702631Z 	at org.apache.spark.repl.SparkILoop.loopPostInit$1(SparkILoop.scala:153) ~[spark-repl_2.12-3.3.2.jar:3.3.2]
2024-12-28T09:16:40.182710137Z 	at org.apache.spark.repl.SparkILoop.$anonfun$process$10(SparkILoop.scala:221) ~[spark-repl_2.12-3.3.2.jar:3.3.2]
2024-12-28T09:16:40.182717463Z 	at org.apache.spark.repl.SparkILoop.withSuppressedSettings$1(SparkILoop.scala:189) ~[spark-repl_2.12-3.3.2.jar:3.3.2]
2024-12-28T09:16:40.182732685Z 	at org.apache.spark.repl.SparkILoop.startup$1(SparkILoop.scala:201) ~[spark-repl_2.12-3.3.2.jar:3.3.2]
2024-12-28T09:16:40.182740219Z 	at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:236) ~[spark-repl_2.12-3.3.2.jar:3.3.2]
2024-12-28T09:16:40.182747599Z 	at org.apache.spark.repl.Main$.doMain(Main.scala:78) ~[spark-repl_2.12-3.3.2.jar:3.3.2]
2024-12-28T09:16:40.182755439Z 	at org.apache.spark.repl.Main$.main(Main.scala:58) ~[spark-repl_2.12-3.3.2.jar:3.3.2]
2024-12-28T09:16:40.182762847Z 	at org.apache.spark.repl.Main.main(Main.scala) ~[spark-repl_2.12-3.3.2.jar:3.3.2]
2024-12-28T09:16:40.182770614Z 	at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
2024-12-28T09:16:40.182778597Z 	at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:?]
2024-12-28T09:16:40.182785873Z 	at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
2024-12-28T09:16:40.182793556Z 	at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
2024-12-28T09:16:40.182801150Z 	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) ~[spark-core_2.12-3.3.2.jar:3.3.2]
2024-12-28T09:16:40.182808552Z 	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:973) ~[spark-core_2.12-3.3.2.jar:3.3.2]
2024-12-28T09:16:40.182816237Z 	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) ~[spark-core_2.12-3.3.2.jar:3.3.2]
2024-12-28T09:16:40.182824081Z 	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) ~[spark-core_2.12-3.3.2.jar:3.3.2]
2024-12-28T09:16:40.182832044Z 	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) ~[spark-core_2.12-3.3.2.jar:3.3.2]
2024-12-28T09:16:40.182839640Z 	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1061) ~[spark-core_2.12-3.3.2.jar:3.3.2]
2024-12-28T09:16:40.182847380Z 	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1070) ~[spark-core_2.12-3.3.2.jar:3.3.2]
2024-12-28T09:16:40.182855379Z 	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) ~[spark-core_2.12-3.3.2.jar:3.3.2]
2024-12-28T09:16:40.182862581Z 
2024-12-28T09:16:40.182884975Z 
2024-12-28T09:16:40.182895223Z ----------------------------------------------------------------------
2024-12-28T09:16:40.183006263Z Ran 1 test in 412.500s
2024-12-28T09:16:40.183020028Z 
2024-12-28T09:16:40.183028507Z FAILED (failures=1)
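Aside from the ResourceHandlerChain.preStart() failure itself, the log's ResourceUtils warning notes that with 24 executor cores and spark.task.resource.gpu.amount=0.1, GPU availability caps runnable tasks below the core count, and RapidsPluginUtils recommends setting the value to 1/{executor core count}. As a hedged sketch (not part of the test harness; the EXECUTOR_CORES value here is just the figure quoted in the warning), the recommended setting could be computed like this before invoking spark-shell:

```shell
#!/bin/sh
# Sketch: derive the RAPIDS-recommended spark.task.resource.gpu.amount,
# i.e. 1 / <executor core count>, per the RapidsPluginUtils warning above.
EXECUTOR_CORES=24  # "exec = 24" from the ResourceUtils warning; adjust per cluster
GPU_AMOUNT=$(awk -v c="$EXECUTOR_CORES" 'BEGIN { printf "%.6g", 1 / c }')
echo "spark.task.resource.gpu.amount=${GPU_AMOUNT}"
```

The resulting value would then replace the hard-coded 0.1 in the `--conf spark.task.resource.gpu.amount=...` argument of the verification command; whether that also resolves the ResourceHandlerChain.preStart() failure is a separate question, since that error occurs before the AM container starts.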

cjac avatar Dec 29 '24 00:12 cjac