SparkBWA
Can't run SparkBWA on an Amazon EMR YARN cluster
Hi,
Thanks for this repo.
I'm trying to run SparkBWA on an Amazon EMR YARN cluster, but I'm running into errors.
Instead of the yarn-cluster master, I passed --master yarn together with --deploy-mode cluster.
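For reference, here is the substitution in isolation (as far as I know, yarn-cluster is just the pre-Spark-2.0 shorthand for a yarn master with cluster deploy mode; the flags below are standard spark-submit options):

# older shorthand (pre Spark 2.0)
spark-submit --class com.github.sparkbwa.SparkBWA --master yarn-cluster ...

# equivalent form I used on EMR (Spark 2.x)
spark-submit --class com.github.sparkbwa.SparkBWA --master yarn --deploy-mode cluster ...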
Then, I got the following error:
[hadoop@ip-172-31-14-100 ~]$ spark-submit --class com.github.sparkbwa.SparkBWA --master yarn --deploy-mode cluster --driver-memory 1500m --executor-memory 10g --executor-cores 1 --verbose --num-executors 16 sparkbwa-1.0.jar -m -r -p --index /Data/HumanBase/hg38 -n 16 -w "-R @RG\tID:foo\tLB:bar\tPL:illumina\tPU:illumina\tSM:ERR000589" ERR000589_1.filt.fastq ERR000589_2.filt.fastq Output_ERR000589
Using properties file: /usr/lib/spark/conf/spark-defaults.conf
Adding default property: spark.sql.warehouse.dir=*********(redacted)
Adding default property: spark.executor.extraJavaOptions=-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:OnOutOfMemoryError='kill -9 %p'
Adding default property: spark.history.fs.logDirectory=hdfs:///var/log/spark/apps
Adding default property: spark.eventLog.enabled=true
Adding default property: spark.shuffle.service.enabled=true
Adding default property: spark.driver.extraLibraryPath=/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native
Adding default property: spark.yarn.historyServer.address=ip-172-31-14-100.eu-west-2.compute.internal:18080
Adding default property: spark.stage.attempt.ignoreOnDecommissionFetchFailure=true
Adding default property: spark.driver.memory=11171M
Adding default property: spark.executor.instances=16
Adding default property: spark.default.parallelism=256
Adding default property: spark.resourceManager.cleanupExpiredHost=true
Adding default property: spark.yarn.appMasterEnv.SPARK_PUBLIC_DNS=$(hostname -f)
Adding default property: spark.driver.extraJavaOptions=-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:OnOutOfMemoryError='kill -9 %p'
Adding default property: spark.master=yarn
Adding default property: spark.blacklist.decommissioning.timeout=1h
Adding default property: spark.executor.extraLibraryPath=/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native
Adding default property: spark.sql.hive.metastore.sharedPrefixes=com.amazonaws.services.dynamodbv2
Adding default property: spark.executor.memory=10356M
Adding default property: spark.driver.extraClassPath=/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar
Adding default property: spark.eventLog.dir=hdfs:///var/log/spark/apps
Adding default property: spark.dynamicAllocation.enabled=true
Adding default property: spark.executor.extraClassPath=/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar
Adding default property: spark.executor.cores=8
Adding default property: spark.history.ui.port=18080
Adding default property: spark.blacklist.decommissioning.enabled=true
Adding default property: spark.decommissioning.timeout.threshold=20
Adding default property: spark.hadoop.yarn.timeline-service.enabled=false
Parsed arguments:
master yarn
deployMode cluster
executorMemory 10g
executorCores 1
totalExecutorCores null
propertiesFile /usr/lib/spark/conf/spark-defaults.conf
driverMemory 1500m
driverCores null
driverExtraClassPath /usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar
driverExtraLibraryPath /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native
driverExtraJavaOptions -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:OnOutOfMemoryError='kill -9 %p'
supervise false
queue null
numExecutors 16
files null
pyFiles null
archives null
mainClass com.github.sparkbwa.SparkBWA
primaryResource file:/home/hadoop/sparkbwa-1.0.jar
name com.github.sparkbwa.SparkBWA
childArgs [-m -r -p --index /Data/HumanBase/hg38 -n 16 -w -R @RG\tID:foo\tLB:bar\tPL:illumina\tPU:illumina\tSM:ERR000589 ERR000589_1.filt.fastq ERR000589_2.filt.fastq Output_ERR000589]
jars null
packages null
packagesExclusions null
repositories null
verbose true
Spark properties used, including those specified through
--conf and those from the properties file /usr/lib/spark/conf/spark-defaults.conf:
(spark.blacklist.decommissioning.timeout,1h)
(spark.executor.extraLibraryPath,/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native)
(spark.default.parallelism,256)
(spark.blacklist.decommissioning.enabled,true)
(spark.hadoop.yarn.timeline-service.enabled,false)
(spark.driver.memory,1500m)
(spark.executor.memory,10356M)
(spark.executor.instances,16)
(spark.sql.warehouse.dir,*********(redacted))
(spark.driver.extraLibraryPath,/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native)
(spark.yarn.historyServer.address,ip-172-31-14-100.eu-west-2.compute.internal:18080)
(spark.eventLog.enabled,true)
(spark.stage.attempt.ignoreOnDecommissionFetchFailure,true)
(spark.history.ui.port,18080)
(spark.yarn.appMasterEnv.SPARK_PUBLIC_DNS,$(hostname -f))
(spark.executor.extraJavaOptions,-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:OnOutOfMemoryError='kill -9 %p')
(spark.resourceManager.cleanupExpiredHost,true)
(spark.shuffle.service.enabled,true)
(spark.history.fs.logDirectory,hdfs:///var/log/spark/apps)
(spark.driver.extraJavaOptions,-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:OnOutOfMemoryError='kill -9 %p')
(spark.executor.extraClassPath,/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar)
(spark.sql.hive.metastore.sharedPrefixes,com.amazonaws.services.dynamodbv2)
(spark.eventLog.dir,hdfs:///var/log/spark/apps)
(spark.master,yarn)
(spark.dynamicAllocation.enabled,true)
(spark.executor.cores,8)
(spark.decommissioning.timeout.threshold,20)
(spark.driver.extraClassPath,/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar)
Main class:
org.apache.spark.deploy.yarn.Client
Arguments:
--jar
file:/home/hadoop/sparkbwa-1.0.jar
--class
com.github.sparkbwa.SparkBWA
--arg
-m
--arg
-r
--arg
-p
--arg
--index
--arg
/Data/HumanBase/hg38
--arg
-n
--arg
16
--arg
-w
--arg
-R @RG\tID:foo\tLB:bar\tPL:illumina\tPU:illumina\tSM:ERR000589
--arg
ERR000589_1.filt.fastq
--arg
ERR000589_2.filt.fastq
--arg
Output_ERR000589
System properties:
(spark.blacklist.decommissioning.timeout,1h)
(spark.executor.extraLibraryPath,/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native)
(spark.default.parallelism,256)
(spark.blacklist.decommissioning.enabled,true)
(spark.hadoop.yarn.timeline-service.enabled,false)
(spark.driver.memory,1500m)
(spark.executor.memory,10g)
(spark.executor.instances,16)
(spark.driver.extraLibraryPath,/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native)
(spark.sql.warehouse.dir,*********(redacted))
(spark.yarn.historyServer.address,ip-172-31-14-100.eu-west-2.compute.internal:18080)
(spark.eventLog.enabled,true)
(spark.stage.attempt.ignoreOnDecommissionFetchFailure,true)
(spark.history.ui.port,18080)
(spark.yarn.appMasterEnv.SPARK_PUBLIC_DNS,$(hostname -f))
(SPARK_SUBMIT,true)
(spark.executor.extraJavaOptions,-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:OnOutOfMemoryError='kill -9 %p')
(spark.app.name,com.github.sparkbwa.SparkBWA)
(spark.resourceManager.cleanupExpiredHost,true)
(spark.history.fs.logDirectory,hdfs:///var/log/spark/apps)
(spark.shuffle.service.enabled,true)
(spark.driver.extraJavaOptions,-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:OnOutOfMemoryError='kill -9 %p')
(spark.submit.deployMode,cluster)
(spark.executor.extraClassPath,/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar)
(spark.eventLog.dir,hdfs:///var/log/spark/apps)
(spark.sql.hive.metastore.sharedPrefixes,com.amazonaws.services.dynamodbv2)
(spark.master,yarn)
(spark.dynamicAllocation.enabled,true)
(spark.decommissioning.timeout.threshold,20)
(spark.executor.cores,1)
(spark.driver.extraClassPath,/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar)
Classpath elements:
file:/home/hadoop/sparkbwa-1.0.jar
18/01/20 15:53:12 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/01/20 15:53:13 INFO RMProxy: Connecting to ResourceManager at ip-172-31-14-100.eu-west-2.compute.internal/172.31.14.100:8032
18/01/20 15:53:13 INFO Client: Requesting a new application from cluster with 16 NodeManagers
18/01/20 15:53:13 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (12288 MB per container)
18/01/20 15:53:13 INFO Client: Will allocate AM container, with 1884 MB memory including 384 MB overhead
18/01/20 15:53:13 INFO Client: Setting up container launch context for our AM
18/01/20 15:53:13 INFO Client: Setting up the launch environment for our AM container
18/01/20 15:53:13 INFO Client: Preparing resources for our AM container
18/01/20 15:53:14 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
18/01/20 15:53:16 INFO Client: Uploading resource file:/mnt/tmp/spark-8adea679-22d7-4945-9708-d61ef96b2c2a/__spark_libs__3181673287761365885.zip -> hdfs://ip-172-31-14-100.eu-west-2.compute.internal:8020/user/hadoop/.sparkStaging/application_1516463115359_0001/__spark_libs__3181673287761365885.zip
18/01/20 15:53:17 INFO Client: Uploading resource file:/home/hadoop/sparkbwa-1.0.jar -> hdfs://ip-172-31-14-100.eu-west-2.compute.internal:8020/user/hadoop/.sparkStaging/application_1516463115359_0001/sparkbwa-1.0.jar
18/01/20 15:53:17 INFO Client: Uploading resource file:/mnt/tmp/spark-8adea679-22d7-4945-9708-d61ef96b2c2a/__spark_conf__4991143839440201874.zip -> hdfs://ip-172-31-14-100.eu-west-2.compute.internal:8020/user/hadoop/.sparkStaging/application_1516463115359_0001/__spark_conf__.zip
18/01/20 15:53:17 INFO SecurityManager: Changing view acls to: hadoop
18/01/20 15:53:17 INFO SecurityManager: Changing modify acls to: hadoop
18/01/20 15:53:17 INFO SecurityManager: Changing view acls groups to:
18/01/20 15:53:17 INFO SecurityManager: Changing modify acls groups to:
18/01/20 15:53:17 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); groups with view permissions: Set(); users with modify permissions: Set(hadoop); groups with modify permissions: Set()
18/01/20 15:53:17 INFO Client: Submitting application application_1516463115359_0001 to ResourceManager
18/01/20 15:53:18 INFO YarnClientImpl: Submitted application application_1516463115359_0001
18/01/20 15:53:19 INFO Client: Application report for application_1516463115359_0001 (state: ACCEPTED)
18/01/20 15:53:19 INFO Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1516463597765
final status: UNDEFINED
tracking URL: http://ip-172-31-14-100.eu-west-2.compute.internal:20888/proxy/application_1516463115359_0001/
user: hadoop
18/01/20 15:53:20 INFO Client: Application report for application_1516463115359_0001 (state: ACCEPTED)
18/01/20 15:53:21 INFO Client: Application report for application_1516463115359_0001 (state: ACCEPTED)
18/01/20 15:53:22 INFO Client: Application report for application_1516463115359_0001 (state: ACCEPTED)
18/01/20 15:53:23 INFO Client: Application report for application_1516463115359_0001 (state: ACCEPTED)
18/01/20 15:53:24 INFO Client: Application report for application_1516463115359_0001 (state: ACCEPTED)
18/01/20 15:53:25 INFO Client: Application report for application_1516463115359_0001 (state: ACCEPTED)
18/01/20 15:53:26 INFO Client: Application report for application_1516463115359_0001 (state: ACCEPTED)
18/01/20 15:53:27 INFO Client: Application report for application_1516463115359_0001 (state: ACCEPTED)
18/01/20 15:53:28 INFO Client: Application report for application_1516463115359_0001 (state: ACCEPTED)
18/01/20 15:53:29 INFO Client: Application report for application_1516463115359_0001 (state: FAILED)
18/01/20 15:53:29 INFO Client:
client token: N/A
diagnostics: Application application_1516463115359_0001 failed 2 times due to AM Container for appattempt_1516463115359_0001_000002 exited with exitCode: 1
For more detailed output, check application tracking page:http://ip-172-31-14-100.eu-west-2.compute.internal:8088/cluster/app/application_1516463115359_0001Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1516463115359_0001_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:582)
at org.apache.hadoop.util.Shell.run(Shell.java:479)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:773)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1516463597765
final status: FAILED
tracking URL: http://ip-172-31-14-100.eu-west-2.compute.internal:8088/cluster/app/application_1516463115359_0001
user: hadoop
Exception in thread "main" org.apache.spark.SparkException: Application application_1516463115359_0001 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1122)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1168)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:775)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
18/01/20 15:53:29 INFO ShutdownHookManager: Shutdown hook called
18/01/20 15:53:29 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-8adea679-22d7-4945-9708-d61ef96b2c2a
[hadoop@ip-172-31-14-100 ~]$
Broadcast message from root@ip-172-31-14-100
(unknown) at 15:54 ...
The system is going down for power off NOW!
Connection to ec2-35-177-163-135.eu-west-2.compute.amazonaws.com closed by remote host.
Connection to ec2-35-177-163-135.eu-west-2.compute.amazonaws.com closed.
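The AM container just exits with code 1 and the diagnostics above don't show the actual exception, so I assume the next step would be to pull the container logs for the failed attempt with the YARN CLI, something like:

yarn logs -applicationId application_1516463115359_0001

(using the application id from the report above).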
Any help would be appreciated.
Thank you 🙏
Any word on this?