docker-spark-cluster
job not started
Hello
When I run a job, I'm getting:
Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
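A common cause of this warning is that the application asks for more cores or memory per executor than any single worker advertises, so no executor can ever be placed. As a minimal sketch, you can cap what the application requests when creating the context (the master URL and cap values below are placeholders; tune them to what your workers actually offer):

from pyspark import SparkConf, SparkContext

# Request less than a worker offers so an executor can actually be scheduled.
conf = (SparkConf()
        .setAppName("SimpleApp")
        .setMaster("spark://spark-master:7077")
        .set("spark.cores.max", "2")            # total cores for this app
        .set("spark.executor.memory", "512m"))  # memory per executor

sc = SparkContext(conf=conf)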
I've got the same error.
command: docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
b3358be58cce spydernaz/spark-worker:latest "/bin/bash /start-wo…" 13 minutes ago Up 13 minutes 8081/tcp docker-spark-cluster_spark-worker_1
08e637ed0527 spydernaz/spark-worker:latest "/bin/bash /start-wo…" 13 minutes ago Up 13 minutes 8081/tcp docker-spark-cluster_spark-worker_2
2f3606bea6da spydernaz/spark-worker:latest "/bin/bash /start-wo…" 13 minutes ago Up 13 minutes 8081/tcp docker-spark-cluster_spark-worker_3
0bbed01935dc spydernaz/spark-master:latest "/bin/bash /start-ma…" 13 minutes ago Up 13 minutes 6066/tcp, 0.0.0.0:7077->7077/tcp, 0.0.0.0:9090->8080/tcp docker-spark-cluster_spark-master_1
input: 1.py
from pyspark import SparkContext
logFile = "README.md"  # must exist in the driver's working directory
spark = SparkContext('spark://10.68.50.149:7077', 'SimpleApp')  # master URL, app name
logData = spark.textFile(logFile).cache()
numAs = logData.filter(lambda s: 'a' in s).count()
numBs = logData.filter(lambda s: 'b' in s).count()
print("Lines with a: %i, lines with b: %i" % (numAs, numBs))
spark.stop()
output:
19/11/21 08:00:08 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
[Stage 0:> (0 + 0) / 2]
19/11/21 08:00:29 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
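This particular warning can also appear when the workers are registered and have free resources but the executors cannot connect back to the driver. In this setup the driver (the python process running 1.py) is on the host at 10.68.50.149 while the workers are in containers, so the containers must be able to reach the driver's IP and callback ports. A sketch of making the driver explicitly reachable, assuming 10.68.50.149 is routable from inside the containers (the port numbers are arbitrary picks):

from pyspark import SparkConf, SparkContext

# Advertise an address and ports the worker containers can reach,
# instead of whatever interface the driver binds by default.
conf = (SparkConf()
        .setAppName("SimpleApp")
        .setMaster("spark://10.68.50.149:7077")
        .set("spark.driver.host", "10.68.50.149")
        .set("spark.driver.port", "7078")
        .set("spark.blockManager.port", "7079"))

sc = SparkContext(conf=conf)

Alternatively, running the script from inside the master container (for example via docker exec) avoids the host-to-container networking question entirely.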
I'm curious if either of you got this going; I'm particularly interested in the submit command for the PySpark script.
I am facing an issue: jobs fail when I use this script:
/opt/spark/bin/spark-submit --master spark://spark-master:7077 \
  --jars /opt/spark-apps/postgresql-42.2.22.jar \
  --driver-memory 1G \
  --executor-memory 1G \
  /opt/spark-apps/main.py
Can anyone help?
@javed2005 The error message says that you do not have the CSV file, because the repo owner used .gitignore to exclude the CSV files. You should create a data folder, download the first CSV file from http://web.mta.info/developers/MTA-Bus-Time-historical-data.html, and unzip it.
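Once the data folder is in place, a quick way to confirm Spark can actually see the file is to load it and count rows. A minimal check, assuming the data is mounted at /opt/spark-apps/data/ and using a placeholder filename (the real one comes from the MTA page):

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .master("spark://spark-master:7077")
         .appName("csv-check")
         .getOrCreate())

# Placeholder path; point this at the unzipped MTA CSV you downloaded.
df = spark.read.csv("/opt/spark-apps/data/mta_bus_time.csv", header=True)
print(df.count())

spark.stop()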