docker-airflow
ERROR - docker container failed: 'Error': None, 'StatusCode': 1
Hello,
I've been searching Google for a couple of hours now, but I can't find a workaround for this error. I'm trying to use the DockerOperator in Airflow. The DAG:
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.docker_operator import DockerOperator
from datetime import datetime, timedelta

default_args = {
    'owner': 'airflow',
    'description': 'Use of the DockerOperator',
    'depends_on_past': False,
    'start_date': datetime(2018, 1, 3),
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5)
}

with DAG('docker_dag', default_args=default_args, schedule_interval="* 1 * * *", catchup=False) as dag:
    t1 = BashOperator(
        task_id='print_current_date',
        bash_command='date'
    )
    t2 = DockerOperator(
        task_id='spark_submit',
        image='jupyter/pyspark-notebook',
        # image='jupyter/all-spark-notebook',
        api_version='auto',
        auto_remove=False,
        docker_url="unix://var/run/docker.sock",
        host_tmp_dir='/tmp',
        tmp_dir='/tmp',
        volumes=['/usr/local/airflow/scripts:/home/jovyan'],
        command='spark-submit --master local[*] /home/jovyan/pyspark_test01.py'
    )
    t3 = BashOperator(
        task_id='print_hello',
        bash_command='echo "hello world"'
    )

    t1 >> t2 >> t3
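One thing worth double-checking in the DAG above, separate from the container error: schedule_interval="* 1 * * *" fires every minute during hour 1 (60 runs a day), not once a day at 1 AM; "0 1 * * *" would do the latter. A tiny plain-Python illustration of how the first two cron fields read (describe_cron is a hypothetical helper, not part of Airflow):

```python
def describe_cron(expr: str) -> str:
    """Describe only the minute and hour fields of a 5-field cron expression."""
    minute, hour, *_ = expr.split()
    if minute == "*" and hour != "*":
        return f"every minute during hour {hour}"
    if minute != "*" and hour != "*":
        return f"at {hour}:{int(minute):02d} once a day"
    return "every minute"

print(describe_cron("* 1 * * *"))   # every minute during hour 1
print(describe_cron("0 1 * * *"))   # at 1:00 once a day
```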
DAG log (it keeps failing with the same error every time): dag_log.txt
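For anyone reading the error message itself: 'Error': None only means the Docker daemon reported no daemon-side error; 'StatusCode': 1 means the process inside the container exited non-zero, so the real cause lives in the container's own output, not in Airflow. A simplified sketch of the check the operator performs on the client's wait() result (check_container_result is a hypothetical helper, not the actual Airflow source):

```python
def check_container_result(result: dict) -> None:
    """Mimic DockerOperator's exit-status check on the wait() result dict.

    'Error' is the daemon-side error (None here), while 'StatusCode' is the
    exit code of the command run inside the container.
    """
    if result.get("StatusCode", 0) != 0:
        raise RuntimeError(
            f"docker container failed: {result!r}; "
            "the real error is in the container's logs/stdout"
        )

# exit code 1 inside the container -> the operator raises
try:
    check_container_result({"Error": None, "StatusCode": 1})
except RuntimeError as e:
    print(e)
```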
docker-compose.yml
services:
  postgres:
    image: postgres:9.6
    environment:
      - POSTGRES_USER=airflow
      - POSTGRES_PASSWORD=airflow
      - POSTGRES_DB=airflow
    logging:
      options:
        max-size: 10m
        max-file: "3"

  webserver:
    # image: puckel/docker-airflow:1.10.9
    image: puckel/docker-airflow
    restart: always
    depends_on:
      - postgres
    environment:
      - LOAD_EX=n
      - EXECUTOR=Local
    logging:
      options:
        max-size: 10m
        max-file: "3"
    volumes:
      - ./airflow/dags:/usr/local/airflow/dags
      - ./airflow/plugins:/usr/local/airflow/plugins
      - ./airflow/scripts:/usr/local/airflow/scripts
      - ./requirements.txt:/requirements.txt
      - '/var/run/docker.sock:/var/run/docker.sock'
    ports:
      - "8080:8080"
    command: webserver
    healthcheck:
      test: ["CMD-SHELL", "[ -f /usr/local/airflow/airflow-webserver.pid ]"]
      interval: 30s
      timeout: 30s
      retries: 3
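A gotcha that may be relevant with this compose file: because the webserver talks to the host's Docker daemon through the mounted /var/run/docker.sock, the host side of any path in DockerOperator's volumes= is resolved on the Docker host, not inside the webserver container. So '/usr/local/airflow/scripts:/home/jovyan' asks the daemon to mount a host directory named /usr/local/airflow/scripts, which only exists inside the webserver container; the mount in the spark container may then be empty and spark-submit fails to find the script. A sketch of translating a container path back to its host path using the compose bind mounts above (host_path_for is a hypothetical helper):

```python
import os

# host_path: container_path pairs, taken from docker-compose.yml above
COMPOSE_BINDS = {
    "./airflow/dags": "/usr/local/airflow/dags",
    "./airflow/plugins": "/usr/local/airflow/plugins",
    "./airflow/scripts": "/usr/local/airflow/scripts",
}

def host_path_for(container_path: str, binds: dict = COMPOSE_BINDS) -> str:
    """Map a path seen inside the webserver container to the host path the
    Docker daemon would need in DockerOperator's volumes= list."""
    for host, cont in binds.items():
        if container_path == cont or container_path.startswith(cont + "/"):
            return os.path.abspath(host + container_path[len(cont):])
    raise ValueError(f"{container_path} is not under a known bind mount")

# the DAG's volumes= entry should use this host-side directory instead:
print(host_path_for("/usr/local/airflow/scripts"))
```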
And finally, the script I'm trying to spark-submit:
import pyspark

spark = pyspark.sql.SparkSession.builder \
    .appName('hogwarts') \
    .getOrCreate()

characters = [
    ("Albus Dumbledore", 150),
    ("Minerva McGonagall", 70),
    ("Rubeus Hagrid", 63),
    ("Oliver Wood", 18),
    ("Harry Potter", 12),
    ("Ron Weasley", 12),
    ("Hermione", 13),
    ("Draco Malfoy", None)
]

c_df = spark.createDataFrame(characters, ["name", "age"])
c_df.show()
Any help would be greatly appreciated. I don't want to give up yet :)
I have the same issue. Did you solve it?
Hey there! Any solution or idea? I'm getting the same issue!