
A fast and scalable Airflow on Kubernetes setup.

Scalable Airflow Setup Template

This repo's goal is to get you up and running quickly with a scalable Airflow on Kubernetes setup.

Features

:baby: Easy Setup: Using cookiecutter to fill in the blanks.

:fire: Disposable Infrastructure: Using helm and some premade commands, we can destroy and re-deploy the entire infrastructure easily.

:rocket: Cost-Efficient: We use Kubernetes as the task engine. The Airflow scheduler runs each task in a new pod and deletes it upon completion, allowing us to scale with the workload while using minimal resources.

:nut_and_bolt: Decoupled Executor: Another great advantage of using Kubernetes as the task runner is decoupling orchestration from execution. You can read more about it in We're All Using Airflow Wrong and How to Fix It.

:runner: Dynamically Updated Workflows: We use Git-Sync containers, which let us update workflows using git alone. No need to redeploy Airflow on each workflow change.
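In practice, shipping a new workflow is then just an ordinary git push to the synced repository. The file path and branch below are illustrative, not part of the template:

```shell
# Commit and push a new DAG file; path and branch are illustrative
git add dags/my_new_dag.py
git commit -m "Add my_new_dag"
git push origin master
# git-sync pulls the change into the running Airflow pods shortly after;
# Airflow itself does not need to be redeployed
```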

Installation
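cookiecutter itself is a Python CLI; if it isn't already on your machine, it can be installed with pip first (assuming a working Python 3 environment):

```shell
# One-time install of the cookiecutter CLI (pip assumed available)
pip install --user cookiecutter
```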

$ cookiecutter https://github.com/talperetz/scalable-airflow-template

Cookiecutter Options Explained

  • airflow_executor: You can use Kubernetes for execution with both the Celery and Kubernetes executors. To learn more, check out Scale Your Data Pipelines with Airflow and Kubernetes
  • local_airflow_image_name: Image name; required only if you want to build your own Airflow image.
  • airflow_image_repository: ECR repository link; required only if you want to build your own Airflow image.
  • git_repo_to_sync_dags: Link to the GitHub repository containing your workflows (DAGs).
  • git_username_in_base_64: You can convert strings to base64 via shell with:
$ echo -n "github_username" | base64
  • git_password_in_base_64: You can convert strings to base64 via shell with:
$ echo -n "github_password" | base64
  • fernet_key: You can fill fernet_key option with the response from this command:
$ python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"
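As a sanity check for the base64-encoded credentials, decoding should reproduce the original string (shown here with the placeholder username from above):

```shell
# Round-trip check: decoding the encoded value prints the original string
echo -n "github_username" | base64 | base64 --decode
# → github_username
```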

Usage

Prerequisites

$ brew install kubectl
$ brew install helm

For a custom Airflow image you'll also need:
  • A Kubernetes cluster set up with an autoscaler
  • An ECR repository for the Docker image

It is also recommended to set up the Kubernetes Dashboard.

Default Airflow Image

$ make deploy

At this point you should see the stack deployed to Kubernetes.
To see Airflow's UI:

$ make ui pod=[webserver-pod-name]
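If you don't know the webserver pod's name, it can be looked up with kubectl. The `airflow` namespace below is an assumption — use whichever namespace the chart was installed into:

```shell
# Find the webserver pod (namespace is an assumption)
kubectl get pods -n airflow | grep webserver
```

The first column of the matching line is the name to pass as pod= to make ui.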

Custom Airflow Image

After changing config/docker/Dockerfile and scripts/entrypoint.sh, build your custom Airflow image:

$ make build

Push to ECR

$ make push

Deploy to Kubernetes

$ make deploy

To see Airflow's UI:

$ make ui pod=[webserver-pod-name]

Fine Tuning The Setup

This template uses:

Airflow Helm Chart: Airflow stable helm chart

Docker Image: https://github.com/puckel/docker-airflow

For more details and fine-tuning of the setup, please refer to the links above.