Scalable Airflow Setup Template
This repo's goal is to get you up and running quickly with a fast, scalable Airflow on Kubernetes setup.
Features
:baby: Easy Setup: Using cookiecutter to fill in the blanks.
:fire: Disposable Infrastructure: Using helm and a few pre-made commands, we can easily destroy and re-deploy the entire infrastructure.
:rocket: Cost-Efficient: We use Kubernetes as the task engine. The Airflow scheduler runs each task in a new pod and deletes it upon completion, allowing us to scale with the workload while using a minimal amount of resources (see the sketch after this list).
:nut_and_bolt: Decoupled Executor: Another great advantage of using Kubernetes as the task runner is decoupling orchestration from execution. You can read more about it in We're All Using Airflow Wrong and How to Fix It.
:runner: Dynamically Updated Workflows: We use Git-Sync containers, which let us update workflows using git alone. No need to redeploy Airflow on each workflow change.
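To see the per-task pods in action (a rough sketch; the namespace and pod naming depend on your deployment), you can watch pods get created and deleted while a DAG runs:
$ # watch pods appear and disappear as tasks start and finish
$ # (assumes Airflow is deployed to the default namespace; adjust -n otherwise)
$ kubectl get pods --watch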
Installation
$ cookiecutter https://github.com/talperetz/scalable-airflow-template
Cookiecutter Options Explained
- airflow_executor: You can run tasks on Kubernetes with either Celery or Kubernetes as the executor. To learn more, check out Scale Your Data Pipelines with Airflow and Kubernetes.
- local_airflow_image_name: The image name. Required only if you want to build your own Airflow image.
- airflow_image_repository: The ECR repository link. Required only if you want to build your own Airflow image.
- git_repo_to_sync_dags: Link to the GitHub repository containing the workflows (DAGs) you want synced.
- git_username_in_base_64: You can convert strings to base64 via shell with:
$ echo -n "github_username" | base64
- git_password_in_base_64: You can convert strings to base64 via shell with:
$ echo -n "github_password" | base64
- fernet_key: You can fill the fernet_key option with the output of this command:
$ python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"
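For reference, a prompt session looks roughly like this (all answers below are hypothetical placeholders; the base64 values are the encodings of the example strings above):
$ cookiecutter https://github.com/talperetz/scalable-airflow-template
airflow_executor [Kubernetes]: Kubernetes
local_airflow_image_name []: my-airflow
airflow_image_repository []: 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-airflow
git_repo_to_sync_dags []: https://github.com/my-org/my-dags
git_username_in_base_64 []: Z2l0aHViX3VzZXJuYW1l
git_password_in_base_64 []: Z2l0aHViX3Bhc3N3b3Jk
fernet_key []: <key generated by the Fernet command above>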
Usage
Prerequisites
$ brew install kubectl
$ brew install helm
- Make sure your kubectl context is configured for your EKS cluster.
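If needed, a sketch of pointing kubectl at an EKS cluster (cluster name and region are placeholders):
$ # inspect the available contexts and the currently active one
$ kubectl config get-contexts
$ # create/update a kubeconfig entry for an EKS cluster via the AWS CLI
$ aws eks update-kubeconfig --name my-eks-cluster --region us-east-1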
For a custom Airflow image you'll also need:
A Kubernetes cluster set up with the cluster autoscaler
An ECR repository for the Docker image
It is also recommended to set up the Kubernetes Dashboard.
Default Airflow Image
$ make deploy
At this point you should see the stack deployed to Kubernetes.
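One way to verify this (assuming the default namespace):
$ # the scheduler and webserver pods should reach Running state
$ kubectl get pods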
To see Airflow's UI:
$ make ui pod=[webserver-pod-name]
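The webserver pod name can be looked up with kubectl; the exact name depends on the chart's naming convention, but filtering for "web" usually finds it:
$ kubectl get pods | grep web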
Custom Airflow Image
After changing config/docker/Dockerfile and scripts/entrypoint.sh to fit your needs:
Build your custom airflow image
$ make build
Push to ECR
$ make push
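Note: pushing to ECR requires an authenticated Docker client. If make push doesn't log in for you, something like this does (account ID and region are placeholders):
$ aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com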
Deploy to Kubernetes
$ make deploy
To see Airflow's UI:
$ make ui pod=[webserver-pod-name]
Fine Tuning The Setup
This template uses:
Airflow Helm Chart: the stable Airflow helm chart
Docker Image: https://github.com/puckel/docker-airflow
For more details and fine-tuning of the setup, please refer to the links above.