airflow-jupyter-docker-compose

Orchestration of data science and earth observation models in Apache Airflow, scaled up with the Celery Executor, with experimentation in Jupyter notebooks, all running as a composition of Docker containers. Based on works.


  • docker-compose : 1.27.4

Commands to deploy and manage the stack behind an HTTPS automated proxy:

  • Ensure that the appropriate DNS record for the Airflow base URL has been created and resolves correctly.
  • Ensure that your automated nginx-proxy (e.g. ) is up and running.
  • Create the airflow-proxy network -> sudo docker network create airflow-proxy
  • Attach the new network to the existing nginx-proxy container to ensure proper proxy operations -> sudo docker network connect airflow-proxy <nginx-proxy container name>
  • Bring up the whole stack -> sudo docker-compose up -d --build
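
The deployment steps above can be sketched as a small Python helper. This is only an illustration, not part of the repository: the function names `host_resolves` and `deployment_commands` are hypothetical, and the container name `nginx-proxy` used in the example is an assumption that should be replaced with the name of your own proxy container. The helper only builds and prints the commands; it does not run them.

```python
import shlex
import socket
from typing import List


def host_resolves(hostname: str) -> bool:
    """Return True if the hostname resolves to at least one address."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


def deployment_commands(proxy_container: str, network: str = "airflow-proxy") -> List[str]:
    """Return the three deployment commands listed above, in order.

    proxy_container is the name of the already running nginx-proxy
    container (an assumption: use whatever `sudo docker ps` reports).
    """
    net = shlex.quote(network)
    proxy = shlex.quote(proxy_container)
    return [
        f"sudo docker network create {net}",
        f"sudo docker network connect {net} {proxy}",
        "sudo docker-compose up -d --build",
    ]


if __name__ == "__main__":
    # Print the commands so they can be reviewed before being run by hand.
    for command in deployment_commands("nginx-proxy"):
        print(command)
```

Calling `host_resolves` with the Airflow hostname before deploying is one way to check the DNS prerequisite from the first step.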

Stack management

  • Stop containers : sudo docker-compose down
  • View containers : sudo docker ps
  • Go inside a container : sudo docker-compose exec <service-id> bash
  • See logs of a container: sudo docker logs <service-id>
  • Monitor containers : sudo docker stats

Available URL list

  • airflow.<> -> airflow web UI
  • airflow.<>/flower -> Flower, celery workers Web UI
  • airflow.<>/pgadmin -> pgadmin4
  • airflow.<>/jupyter -> jupyter notebook (default password : notebook)
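
As a rough illustration of how the URLs above are formed, the sketch below builds them from a base domain. It assumes that `<>` stands for your own domain and that the proxy serves everything over HTTPS; `stack_urls` and `example.org` are hypothetical names used only for this example.

```python
from typing import Dict


def stack_urls(base_domain: str, scheme: str = "https") -> Dict[str, str]:
    """Build the service URLs listed above for a given base domain."""
    root = f"{scheme}://airflow.{base_domain}"
    return {
        "airflow": root,               # Airflow web UI
        "flower": f"{root}/flower",    # Flower, Celery workers web UI
        "pgadmin": f"{root}/pgadmin",  # pgAdmin 4
        "jupyter": f"{root}/jupyter",  # Jupyter notebook
    }


if __name__ == "__main__":
    # example.org is a placeholder domain, not a real deployment.
    for name, url in stack_urls("example.org").items():
        print(f"{name}: {url}")
```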

Deployed libraries

Please find below the included libraries and the associated reference URLs for documentation and examples.

Essential Python libraries for data analysis

| Library | Description | Resources |
|---|---|---|
| bokeh | The Bokeh Visualization Library | |
| bottleneck | Bottleneck is a collection of fast, NaN-aware NumPy array functions written in C. | Working with pandas and xarray |
| dask | Dask provides advanced parallelism for analytics, enabling performance at scale for the tools you love. | |
| matplotlib-base | Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python. | |
| numpy | The fundamental package for scientific computing with Python. | |
| panel | A high-level app and dashboarding solution for Python. | |
| pytables | PyTables is a package for managing hierarchical datasets, designed to cope efficiently and easily with extremely large amounts of data. | |
| scipy | SciPy (pronounced “Sigh Pie”) is a Python-based ecosystem of open-source software for mathematics, science, and engineering. | |
| scikit-image | Image processing in Python | |
| scikit-learn | Machine Learning in Python | |
| seaborn | Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics. | |
| xarray | xarray (formerly xray) is an open source project and Python package that makes working with labelled multi-dimensional arrays simple, efficient, and fun! | |

Jupyter and Airflow framework-specific libraries

| Library | Description | Resources |
|---|---|---|
| papermill | Papermill is a tool for parameterizing and executing Jupyter Notebooks. | |
| psycopg2 | Psycopg is the most popular PostgreSQL database adapter for the Python programming language. | |
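
To give an idea of what "parameterizing and executing" a notebook means, here is a minimal sketch that builds a papermill command line, using papermill's `-p name value` option to inject parameters. The helper `papermill_argv`, the notebook paths, and the parameter names are all hypothetical; how this stack actually invokes papermill from Airflow is not described here.

```python
import shlex
from typing import Dict, List


def papermill_argv(input_nb: str, output_nb: str, parameters: Dict[str, object]) -> List[str]:
    """Build an argument list for the papermill command-line tool.

    Each parameter is passed as `-p <name> <value>`; papermill injects
    these values into the notebook before executing it.
    """
    argv = ["papermill", input_nb, output_nb]
    for name, value in parameters.items():
        argv += ["-p", name, str(value)]
    return argv


if __name__ == "__main__":
    # Hypothetical notebook paths and parameter, for illustration only.
    argv = papermill_argv("model.ipynb", "model_run.ipynb", {"alpha": 0.6})
    print(" ".join(shlex.quote(part) for part in argv))
```

The resulting list could be handed to `subprocess.run` inside a container where papermill is installed, or pasted into a shell after review.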
  • JupyterLab extensions
| Library | Description | Resources |
|---|---|---|
| appmode | A Jupyter extension that turns notebooks into web applications. | |
| ipywidgets | Widgets are eventful Python objects that have a representation in the browser, often as a control like a slider, textbox, etc. | |
| ipyleaflet | Interactive maps in the Jupyter notebook. | |
| jupyterlab-manager | A JupyterLab extension for Jupyter/IPython widgets. | |
| jupyter_bokeh | An extension for rendering Bokeh content in JupyterLab notebooks. | |
| jupyter-matplotlib | An extension for rendering Matplotlib content in JupyterLab notebooks. | |
| jupyterlab-plotly | The plotly Python library is an interactive, open-source plotting library that supports over 40 unique chart types covering a wide range of statistical, financial, geographic, scientific, and 3-dimensional use-cases. | |
| jupyterlab-voyager | A JupyterLab extension to visualize data with Voyager. | |

Geo / EO / Weather specific

| Library | Description | Resources |
|---|---|---|
| cartopy | Cartopy is a Python package designed for geospatial data processing in order to produce maps and other geospatial data analyses. | |
| cmocean | This package contains colormaps for commonly-used oceanographic variables. | |
| descartes | Use Shapely or GeoJSON-like geometric objects as matplotlib paths and patches. | |
| ecmwf-api-client | ECMWF WebAPI is a set of services developed by ECMWF to allow users from the outside to access some internal features and data of the centre. | |
| iris | A powerful, format-agnostic, community-driven Python library for analysing and visualising Earth science data. | |
| iris-grib | The library iris-grib provides functionality for converting between weather and climate datasets that are stored as GRIB files and Iris cubes. | |
| geos | GEOS (Geometry Engine - Open Source) is a C++ port of the JTS Topology Suite (JTS). It aims to contain the complete functionality of JTS in C++. | |
| geopandas | GeoPandas is an open source project to make working with geospatial data in Python easier. GeoPandas extends the datatypes used by pandas to allow spatial operations on geometric types. | |
| metpy | MetPy is a collection of tools in Python for reading, visualizing, and performing calculations with weather data. | |
| metview | Python interface to the Metview meteorological workstation and batch system. | |
| magics | Python interface to the Magics meteorological plotting package. | |
| netcdf4 | netcdf4-python is a Python interface to the netCDF C library. | |
| protobuf | Protocol buffers are a language-neutral, platform-neutral extensible mechanism for serializing structured data. | |
| pynio | PyNIO is a multi-format data I/O package with a NetCDF-style interface. | |
| shapely | Manipulation and analysis of geometric objects in the Cartesian plane. | |
| siphon | A collection of Python utilities for retrieving atmospheric and oceanic data from remote sources, focusing on being able to retrieve data from Unidata data technologies, such as the THREDDS data server. | |