observatory-platform

Refactor to support Astro

Open jdddog opened this issue 6 months ago • 1 comments

This PR refactors this project to enable our workflows to be deployed into an Astronomer.io Airflow environment:

  • Replaced the Observatory API with a Python- and BigQuery-based Dataset API, to reduce the maintenance and infrastructure required to keep track of dataset ingests. This also meant that we could remove the namespaced Python packages, keeping just a single observatory_platform package.
  • Moved unit tests into the Python package under tests directories.
  • Rearranged the layout of modules into: airflow (Airflow-related utilities), google (BigQuery, GCP, GCS and GKE utilities), sandbox (the sandbox testing environment), with the rest of the code at the root of the project.
  • Removed several modules:
    • The Workflow classes as we are now using the Airflow TaskFlow API.
    • The cli, docker and terraform modules as we are now using Astro to deploy projects.
  • In the google module:
    • Added an optional client parameter to most functions, to allow a custom Google Cloud project to be specified.
    • Added gcp and gke modules with functions for creating and deleting GCP disks and Kubernetes volumes, to be used by workflows that use the Kubernetes decorator to run tasks.
  • Switched to a pyproject.toml file and setuptools_scm to manage the Python project and versioning, instead of PBR.
  • Updated Read the Docs, removing most documentation apart from the automatically generated Python package documentation. Documentation about how to run the workflows in Astro will go into GitBook.
  • Updated GitHub Actions workflows for compatibility with the above changes.
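
As a rough illustration of the optional client parameter described for the google module, a function might fall back to a default client only when the caller does not supply one. This is a minimal sketch, not the code in this PR: `list_table_ids` and `_default_client` are hypothetical names chosen for the example, and the only real library calls assumed are `bigquery.Client()` and `Client.list_tables()` from google-cloud-bigquery.

```python
from typing import Any, List, Optional


def _default_client() -> Any:
    """Build a BigQuery client from the ambient credentials and project.

    Hypothetical helper. The import is lazy so that callers who pass their
    own client (or a stub in tests) do not need google-cloud-bigquery.
    """
    from google.cloud import bigquery

    return bigquery.Client()


def list_table_ids(dataset_id: str, client: Optional[Any] = None) -> List[str]:
    """Return the table IDs in a dataset (hypothetical example function).

    :param dataset_id: the dataset to list, e.g. "my-project.my_dataset".
    :param client: an optional BigQuery client; when omitted, a default
        client is created, otherwise the caller's client (and therefore
        the caller's Google Cloud project) is used.
    """
    if client is None:
        client = _default_client()
    return [table.table_id for table in client.list_tables(dataset_id)]
```

A caller targeting a different Google Cloud project could then pass something like `client=bigquery.Client(project="my-other-project")`, while existing call sites that omit the argument keep their current behaviour.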

jdddog, Feb 25 '24 23:02