ci-cd-for-data-processing-workflow

Add solution for running on a private IP Composer cluster

Open jaketf opened this issue 5 years ago • 0 comments

Use Case:

Cloud Composer supports private IP environments, which spin up a private IP GKE cluster in the customer project within a VPC network. Many customers have org policies or security practices that allow only private IP GKE clusters, making this a popular feature. This deployment solution should not prevent users from using a private IP cluster.

Issue:

Using a private IP cluster causes gcloud composer environments run ... commands (which this solution uses heavily to run Airflow CLI commands) to fail or time out when invoked from Cloud Build.
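For illustration, here is a minimal Python sketch of the kind of call that is affected. It is not the code used in this solution; the function names, the environment name, and the timeout value are hypothetical. It builds the argv for gcloud composer environments run and applies an explicit timeout so an unreachable cluster surfaces as an error instead of stalling the build.

```python
import subprocess
from typing import List, Sequence


def build_run_command(environment: str, location: str,
                      subcommand: str, args: Sequence[str] = ()) -> List[str]:
    """Build the argv for `gcloud composer environments run`."""
    cmd = ["gcloud", "composer", "environments", "run", environment,
           "--location", location, subcommand]
    if args:
        # Everything after `--` is passed through to the Airflow CLI.
        cmd += ["--", *args]
    return cmd


def run_airflow_cli(environment: str, location: str, subcommand: str,
                    args: Sequence[str] = (), timeout_s: int = 120):
    """Run the command with a timeout (hypothetical default of 120 seconds)."""
    cmd = build_run_command(environment, location, subcommand, args)
    # Raises subprocess.TimeoutExpired if the command does not finish in time,
    # and subprocess.CalledProcessError on a non-zero exit code.
    return subprocess.run(cmd, check=True, timeout=timeout_s,
                          capture_output=True, text=True)
```

For example, build_run_command("my-env", "us-central1", "list_dags") yields the argv for listing DAGs with the Airflow 1.10 CLI; "my-env" and the region are placeholders.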

Root Cause:

The Cloud Build execution environment is serverless and does not run on the customer's network. The gcloud composer environments run command uses kubectl under the hood to execute Airflow commands, so from Cloud Build it cannot reach the private IP GKE cluster.

Potential Resolution:

Consider a redesign in which the deploydags application is deployed as a Kubernetes Job on the Composer GKE cluster in the customer project and runs the Airflow commands directly on the cluster (in a worker pod) rather than through the gcloud indirection.
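A rough sketch of what such a Job manifest might look like, built as a Python dict and printed as JSON (which kubectl accepts alongside YAML). The Job name, namespace, image path, and container command below are all hypothetical placeholders, not values from this repository; a real Job would also need access to the environment's Airflow configuration.

```python
import json
from typing import Dict, List


def build_deploy_job(name: str, image: str, command: List[str],
                     namespace: str = "default") -> Dict:
    """Return a batch/v1 Job manifest that runs `command` once in `image`."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Do not retry automatically; a failed deploy should fail visibly.
            "backoffLimit": 0,
            "template": {
                "spec": {
                    # Job pods must use Never or OnFailure.
                    "restartPolicy": "Never",
                    "containers": [
                        {"name": name, "image": image, "command": command}
                    ],
                }
            },
        },
    }


if __name__ == "__main__":
    manifest = build_deploy_job(
        name="deploydags",
        image="gcr.io/my-project/deploydags:latest",  # hypothetical image
        command=["airflow", "list_dags"],             # placeholder command
    )
    print(json.dumps(manifest, indent=2))
```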

Alternatives:

Interact with Airflow only through the REST API (a public endpoint already secured with IAP). This interface is experimental, subject to change, and currently being refactored for Airflow 2.0.
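As a sketch of this alternative, the snippet below builds (but does not send) a request that triggers a DAG run through the Airflow 1.10 experimental API. It assumes the caller has already obtained an OIDC identity token for the IAP client, for example via the google-auth library; the web server URL, DAG ID, and function name are hypothetical. The accepted shape of conf has varied between Airflow 1.10 releases, so it is omitted unless supplied.

```python
import json
import urllib.request
from typing import Optional
from urllib.parse import quote


def build_trigger_request(webserver_url: str, dag_id: str, id_token: str,
                          conf: Optional[dict] = None) -> urllib.request.Request:
    """Build a POST to /api/experimental/dags/<dag_id>/dag_runs."""
    url = "{}/api/experimental/dags/{}/dag_runs".format(
        webserver_url.rstrip("/"), quote(dag_id, safe=""))
    # Send an empty JSON object unless the caller provides a run conf.
    payload = {} if conf is None else {"conf": conf}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            # IAP expects the identity token as a bearer token.
            "Authorization": "Bearer {}".format(id_token),
            "Content-Type": "application/json",
        },
    )
```

The returned object can be passed to urllib.request.urlopen to send it. Because the endpoint is public behind IAP, this call does not depend on network access to the private GKE cluster.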

jaketf • Feb 25 '20 17:02