composer_dags.py script (pause & unpause DAGs) takes hours to finish
In which file did you encounter the issue?
composer/tools/composer_dags.py
Did you change the file? If so, how?
No change
Describe the issue
I ran the script against an environment with 500 DAGs. It is extremely slow: each DAG takes 1-2 minutes or more. It has been running for almost an entire day and has not finished yet. Is there a better way to batch these requests and pause multiple DAGs at once? This has to be done before the snapshot can be created.
Example logs:
2023-10-25 15:49:30,718 - Executing shell command: CLOUDSDK_API_ENDPOINT_OVERRIDES_COMPOSER=https://composer.googleapis.com/ gcloud composer environments run composer-dev --project=whejna-dataops --location=europe-west3 dags pause -- export_task_reschedule_v1_0
Executing the command: [ airflow dags pause export_task_reschedule_v1_0 ]...
Command has been started. execution_id=3fa888b4-7cf4-4c0e-ab07-c54cccda24ac
Use ctrl-c to interrupt the command
2023-10-25 15:51:07,371 - Dag: export_task_reschedule_v1_0, paused: True
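To put a number on "hours": the two timestamps in the log above bracket a single pause call. The snippet below computes that interval and extrapolates it to 500 DAGs. It assumes every DAG takes about as long as this one sample and that the script pauses them strictly one after another.

```python
from datetime import datetime

# Timestamp format used by the script's log lines, e.g. "2023-10-25 15:49:30,718".
FMT = "%Y-%m-%d %H:%M:%S,%f"


def elapsed_seconds(start: str, end: str) -> float:
    """Seconds between two log timestamps."""
    return (datetime.strptime(end, FMT) - datetime.strptime(start, FMT)).total_seconds()


per_dag = elapsed_seconds("2023-10-25 15:49:30,718", "2023-10-25 15:51:07,371")
total_hours = per_dag * 500 / 3600  # assumes sequential calls of equal duration

print(f"{per_dag:.3f} s per DAG, ~{total_hours:.1f} h for 500 DAGs")
# prints "96.653 s per DAG, ~13.4 h for 500 DAGs"
```

Roughly 13 hours of wall-clock time for a single bulk operation is consistent with the "almost an entire day" reported above.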
Thanks
I encountered the same issue. I assume there is some overhead involved in authenticating to Cloud Composer's GKE cluster on every call. For now, the best workaround seems to be running a single SQL query that bulk-updates Airflow's internal (metadata) database. I also raised an issue in Airflow to extend its CLI with bulk pause/resume support, which you can follow here: https://github.com/apache/airflow/issues/36956
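The bulk-update idea can be sketched as one parameterized UPDATE against the `dag` table, whose `is_paused` column holds the paused flag. This is a minimal sketch that uses an in-memory SQLite table as a stand-in reduced to two columns; the environment's real metadata database is not SQLite, its driver may use a different placeholder style, and writing to it directly bypasses Airflow's own CLI/API, so try it on a non-production environment first. The helper name `set_paused` is hypothetical.

```python
import sqlite3
from typing import Sequence


def set_paused(conn: sqlite3.Connection, dag_ids: Sequence[str], paused: bool) -> int:
    """Set is_paused for all listed DAGs in a single UPDATE; return rows changed."""
    if not dag_ids:
        return 0
    placeholders = ",".join("?" for _ in dag_ids)
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            f"UPDATE dag SET is_paused = ? WHERE dag_id IN ({placeholders})",
            [paused, *dag_ids],
        )
    return cur.rowcount


# Stand-in for Airflow's metadata `dag` table (assumption: only the two columns used here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dag (dag_id TEXT PRIMARY KEY, is_paused BOOLEAN NOT NULL)")
conn.executemany("INSERT INTO dag VALUES (?, ?)", [(f"dag_{i}", False) for i in range(500)])
conn.commit()

changed = set_paused(conn, [f"dag_{i}" for i in range(500)], True)
print(changed)  # 500
```

The point of the design is that all 500 rows change in one statement and one transaction, so the per-call overhead seen in the log is paid once rather than once per DAG.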
I created a PR in Airflow's repository that adds support for pausing/unpausing multiple DAGs to its CLI: https://github.com/apache/airflow/pull/38265
It could take a while until it's merged, and a bit longer until the Cloud Composer image supports it. Nevertheless, it would be nice to solve the overhead issue on the Cloud Composer CLI side.
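Until a bulk CLI command is available, one client-side mitigation is to overlap the per-DAG calls instead of running them back to back. The sketch below still issues one `gcloud composer environments run ... dags pause -- DAG_ID` call per DAG, in the same shape as the command in the issue's log, but runs several at once with a thread pool. It is an assumption, not something tested here, that Composer accepts this many concurrent `run` calls; `max_workers=8` is an arbitrary starting value, and the function names are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence


def build_pause_command(environment: str, project: str, location: str, dag_id: str) -> List[str]:
    # Same shape as the logged command: Airflow CLI arguments follow "--".
    return [
        "gcloud", "composer", "environments", "run", environment,
        f"--project={project}", f"--location={location}",
        "dags", "pause", "--", dag_id,
    ]


def run_command(cmd: Sequence[str]) -> bool:
    # True when the command exits with status 0.
    return subprocess.run(list(cmd), capture_output=True, text=True).returncode == 0


def pause_dags_concurrently(
    environment: str,
    project: str,
    location: str,
    dag_ids: Sequence[str],
    max_workers: int = 8,
    runner: Callable[[Sequence[str]], bool] = run_command,
) -> Dict[str, bool]:
    """Pause each DAG with its own gcloud call, running up to max_workers at a time."""
    commands = {d: build_pause_command(environment, project, location, d) for d in dag_ids}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so zip them back onto the DAG IDs.
        results = pool.map(runner, commands.values())
        return dict(zip(commands.keys(), results))
```

This only hides the overhead behind parallelism; each call still pays it, so a real bulk command (or the SQL route) remains the cleaner fix. The `runner` parameter exists so the gcloud invocation can be swapped out, for example for a dry run that just prints the commands.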
Thank you! Your changes have been published and I'm using them in the following PR: https://github.com/GoogleCloudPlatform/python-docs-samples/pull/13593