python-docs-samples

Composer_dags.py script (pause & unpause dags) is taking hours to finish

Open hejnal opened this issue 2 years ago • 3 comments

In which file did you encounter the issue?

composer/tools/composer_dags.py

Did you change the file? If so, how?

No change

Describe the issue

I ran the script against an environment with 500 DAGs. It is extremely slow: pausing each DAG takes 1-2 minutes or more. It has been running for almost an entire day and has not finished yet. Is there a better way to batch these requests and pause multiple DAGs at once? This has to be done before the snapshot can be created.
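One possible workaround, sketched here under assumptions, is to issue the per-DAG commands concurrently instead of one after another. The helper names (`build_pause_command`, `run_command`, `pause_all`) and the `max_workers=8` default are hypothetical and not part of composer_dags.py. The command shape is copied from the example log in this report. Whether Cloud Composer tolerates several simultaneous `environments run` calls has not been confirmed in this thread, so treat this as an experiment rather than a fix.

```python
from concurrent.futures import ThreadPoolExecutor
import subprocess


def build_pause_command(environment, project, location, dag_id, operation="pause"):
    """Build the same gcloud argv that appears in the example log."""
    return [
        "gcloud", "composer", "environments", "run", environment,
        f"--project={project}", f"--location={location}",
        "dags", operation, "--", dag_id,
    ]


def run_command(argv):
    """Run one gcloud command and return its exit code."""
    return subprocess.run(argv, capture_output=True, text=True).returncode


def pause_all(dag_ids, environment, project, location,
              max_workers=8, runner=run_command):
    """Pause DAGs concurrently; returns {dag_id: exit_code}.

    max_workers is an arbitrary assumption; runner is injectable so the
    function can be exercised without calling gcloud.
    """
    commands = [
        build_pause_command(environment, project, location, dag_id)
        for dag_id in dag_ids
    ]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each id with its result.
        return dict(zip(dag_ids, pool.map(runner, commands)))
```

Threads fit here because each call spends its time waiting on a remote command, not on local CPU work. If the per-call overhead is serialized on the server side, concurrency will not help.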

Example logs:
2023-10-25 15:49:30,718 - Executing shell command: CLOUDSDK_API_ENDPOINT_OVERRIDES_COMPOSER=https://composer.googleapis.com/ gcloud composer environments run composer-dev --project=whejna-dataops --location=europe-west3 dags pause -- export_task_reschedule_v1_0
Executing the command: [ airflow dags pause export_task_reschedule_v1_0 ]...
Command has been started. execution_id=3fa888b4-7cf4-4c0e-ab07-c54cccda24ac
Use ctrl-c to interrupt the command
2023-10-25 15:51:07,371 - Dag: export_task_reschedule_v1_0, paused: True
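To put a number on the slowdown, the two timestamps in the log above can be subtracted and scaled to 500 DAGs. This is only a rough estimate: it assumes every DAG takes about as long as this one sample and that the commands run strictly one after another.

```python
from datetime import datetime

# Timestamp format used by the script's log lines (comma before milliseconds).
FMT = "%Y-%m-%d %H:%M:%S,%f"

started = datetime.strptime("2023-10-25 15:49:30,718", FMT)
finished = datetime.strptime("2023-10-25 15:51:07,371", FMT)

# Duration of a single pause command, taken from the one sample in the log.
per_dag_seconds = (finished - started).total_seconds()

# Extrapolate to the 500 DAGs mentioned in the report, run sequentially.
dag_count = 500
total_hours = dag_count * per_dag_seconds / 3600

print(f"{per_dag_seconds:.1f} s per DAG, about {total_hours:.1f} h for {dag_count} DAGs")
```

At roughly 97 seconds per DAG this comes to about 13.4 hours, in line with the run taking most of a day.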

Thanks

hejnal • Oct 25 '23 15:10

I encountered the same issue, and I assume the delay comes from the overhead of authenticating to Cloud Composer's GKE cluster on every call. For now, the best workaround seems to be running a single SQL bulk update against Airflow's internal database. I also raised an issue in Airflow to extend its CLI to support bulk pause/resume, which you can follow here: https://github.com/apache/airflow/issues/36956
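A minimal sketch of the bulk-update idea follows. It assumes Airflow's `dag` table with its `dag_id` and `is_paused` columns, and it uses an in-memory SQLite table as a stand-in so the example runs anywhere. The helper name `bulk_set_paused` is hypothetical, and excluding `airflow_monitoring` is an illustrative assumption. A real Composer environment uses PostgreSQL, where the driver's placeholder style differs (for example `%s`) and `is_paused` is a boolean, and direct access to the metadata database may not be available or supported.

```python
import sqlite3


def bulk_set_paused(conn, paused, exclude=()):
    """Set is_paused for every DAG in one statement; returns rows updated."""
    sql = "UPDATE dag SET is_paused = ?"
    params = [int(paused)]
    if exclude:
        # One placeholder per excluded dag_id keeps the query parameterized.
        placeholders = ",".join("?" for _ in exclude)
        sql += f" WHERE dag_id NOT IN ({placeholders})"
        params.extend(exclude)
    cursor = conn.execute(sql, params)
    conn.commit()
    return cursor.rowcount


# Stand-in for the metadata database: two user DAGs plus one to leave alone.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dag (dag_id TEXT PRIMARY KEY, is_paused INTEGER)")
conn.executemany(
    "INSERT INTO dag VALUES (?, 0)",
    [("dag_a",), ("dag_b",), ("airflow_monitoring",)],
)

updated = bulk_set_paused(conn, True, exclude=("airflow_monitoring",))
state = dict(conn.execute("SELECT dag_id, is_paused FROM dag"))
```

A single statement replaces one remote command per DAG, which is where the time savings would come from. Recording the previous `is_paused` values before the update would make it possible to restore the original state afterwards.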

shahar1 • Jan 23 '24 14:01

I created a PR in Airflow's repository that adds support for pausing and unpausing multiple DAGs through its CLI: https://github.com/apache/airflow/pull/38265

It could take a while until it's merged, and a bit longer until a Cloud Composer image supports it. Nevertheless, it would be nice to solve the overhead issue on the Cloud Composer CLI side.
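As a sketch of how such a bulk command might be sent through Composer, the helper below builds a single `gcloud` invocation that pauses every DAG matching a regular expression. It assumes the Airflow version in the environment accepts `--treat-dag-id-as-regex` and `--yes` on `airflow dags pause`; check `airflow dags pause --help` in your environment before relying on these flags. The function name `build_bulk_pause_command` is hypothetical.

```python
def build_bulk_pause_command(environment, project, location, dag_id_regex,
                             operation="pause"):
    """Build one gcloud argv that pauses all DAGs matching dag_id_regex.

    Assumption: the environment's Airflow CLI supports the
    --treat-dag-id-as-regex and --yes flags.
    """
    return [
        "gcloud", "composer", "environments", "run", environment,
        f"--project={project}", f"--location={location}",
        "dags", operation, "--",
        "--treat-dag-id-as-regex", "--yes", dag_id_regex,
    ]


# Example: match every DAG id with a catch-all pattern.
command = build_bulk_pause_command(
    "composer-dev", "whejna-dataops", "europe-west3", ".*"
)
```

With one command for the whole set, the per-call overhead would be paid once instead of once per DAG.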

shahar1 • Mar 18 '24 15:03

Thank you! Your changes have been published and I'm using them in the following PR: https://github.com/GoogleCloudPlatform/python-docs-samples/pull/13593

dkurzaj • Oct 16 '25 17:10