airflow
airflow copied to clipboard
Add indexes on dag_id column in refencing tables to speed up deletion of dag records
When the dag records count gets huge, and users try to delete DAG and DAG runs that are no longer needed or are stale, it is observed that the deletion is significantly slow. The reason for this is that the CASCADING DELETES are slow. Although, we have foreign key constraints in the referencing tables, they do not create an index implicitly on those columns (dag_id in the referencing tables in this case). Hence, we're creating indexes on the 6 referencing table for CASCADE DELETES to speed up the deletion of records. Without these indexes, it was observed that it takes many hours to delete those records and it reduced to a few seconds after adding those indexes.
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst
or {issue_number}.significant.rst
, in newsfragments.
Seems like for MySQL indexes for FK already exists
Seems like for MySQL indexes for FK already exists
Wondering if I should put these operations under
if conn.dialect.name != "mysql":
create_index....
or
if conn.dialect.name in ("sqlite", "postgresql"):
create_index....
WDYT is better?
Closing in favour of https://github.com/apache/airflow/pull/39638