airflow High CPU utilisation caused by frequent updates to dag

Apache Airflow version

Other Airflow 2 version (please specify below)

If "Other Airflow 2 version" selected, which one?

2.9.1

What happened?

Setup We are self-hosting Airflow via Helm Chart. We use google's CloudSQL as our Airflow metadata store.

Issue

We observe that the below query has been called many times, causes database locks and is causing high CPU utilisation

UPDATE
  dag_run
SET
  last_scheduling_decision=$1::timestamptz,
  updated_at=$2::timestamptz
WHERE
  dag_run.id = $3

For instance, referring to the below screenshot, in 1 day, 193,362 times has been called.

Spec of the CLoudSQL instance

The CLoudSQL instance has 4 vCPUs and 15GB ram.

What you think should happen instead?

Updates to the dag_run on last_scheduling_decision should be reduced (or configurable?)

So we can test if database CPU utilisation can be reduced and Database locking problem can be resolved.

How to reproduce

Not easily reproducible.

==========

At the time of the UPDATE SQLs, I observe that only 5 instances of DAG runs are running.

We have a similar setup in another environment where the other CloudSQL instance also has only 4 vCPUs.

In the other environment, however, the number of times the UPDATE SQL is called is smaller. Also no huge database locks are observed in the other environment.

Operating System

linux

Versions of Apache Airflow Providers

No response

Deployment

Official Apache Airflow Helm Chart

Deployment details

No response

Anything else?

No response

Are you willing to submit PR?

[ ] Yes I am willing to submit a PR!

Code of Conduct

[X] I agree to follow this project's Code of Conduct

Sep 06 '24 03:09 kenho811-b1

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.

Sep 06 '24 03:09 boring-cyborg[bot]

I'm experiencing this issue too in 2.10.4. Looks like the index was removed because unused, but I'm clearly seeing queries that would be using this index

Dec 20 '24 15:12 jkramer-ginkgo

I'm experiencing this issue too in 2.10.4. Looks like the index was removed because unused, but I'm clearly seeing queries that would be using this index

coudl you please epxlain what your queries are you see and logs where you observe it?

Dec 20 '24 20:12 potiuk

I can share screenshots from our RDS performance insights dashboard (the red corresponds to locking, which is why I was looking in to the index in the first place)

Dec 20 '24 21:12 jkramer-ginkgo

Apologies, definitely a brain fart. This query wouldn't use the index mentioned!

Dec 20 '24 21:12 jkramer-ginkgo

This issue itself is still relevant (and what I'm observing). But the index removal is irrelevant.

Dec 20 '24 21:12 jkramer-ginkgo

High CPU utilisation caused by frequent updates to dag_run table

Apache Airflow version

If "Other Airflow 2 version" selected, which one?

What happened?

What you think should happen instead?

How to reproduce

Operating System

Versions of Apache Airflow Providers

Deployment

Deployment details

Anything else?

Are you willing to submit PR?

Code of Conduct