
Add dag run_id to Audit Log

Open bbovenzi opened this issue 1 year ago • 2 comments

Description

In event logs, we keep track of dag_id and task_id when relevant. However, every task action applies to a specific task instance, and right now it is hard to tell which task instance it was. Sometimes we record execution_date, but it would be better to switch to run_id.

Use case/motivation

With a run_id field, we could filter the audit log to show only the events that happened in a single DAG run, and we could link directly to the task instance in question. Right now we can only link to the DAG as a whole.
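To make the filtering use case concrete, here is a minimal sketch. It assumes a heavily simplified `log` table with hypothetical columns (`id`, `dag_id`, `task_id`, `run_id`, `event`) and made-up run IDs. It is not Airflow's actual Log model, only an illustration of how a run_id column lets you scope events to a single DAG run with one WHERE clause.

```python
import sqlite3


def events_for_run(conn: sqlite3.Connection, dag_id: str, run_id: str) -> list:
    """Return (task_id, event) pairs logged for one DAG run, oldest first.

    Assumes a hypothetical, simplified `log` table, not Airflow's real schema.
    """
    cur = conn.execute(
        "SELECT task_id, event FROM log WHERE dag_id = ? AND run_id = ? ORDER BY id",
        (dag_id, run_id),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE log (id INTEGER PRIMARY KEY, dag_id TEXT, task_id TEXT, run_id TEXT, event TEXT)"
)
# Made-up sample rows: two runs of the same DAG.
conn.executemany(
    "INSERT INTO log (dag_id, task_id, run_id, event) VALUES (?, ?, ?, ?)",
    [
        ("etl", "extract", "manual__2024-02-20", "clear"),
        ("etl", "load", "scheduled__2024-02-21", "success"),
        ("etl", "load", "manual__2024-02-20", "failed"),
    ],
)
print(events_for_run(conn, "etl", "manual__2024-02-20"))
# [('extract', 'clear'), ('load', 'failed')]
```

With dag_id, task_id, and run_id all on the row, the same columns would also be enough to build a link to one specific task instance rather than to the DAG as a whole.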

Related issues

No response

Are you willing to submit a PR?

  • [ ] Yes I am willing to submit a PR!

Code of Conduct

bbovenzi, Feb 21 '24 16:02

👋 Hey Brent, this is a good idea.

Feel free to assign to me if you want a hand, I've got some capacity in the next week or two.

SamWheating, Feb 21 '24 23:02

Thanks @SamWheating. We migrated most things from execution_date to run_id, but we seem to have forgotten this one. Hopefully we can use those PRs as a reference. But doing this fully could require a big migration of the audit logs table.

bbovenzi, Feb 22 '24 00:02

> But doing this fully could require a big migration of the audit logs table.

I guess that depends. Should we add an empty column and start populating it going forward, or should we aim to backfill run_id for existing Log rows via a join with the DagRun table (similar to this)?

Not really sure what the standard is here, and this migration might be pretty hefty due to the potentially high volume of the Log table.
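The two options can be sketched side by side. This is a minimal illustration that assumes hypothetical, heavily simplified `log` and `dag_run` tables in SQLite, not Airflow's real schema or its Alembic migrations. Option 1 is just the nullable `ALTER TABLE`. Option 2 adds a backfill that looks up each row's run_id by matching on `(dag_id, execution_date)`, which assumes that pair identifies a single run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, heavily simplified versions of the log and dag_run tables.
conn.executescript(
    """
    CREATE TABLE dag_run (dag_id TEXT, run_id TEXT, execution_date TEXT);
    CREATE TABLE log (id INTEGER PRIMARY KEY, dag_id TEXT, task_id TEXT,
                      execution_date TEXT, event TEXT);
    INSERT INTO dag_run VALUES ('etl', 'scheduled__2024-02-21', '2024-02-21T00:00:00');
    INSERT INTO log (dag_id, task_id, execution_date, event) VALUES
        ('etl', 'load', '2024-02-21T00:00:00', 'clear'),
        ('etl', NULL, NULL, 'trigger');
    """
)

# Option 1: add an empty, nullable column; new writes populate it going forward.
conn.execute("ALTER TABLE log ADD COLUMN run_id TEXT")

# Option 2: also backfill existing rows with a correlated lookup on
# (dag_id, execution_date). Rows with no execution_date, or no matching
# run, are left as NULL.
conn.execute(
    """
    UPDATE log
    SET run_id = (
        SELECT dr.run_id FROM dag_run AS dr
        WHERE dr.dag_id = log.dag_id AND dr.execution_date = log.execution_date
    )
    WHERE run_id IS NULL AND execution_date IS NOT NULL
    """
)
print(conn.execute("SELECT event, run_id FROM log ORDER BY id").fetchall())
# [('clear', 'scheduled__2024-02-21'), ('trigger', None)]
```

Option 2 touches every historical row in a single UPDATE. On a large Log table, a real migration would probably need to batch that work, which is where the "hefty" cost comes from. Rows that never recorded an execution_date cannot be backfilled either way.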

Thoughts, @bbovenzi ?

SamWheating, Feb 26 '24 18:02


I'm not opposed to only populating it moving forward. Going through old logs and translating execution_date to run_id will be a heavy lift. Currently, we do not record execution_date often enough, so many logs are already lacking information.
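A forward-only approach only changes the write path. The sketch below again assumes a hypothetical simplified `log` table. `record_event` is an illustrative helper, not an Airflow API. New entries carry run_id whenever the action targets a run, and pre-existing rows simply keep a NULL run_id.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
# Hypothetical simplified log table; run_id is nullable because older rows lack it.
conn.execute(
    "CREATE TABLE log (id INTEGER PRIMARY KEY, dag_id TEXT, task_id TEXT, run_id TEXT, event TEXT)"
)
# A row written before the change: no run_id was ever recorded.
conn.execute("INSERT INTO log (dag_id, task_id, event) VALUES ('etl', 'load', 'clear')")


def record_event(
    conn: sqlite3.Connection,
    dag_id: str,
    event: str,
    task_id: Optional[str] = None,
    run_id: Optional[str] = None,
) -> None:
    """Hypothetical helper: store run_id alongside dag_id/task_id when known."""
    conn.execute(
        "INSERT INTO log (dag_id, task_id, run_id, event) VALUES (?, ?, ?, ?)",
        (dag_id, task_id, run_id, event),
    )


record_event(conn, "etl", "clear", task_id="load", run_id="manual__2024-02-26")

missing = conn.execute("SELECT COUNT(*) FROM log WHERE run_id IS NULL").fetchone()[0]
print(missing)  # 1: only the pre-existing row lacks a run_id
```

The trade-off is that filtering by run_id only covers events logged after the change, which matches the point that older logs are already missing this information.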

bbovenzi, Feb 26 '24 20:02