Make Task Instance primary key be a UUID

Open kaxil opened this issue 1 year ago • 0 comments

As part of AIP-72, we want to pass the Task Instance (TI) to the worker. Currently, the primary key of TI is a combination of dag_id, task_id, run_id, and map_index.

https://github.com/apache/airflow/blob/b4269f33c7151e6d61e07333003ec1e219285b07/airflow/models/taskinstance.py#L1815-L1819

Instead of sending the entire key from the executor to the worker via the API server, ideally the API server can just send over a TI UUID, which the worker then uses to fetch the correct TI to execute.

We want to add a single-column primary key that is a UUID, and it should use UUID v7 (as it has better temporal sorting behaviour than the random v4). For the migration that updates existing rows, we can use v4, which most DBs can generate natively.
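To illustrate why v7 sorts by time, here is a minimal sketch of a UUID v7 generator following the bit layout in RFC 9562 (48-bit Unix millisecond timestamp, 4-bit version, 12 random bits, 2-bit variant, 62 random bits). The function name `uuid7_sketch` is hypothetical and is not the implementation Airflow would use; in practice a maintained library would likely be preferable.

```python
import os
import time
import uuid
from typing import Optional


def uuid7_sketch(unix_ms: Optional[int] = None) -> uuid.UUID:
    """Hypothetical helper: build a UUID v7 from a millisecond timestamp."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000

    # Draw 80 random bits; only 74 of them (12 + 62) end up in the UUID.
    rand = int.from_bytes(os.urandom(10), "big")
    rand_a = (rand >> 62) & 0xFFF        # 12 random bits
    rand_b = rand & ((1 << 62) - 1)      # 62 random bits

    value = (unix_ms & ((1 << 48) - 1)) << 80  # timestamp in the top 48 bits
    value |= 0x7 << 76                         # version = 7
    value |= rand_a << 64
    value |= 0b10 << 62                        # RFC variant bits
    value |= rand_b
    return uuid.UUID(int=value)


# Because the timestamp occupies the most significant bits, IDs created
# in different milliseconds compare in creation order.
earlier = uuid7_sketch(1_000)
later = uuid7_sketch(2_000)
```

IDs generated within the same millisecond are ordered only by their random bits in this sketch, so ordering is guaranteed across milliseconds, not within one.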

The scope of this GitHub issue is to add the UUIDs, but not to use them anywhere in the codebase until we need them on the Task Execution API server. We will keep the "denormalized" columns of dag_id and run_id for easier searching/querying.
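A rough sketch of the resulting table shape, using an in-memory SQLite database purely for illustration. The table definition, the sample values, and the `fetch_ti` helper are all assumptions and do not reflect Airflow's actual schema or migration. The old composite key is kept here as a unique constraint, and a v4 UUID stands in for the value a backfill might generate.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")

# Hypothetical schema: a single-column UUID primary key, with the former
# composite key columns kept for searching and enforced as unique.
conn.execute(
    """
    CREATE TABLE task_instance (
        id TEXT PRIMARY KEY,
        dag_id TEXT NOT NULL,
        task_id TEXT NOT NULL,
        run_id TEXT NOT NULL,
        map_index INTEGER NOT NULL DEFAULT -1,
        UNIQUE (dag_id, task_id, run_id, map_index)
    )
    """
)

# A v4 UUID, as a backfill of existing rows might use.
ti_id = str(uuid.uuid4())
conn.execute(
    "INSERT INTO task_instance VALUES (?, ?, ?, ?, ?)",
    (ti_id, "example_dag", "extract", "manual__2024-10-18", -1),
)


def fetch_ti(connection: sqlite3.Connection, task_instance_id: str):
    """Hypothetical lookup: resolve a TI from its UUID alone."""
    return connection.execute(
        "SELECT dag_id, task_id, run_id, map_index "
        "FROM task_instance WHERE id = ?",
        (task_instance_id,),
    ).fetchone()


row = fetch_ti(conn, ti_id)
```

With this shape, only `ti_id` needs to travel from the API server to the worker, while queries by dag_id or run_id keep working against the retained columns.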

kaxil, Oct 18 '24 13:10