Mapped, classic operator tasks within TaskGroups prepend `group_id` in Graph View
Apache Airflow version
main (development)
What happened
When mapped classic operator tasks exist within TaskGroups, the `group_id` of the TaskGroup is prepended to the displayed `task_id` in the Graph View.
In the screenshot below, all displayed task IDs contain only the direct `task_id`, except for "mapped_classic_task". This particular task is a mapped BashOperator task. The prepended `group_id` does not appear for unmapped classic operator tasks, nor for mapped or unmapped TaskFlow tasks.
data:image/s3,"s3://crabby-images/3700c/3700c430971f3535a41ece8b3f62d4431c79142f" alt="image"
What you think should happen instead
The pattern of the displayed task names should be consistent for all task types (mapped/unmapped, classic operators/TaskFlow functions). Additionally, having the `group_id` prepended to mapped classic operator tasks is a little redundant and less readable.
How to reproduce
- Use the following example DAG:
```python
from pendulum import datetime

from airflow.decorators import dag, task, task_group
from airflow.operators.bash import BashOperator


@dag(start_date=datetime(2022, 1, 1), schedule_interval=None)
def task_group_task_graph():
    @task_group
    def my_task_group():
        # Unmapped classic operator: displayed without the group_id.
        BashOperator(task_id="not_mapped_classic_task", bash_command="echo")
        # Mapped classic operator: displayed with the group_id prepended.
        BashOperator.partial(task_id="mapped_classic_task").expand(
            bash_command=["echo", "echo hello", "echo world"]
        )

        @task
        def another_task(input=None):
            ...

        # Unmapped and mapped TaskFlow tasks: both displayed without the group_id.
        another_task.override(task_id="not_mapped_taskflow_task")()
        another_task.override(task_id="mapped_taskflow_task").expand(input=[1, 2, 3])

    my_task_group()


_ = task_group_task_graph()
```
- Navigate to the Graph view
- Notice that the displayed `task_id` for "mapped_classic_task" is prefixed with the TaskGroup `group_id` "my_task_group", while the other tasks in the TaskGroup are not.
Operating System
Debian GNU/Linux
Versions of Apache Airflow Providers
N/A
Deployment
Other
Deployment details
Breeze
Anything else
Setting `prefix_group_id=False` for the TaskGroup does remove the prepended `group_id` from the task's display name.
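As a rough model of why `prefix_group_id` matters: with it enabled (the default), a task inside a group is assumed to get an id of the form `<group_id>.<task_id>`; with it disabled, the bare `task_id` is kept. `make_task_id` below is a hypothetical helper for illustration, not Airflow's API.

```python
from typing import Optional


def make_task_id(task_id: str, group_id: Optional[str], prefix_group_id: bool = True) -> str:
    """Hypothetical helper: the id a task is assumed to get inside a TaskGroup.

    Assumes the "<group_id>.<task_id>" convention when prefix_group_id is True.
    A task outside any group (group_id is None) keeps its bare id.
    """
    if group_id and prefix_group_id:
        return f"{group_id}.{task_id}"
    return task_id


print(make_task_id("mapped_classic_task", "my_task_group"))
print(make_task_id("mapped_classic_task", "my_task_group", prefix_group_id=False))
```

Under this model, turning prefixing off leaves nothing in the id for the Graph View to show, which matches the workaround described above.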
Are you willing to submit PR?
- [x] Yes I am willing to submit a PR!
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
I take it back, I'd be happy to submit a PR.
However, I did a cursory check trying to find where this is coming from, and it wasn't apparent to me. If someone could provide some pointers, I'd be forever in your debt.
Hello @josh-fell,
For display purposes, I think none of them should have the `group_id` prefix (we already have it in the surrounding element).
I think the graph is drawn with d3 in `airflow/www/static/js/graph.js`, targeting the `graph-svg` element.
There might be two different issues:
- Labels are taken from the global `nodes` object (the `label` props). They are passed to the template and constructed in the `task_group_to_dict` function (which goes down to the `label` property in `taskmixin.py`, where the task_group_id is removed for the label). The final `label` is pushed to the global `nodes` object. This prop is also updated for mapped tasks on the client side, using the task id, to add the mapped task count (see line 101 of `graph.js`). Maybe we should use the label here instead, to be consistent (the group is removed from the label).
- The id for `mapped_taskflow_task` seems to be incorrect: the `group_id` prefix is missing. This might be a more specific problem with the TaskFlow decorators when mapped (or maybe this is the expected behavior). The `label` property is derived from the id by truncating it by the number of characters in the group id. For a mapped TaskFlow task, we remove the group_id from the task id to get the label, but the group id was not there in the first place, so we end up with a truncated version of the id (something like `w_task`). Picture of what is in the database (the group_id is missing in the mapped TaskFlow ids).
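Both points can be reproduced with a small Python model. This is a sketch under stated assumptions, not Airflow's code: `label_from_id` assumes the label is computed by slicing off `len(group_id) + 1` characters (the group id plus the dot) whenever the task sits in a prefixed group, and `display_name` is a hypothetical stand-in for the client-side update in `graph.js` that appends the mapped-task count; the ` [N]` suffix format is also an assumption.

```python
from typing import Optional


def label_from_id(task_id: str, group_id: Optional[str], prefix_group_id: bool = True) -> str:
    """Hypothetical model of the label property: drop "<group_id>." by length."""
    if group_id and prefix_group_id:
        # Slices off len(group_id) + 1 characters (group id plus the dot),
        # assuming the prefix is actually present in task_id.
        return task_id[len(group_id) + 1 :]
    return task_id


def display_name(task_id: str, label: str, mapped_count: int, use_label: bool) -> str:
    """Hypothetical model of the client-side update that appends the mapped count.

    The " [N]" suffix format is an assumption made for illustration only.
    """
    base = label if use_label else task_id
    return f"{base} [{mapped_count}]"


group = "my_task_group"

# First issue: the mapped classic task has a correctly prefixed id.
classic_id = f"{group}.mapped_classic_task"
classic_label = label_from_id(classic_id, group)
print(display_name(classic_id, classic_label, 3, use_label=False))  # built from the id
print(display_name(classic_id, classic_label, 3, use_label=True))  # built from the label

# Second issue: the mapped TaskFlow id lacks the prefix, so slicing eats real characters.
print(label_from_id("mapped_taskflow_task", group))
```

Run as-is, it prints `my_task_group.mapped_classic_task [3]`, `mapped_classic_task [3]` and `w_task`. In this model, building the display text from the label fixes the first issue, while the second one only goes away once the mapped TaskFlow task gets a correctly prefixed id.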
I am not familiar with this part of the code (mapped tasks and the `@task` implementation). `get_unique_task_id` seems interesting to look at; it is used in `_expand` and `python_task`. I hope others can help here :)
Note: I tried without overriding the task_id; the mapped TaskFlow id is still missing the group_id prefix.
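To spot affected tasks without opening the UI, a check along these lines could be run over the ids of the tasks declared inside a group. `ids_missing_group_prefix` is a hypothetical helper, and the sample ids are illustrative, modelled on the database observation above (only the mapped TaskFlow id lacks the prefix) rather than copied from it.

```python
from typing import Iterable, List


def ids_missing_group_prefix(group_id: str, task_ids: Iterable[str]) -> List[str]:
    """Hypothetical check: return ids of group members that lack "<group_id>."."""
    prefix = f"{group_id}."
    return [task_id for task_id in task_ids if not task_id.startswith(prefix)]


# Illustrative ids for the tasks declared inside my_task_group (assumed values).
stored_ids = [
    "my_task_group.not_mapped_classic_task",
    "my_task_group.mapped_classic_task",
    "my_task_group.not_mapped_taskflow_task",
    "mapped_taskflow_task",
]
print(ids_missing_group_prefix("my_task_group", stored_ids))
```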
I hope this helps :)