airflow icon indicating copy to clipboard operation
airflow copied to clipboard

Mapped, classic operator tasks within TaskGroups prepend `group_id` in Graph View

Open josh-fell opened this issue 2 years ago • 2 comments

Apache Airflow version

main (development)

What happened

When mapped, classic operator tasks exist within TaskGroups, the group_id of the TaskGroup is prepended to the displayed task_id in the Graph View.

In the below screenshot, all displayed task IDs only contain the direct task_id except for the "mapped_classic_task". This particular task is a mapped BashOperator task. The prepended group_id does not appear for unmapped, classic operator tasks, nor mapped and unmapped TaskFlow tasks.

image

What you think should happen instead

The pattern of the displayed task names should be consistent for all task types (mapped/unmapped, classic operators/TaskFlow functions). Additionally, having the group_id prepended to the mapped, classic operator tasks is a little redundant and less readable.

How to reproduce

  1. Use an example DAG of the following:
from pendulum import datetime

from airflow.decorators import dag, task, task_group
from airflow.operators.bash import BashOperator


@dag(start_date=datetime(2022, 1, 1), schedule_interval=None)
def task_group_task_graph():
    @task_group
    def my_task_group():
        BashOperator(task_id="not_mapped_classic_task", bash_command="echo")
        BashOperator.partial(task_id="mapped_classic_task").expand(
            bash_command=["echo", "echo hello", "echo world"]
        )

        @task
        def another_task(input=None):
            ...

        another_task.override(task_id="not_mapped_taskflow_task")()
        another_task.override(task_id="mapped_taskflow_task").expand(input=[1, 2, 3])

    my_task_group()


_ = task_group_task_graph()
  1. Navigate to the Graph view
  2. Notice that the task_id for the "mapped_classic_task" prepends the TaskGroup group_id of "my_task_group" while the other tasks in the TaskGroup do not.

Operating System

Debian GNU/Linux

Versions of Apache Airflow Providers

N/A

Deployment

Other

Deployment details

Breeze

Anything else

Setting prefix_group_id=False for the TaskGroup does remove the prepended group_id from the tasks display name.

Are you willing to submit PR?

  • [x] Yes I am willing to submit a PR!

Code of Conduct

josh-fell avatar Aug 04 '22 04:08 josh-fell

I take it back, I'd be happy to submit a PR.

However, I did a cursory check trying to find where this is coming from, but it wasn't apparent for me. If someone would provide some pointers, I'd be forever in your debt.

josh-fell avatar Aug 04 '22 16:08 josh-fell

Hello @josh-fell,

For display purpose I think none of them should have the group_id prefix. (We already have it in the surrounding element).

I think the graph is drawn with d3 in airflow/www/static/js/graph.js targeting the graph-svg element.

There might be 2 different issues:

  • Labels are taken from the global nodes object (label props). They are passed to the template and constructed in task_group_to_dict function (going down to taskmixin.py label property, taking care of removing the task_group_id for the label). The final label is pushed to the global 'nodes' object. This props is also updated for mapped task on the client side, see line 101 graph.js (using the task id), to add the mapped task count. (maybe we should use the label here to be consistent (groups are removed)).

  • The id for the mapped_taskflow_task seems to be incorrect, the group_id prefix is missing, this might be a more specific problem to the taskflow decorators when mapped. (maybe this is the expected behavior). The label property is derived from the id, by truncating the number of characters of the gourp id. For taskflow mapped task, we remove the group_id from the task id to get the label, but the group id was not here in the first place, we end up with a truncated version of the id, something like w_task) Picture of what is in the database, group_id is missing in the mapped taskflow ids: image

I am not familiar with this part of the code. (mapped tasked and @task implementation). get_unique_task_id seems interesting to look at. It is used in _expand and python_task. I hope others can help here :).

Note: I tried without overriding the task_id, the mapped taskflow id is still missing the group_id prefix.

I hope this helps :)

pierrejeambrun avatar Aug 04 '22 19:08 pierrejeambrun