AIP-84 | Public list tags API
closes: #42713
Migrate from https://github.com/apache/airflow//blob/main/airflow/www/views.py#L1035-L1038
Hi, this PR should be marked with the area:API label instead of area:UI. The bot automatically marked it with area:UI because the pre-commit process auto-generates the frontend code using OpenAPI. Thanks!
Thanks for the contribution! w/o having a full review I assume you should mark the legacy API as being migrated like e.g. in https://github.com/apache/airflow/pull/42798/files#diff-df829fc289eeca0e932974a5110a8f2c21b1962e87db7c3a3b44dacf084f54a1R47 ?
Hi @jscheffl, since this API is migrated from the home view in airflow/www/views.py, and dag_tags is just a small part of the home view, I'm not sure if this is the only endpoint that needs to be migrated from the home view. @bbovenzi, do you have any insights on this?
Hi @jscheffl, since this API is migrated from the home view in
airflow/www/views.py,anddag_tagsis just a small part of the home view, I'm not sure if this is the only endpoint that needs to be migrated from the home view. @bbovenzi, do you have any insights on this?
Okay, then if this is not a migration we should clarify if this really should get to be a public API. I'd think if it is used only in one place in the new UI then we should not make it a new public API. As there are other public APIs to retrieve the sme information (https://airflow.apache.org/docs/apache-airflow/stable/stable-rest-api-ref.html#operation/get_dag_details)
Sure, that makes sense. I’ll go ahead and move this endpoint to the UI folder.
Okay, then if this is not a migration we should clarify if this really should get to be a public API. I'd think if it is used only in one place in the new UI then we should not make it a new public API. As there are other public APIs to retrieve the sme information (https://airflow.apache.org/docs/apache-airflow/stable/stable-rest-api-ref.html#operation/get_dag_details)
I think this is a good addition to the public API because currently there is no way for a user to be able to retrieve all the tags available for filtering for instance. This was done in the legacy UI through custom views.
Meaning that users of the API would have to know in advance tags to effectively filter on them.
Based on all the above input, I believe the original intention of issue https://github.com/apache/airflow/issues/42713 is to provide an endpoint to retrieve all tags across all DAGs for advanced filtering functionalities. I agree with the last comment by @pierrejeambrun, so this endpoint should remain in the public folder. However, I think the response schema should be:
{
"tags": ["tag_1", "tag_2"],
"total_entries": 2
}
Additionally, the endpoint should support offset and limit query parameters, as mentioned by @rawwar in the issue. This way, the get_dag_tags endpoint can use the same query parameters and a similar response schema as other public endpoints , as well as paginated_select on the backend.
Please correct me if I have misunderstood anything. Thanks !
I think the response could be:
{
"tags": [{"tag_name: "tag_1", "dag_id": "some_dag_id"}, {"tag_name": "tag_2", "dag_id": "some_dag_id"}],
"total_entries": 2
}
Basically we return serialized DagTag in the response field.
I think the response could be:
{ "tags": [{"tag_name: "tag_1", "dag_id": "some_dag_id"}, {"tag_name": "tag_2", "dag_id": "some_dag_id"}], "total_entries": 2 }Basically we return serialized
DagTagin the response field.
Is dag_id actually valuable information though? The whole point is that multiple dags share tags, so only showing a single dag_id doesn't help anyone. I actually prefer the list of tag_name strings.
I’ll check the db modelling tomorrow, something I might have missesed. I suggested that dag_id was there because that’s how the sqlalchemy model is defined.
But indeed from a business point of view this does not make sense.
But on the other hand the name is the primary key and unique, so I guess there is something I need to double check.
Update:
The endpoint is placed in the public folder and supports the following query parameters:
tag_name_pattern:_SearchParamorder_by:_OrderByParamlimit:QueryLimitoffset:QueryOffset
Example of the response schema:
{
"tags": ["tag_1", "tag_2"],
"total_entries": 3
}
For the SQL statement, the name field should still include DISTINCT since there may be rows like:
same_tag_name, dag_id_1
same_tag_name, dag_id_2
While the name field with the dag_id field creates a unique pair, the name field itself may have duplicates in the query results.
Feature: _OrderByParam - New query parameter for ordering by a field.
order_by:asc,desc- Ordering by a single field (This could be useful for models with only a few fields, where typically sorting by a single field is sufficient.)
You will have to rebase and fix conflicts as we change how fastapi_api folder is organized here: https://github.com/apache/airflow/pull/43062
Update:
Hi @pierrejeambrun, the group_by resolved the issue. Thanks!
Refactor: SortParm
- Moved
attr_mappingto the__init__parameter (sinceSortParmis used by different endpoints). - Consolidated
allow_attrwithattr_mapping(we can useattr_mappingto validate the query parameters). - Added
default_attr(for models with multiple primary keys, likeDagTag, the originalget_primary_key_columnmight not be the column we want to sort by. Withdefault_attr, we can specify the default field for sorting).
Thanks @jason810496, Do you mind putting the refactoring part into its own PR so it does not block this one.
Thanks @jason810496, Do you mind putting the refactoring part into its own PR so it does not block this one.
Sure, so I should create a new branch on top of the current commit and rebase the current branch.
After the current PR is merged, I will create a PR for refactoring SortParm from the new branch.
Is that correct?
Yes, you can checkout another branch from here (full with refactoring). Then in this branch you can drop them (the refactoring part)
Resolved! I have removed the refactoring part.
Unrelated CI failure, merging.
Thanks @jason810496