airflow icon indicating copy to clipboard operation
airflow copied to clipboard

AIP-84 | Public list tags API

Open jason810496 opened this issue 1 year ago • 17 comments

closes: #42713

Migrate from https://github.com/apache/airflow//blob/main/airflow/www/views.py#L1035-L1038

jason810496 avatar Oct 12 '24 04:10 jason810496

Hi, this PR should be marked with the area:API label instead of area:UI. The bot automatically marked it with area:UI because the pre-commit process auto-generates the frontend code using OpenAPI. Thanks!

jason810496 avatar Oct 12 '24 06:10 jason810496

Thanks for the contribution! w/o having a full review I assume you should mark the legacy API as being migrated like e.g. in https://github.com/apache/airflow/pull/42798/files#diff-df829fc289eeca0e932974a5110a8f2c21b1962e87db7c3a3b44dacf084f54a1R47 ?

jscheffl avatar Oct 12 '24 09:10 jscheffl

Hi @jscheffl, since this API is migrated from the home view in airflow/www/views.py, and dag_tags is just a small part of the home view, I'm not sure if this is the only endpoint that needs to be migrated from the home view. @bbovenzi, do you have any insights on this?

jason810496 avatar Oct 12 '24 11:10 jason810496

Hi @jscheffl, since this API is migrated from the home view in airflow/www/views.py, and dag_tags is just a small part of the home view, I'm not sure if this is the only endpoint that needs to be migrated from the home view. @bbovenzi, do you have any insights on this?

Okay, then if this is not a migration we should clarify if this really should get to be a public API. I'd think if it is used only in one place in the new UI then we should not make it a new public API. As there are other public APIs to retrieve the sme information (https://airflow.apache.org/docs/apache-airflow/stable/stable-rest-api-ref.html#operation/get_dag_details)

jscheffl avatar Oct 13 '24 10:10 jscheffl

Sure, that makes sense. I’ll go ahead and move this endpoint to the UI folder.

jason810496 avatar Oct 13 '24 11:10 jason810496

Okay, then if this is not a migration we should clarify if this really should get to be a public API. I'd think if it is used only in one place in the new UI then we should not make it a new public API. As there are other public APIs to retrieve the sme information (https://airflow.apache.org/docs/apache-airflow/stable/stable-rest-api-ref.html#operation/get_dag_details)

I think this is a good addition to the public API because currently there is no way for a user to be able to retrieve all the tags available for filtering for instance. This was done in the legacy UI through custom views.

Meaning that users of the API would have to know in advance tags to effectively filter on them.

pierrejeambrun avatar Oct 14 '24 13:10 pierrejeambrun

Based on all the above input, I believe the original intention of issue https://github.com/apache/airflow/issues/42713 is to provide an endpoint to retrieve all tags across all DAGs for advanced filtering functionalities. I agree with the last comment by @pierrejeambrun, so this endpoint should remain in the public folder. However, I think the response schema should be:

{
  "tags": ["tag_1", "tag_2"],
  "total_entries": 2
}

Additionally, the endpoint should support offset and limit query parameters, as mentioned by @rawwar in the issue. This way, the get_dag_tags endpoint can use the same query parameters and a similar response schema as other public endpoints , as well as paginated_select on the backend.

Please correct me if I have misunderstood anything. Thanks !

jason810496 avatar Oct 14 '24 17:10 jason810496

I think the response could be:

{
  "tags": [{"tag_name: "tag_1", "dag_id": "some_dag_id"}, {"tag_name": "tag_2", "dag_id": "some_dag_id"}],
  "total_entries": 2
}

Basically we return serialized DagTag in the response field.

pierrejeambrun avatar Oct 15 '24 08:10 pierrejeambrun

I think the response could be:

{
  "tags": [{"tag_name: "tag_1", "dag_id": "some_dag_id"}, {"tag_name": "tag_2", "dag_id": "some_dag_id"}],
  "total_entries": 2
}

Basically we return serialized DagTag in the response field.

Is dag_id actually valuable information though? The whole point is that multiple dags share tags, so only showing a single dag_id doesn't help anyone. I actually prefer the list of tag_name strings.

bbovenzi avatar Oct 15 '24 12:10 bbovenzi

I’ll check the db modelling tomorrow, something I might have missesed. I suggested that dag_id was there because that’s how the sqlalchemy model is defined.

But indeed from a business point of view this does not make sense.

But on the other hand the name is the primary key and unique, so I guess there is something I need to double check.

pierrejeambrun avatar Oct 15 '24 21:10 pierrejeambrun

Update:

The endpoint is placed in the public folder and supports the following query parameters:

  • tag_name_pattern: _SearchParam
  • order_by: _OrderByParam
  • limit: QueryLimit
  • offset: QueryOffset

Example of the response schema:

{
    "tags": ["tag_1", "tag_2"],
    "total_entries": 3
}

For the SQL statement, the name field should still include DISTINCT since there may be rows like:

same_tag_name, dag_id_1
same_tag_name, dag_id_2

While the name field with the dag_id field creates a unique pair, the name field itself may have duplicates in the query results.

Feature: _OrderByParam - New query parameter for ordering by a field.

  • order_by: asc, desc
  • Ordering by a single field (This could be useful for models with only a few fields, where typically sorting by a single field is sufficient.)

jason810496 avatar Oct 16 '24 12:10 jason810496

You will have to rebase and fix conflicts as we change how fastapi_api folder is organized here: https://github.com/apache/airflow/pull/43062

kaxil avatar Oct 16 '24 13:10 kaxil

Update:

Hi @pierrejeambrun, the group_by resolved the issue. Thanks!

Refactor: SortParm

  • Moved attr_mapping to the __init__ parameter (since SortParm is used by different endpoints).
  • Consolidated allow_attr with attr_mapping (we can use attr_mapping to validate the query parameters).
  • Added default_attr (for models with multiple primary keys, like DagTag, the original get_primary_key_column might not be the column we want to sort by. With default_attr, we can specify the default field for sorting).

jason810496 avatar Oct 18 '24 15:10 jason810496

Thanks @jason810496, Do you mind putting the refactoring part into its own PR so it does not block this one.

pierrejeambrun avatar Oct 18 '24 15:10 pierrejeambrun

Thanks @jason810496, Do you mind putting the refactoring part into its own PR so it does not block this one.

Sure, so I should create a new branch on top of the current commit and rebase the current branch. After the current PR is merged, I will create a PR for refactoring SortParm from the new branch. Is that correct?

jason810496 avatar Oct 18 '24 16:10 jason810496

Yes, you can checkout another branch from here (full with refactoring). Then in this branch you can drop them (the refactoring part)

pierrejeambrun avatar Oct 18 '24 16:10 pierrejeambrun

Resolved! I have removed the refactoring part.

jason810496 avatar Oct 18 '24 16:10 jason810496

Unrelated CI failure, merging.

Thanks @jason810496

pierrejeambrun avatar Oct 21 '24 09:10 pierrejeambrun