setup ui rest api

Open pierrejeambrun opened this issue 1 year ago • 4 comments

This is the initial PR for AIP-84, setting up a very basic, separate FastAPI API for UI purposes. I wanted to do this incrementally instead of opening a big PR later. Some steps will require discussion of the implementation (I see different ways of handling things), and I don't want a gigantic PR getting blocked for days or weeks.

The goal at the end of that is to have one custom /object endpoint duplicated in the new UI REST API, showing how such endpoints are developed and tested as part of the new API.

When Airflow 3 is released, the old /object endpoints will simply be deleted, so the duplication does not matter at that point. (Contributions to the old endpoints will be limited / filtered anyway.)

To test this very basic endpoint, for now you can simply run Airflow with Breeze. Stop the webserver and start the new UI REST API manually:

    fastapi dev airflow/api_ui/main.py

Then, in another Breeze terminal, run:

    curl localhost:8000/ui/next_run_datasets/<your_dag_id_with_datasets>
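The manual curl check could also be scripted. The following is a minimal sketch rather than part of the PR: it assumes the dev server listens on localhost:8000 and that the endpoint answers with JSON, and the helper names next_run_datasets_url and fetch_next_run_datasets are made up for illustration.

```python
"""Scripted version of the manual curl check (illustrative sketch only)."""
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: the server started with `fastapi dev` listens on localhost:8000.
BASE_URL = "http://localhost:8000"


def next_run_datasets_url(dag_id: str, base_url: str = BASE_URL) -> str:
    """Build the URL of the next_run_datasets endpoint for a given DAG id."""
    # Percent-encode the DAG id so unusual characters cannot break the path.
    return f"{base_url.rstrip('/')}/ui/next_run_datasets/{quote(dag_id, safe='')}"


def fetch_next_run_datasets(dag_id: str, base_url: str = BASE_URL):
    """Call the endpoint and decode the body, assuming it returns JSON."""
    with urlopen(next_run_datasets_url(dag_id, base_url), timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # "my_dag_with_datasets" is a placeholder; substitute a real DAG id.
    print(next_run_datasets_url("my_dag_with_datasets"))
```

Calling fetch_next_run_datasets with a real DAG id only works while the UI REST API is running; building the URL works offline.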

Follow-up PRs in the next few days will:

  • Integrate the new UI API into the CLI (most likely under a new command airflow ui-api, or directly under airflow webserver?)
  • Integrate the new UI API into Breeze (for developer experience)
  • Add tests + CI integration (new test type, CI jobs, etc.)
  • Add permissions and access control
  • Add contributor documentation for this new API
  • Add dev tools to automatically generate front-end code (TypeScript types + React queries) based on the API spec
  • Clean and modularize the code.

pierrejeambrun avatar Aug 27 '24 15:08 pierrejeambrun

The special tests with Pydantic v1 and without Pydantic will break. Can we require Pydantic 2.x in Airflow 3? I think it is reasonable to make Airflow require Pydantic v2, WDYT?

Should I open another PR that does that and deletes the Pydantic v1 tests and the Pydantic-removed tests, as they will not be relevant anymore?

FastAPI does not impose Pydantic v2 and still works with Pydantic v1 (for now), so if we really want to, we can still support Pydantic v1 for Airflow 3, I guess.

pierrejeambrun avatar Aug 28 '24 07:08 pierrejeambrun

Yes. Just remove the special tests ("no Pydantic" in main, and the relevant code in entrypoint_ci.sh). I think we already know that Pydantic is a must in Airflow 3 with FastAPI.

When it comes to V2 vs. V1, I'd be for removing V1 support as well. It is still 6 months or so until Airflow 3 is out, by which time Pydantic V2 will already be almost 2 years old.

All our community providers that need Pydantic already support V2. There are already a few dependencies that only work with Pydantic 2 (pyiceberg, weaviate-client), and we can expect more to drop V1.

The only problem is when someone uses their own libraries or old versions that support only V1. It's been a problem in the past, but we can expect it to be negligible in the future.

Also, Airflow 3 is a bit of a different beast altogether. Once we do the task-sdk separation and have different sets of dependencies for scheduler/webserver (no providers) and workers/processor/triggerer (with providers), the problem will be gone entirely, because Pydantic V2 will only be necessary for webserver/apiserver. If (as I advocate for) we find a way to not have to install providers in the webserver/API server, the problem will be gone entirely. This is yet another reason why I think we should aim for the webserver not having to install providers for base operator links, connections, and log handlers: that way it will not matter whether any custom code or providers have dependencies that conflict with the webserver, and we could make such a decision without even blinking.

potiuk avatar Aug 28 '24 08:08 potiuk

> If (as I advocate for) we find a way to not have to install providers in the webserver/API server, the problem will be gone entirely. This is yet another reason why I think we should aim for the webserver not having to install providers for base operator links, connections, and log handlers: that way it will not matter whether any custom code or providers have dependencies that conflict with the webserver, and we could make such a decision without even blinking.

That would be awesome indeed! But, as mentioned by Ash at the last dev call, we need to find a solution for the auth managers. They are currently in providers and are used by the webserver.

vincbeck avatar Aug 28 '24 14:08 vincbeck

> That would be awesome indeed! But, as mentioned by Ash at the last dev call, we need to find a solution for the auth managers. They are currently in providers and are used by the webserver.

It's a bit of a spin-off from the Pydantic discussion, but yes, having a "webserver-only plugin" or something similar would be best. We could define those independently from providers, and the same goes for remote logging.

I do not think this should be a separate workstream targeted for 3.0 (it could possibly be done later), but there is nothing wrong with installing "just" the amazon provider in the webserver when you need the logging/auth. Or we could potentially carve "Auth/Logging" out of the providers into a separate package.

With the way the Airflow webserver currently works, you need to install all providers your DAGs can use, and if you have conflicts with any of them, you have a problem. But if you only install, for example, "amazon auth" and "amazon logging" (which might come from the amazon provider or from a "webserver plugin") and do not need to install any other provider, that changes a lot when it comes to conflicting dependencies.

If we could just move connections and base operator links out, that would remove the need to install in the webserver all the providers you want to use in your DAGs; you would only install the single auth/logging that is "per-installation".

potiuk avatar Aug 28 '24 16:08 potiuk

OK, thanks for the details @potiuk. The last commit removes everything related to Pydantic v1 vs. v2 vs. none in the codebase (dev tools, tests, utils, and more).

Let's see if the CI is happy.

Maybe I should add a significant newsfragment for Airflow core requiring Pydantic >=2.3? This seems important to users, especially those who relied on v1.x.
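For users wondering whether an existing environment would satisfy such a floor, a quick check might look like the sketch below. It is only an illustration: the 2.3 minimum is the figure mentioned in this comment, not necessarily the final pin, and the helper names are hypothetical.

```python
"""Check an installed Pydantic version against a minimum (illustrative sketch)."""
import re
from importlib.metadata import PackageNotFoundError, version

# Assumption: 2.3 is the floor discussed in this thread; the merged PR may differ.
MINIMUM = (2, 3)


def parse_major_minor(raw: str) -> tuple[int, int]:
    """Return (major, minor) from a version string such as '2.8.2' or '1.10.13'."""
    match = re.match(r"(\d+)(?:\.(\d+))?", raw)
    if match is None:
        raise ValueError(f"unrecognised version string: {raw!r}")
    return int(match.group(1)), int(match.group(2) or 0)


def meets_minimum(raw: str, minimum: tuple[int, int] = MINIMUM) -> bool:
    """Compare numerically, so that 2.10 counts as newer than 2.3."""
    return parse_major_minor(raw) >= minimum


def installed_pydantic_ok() -> bool:
    """Report whether the Pydantic found in this environment meets the floor."""
    try:
        return meets_minimum(version("pydantic"))
    except PackageNotFoundError:
        # No Pydantic installed at all, so the requirement is not met.
        return False
```

Comparing integer tuples instead of raw strings avoids the common mistake of treating "2.10" as older than "2.3".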

pierrejeambrun avatar Aug 29 '24 10:08 pierrejeambrun

That looks good, but separating the Pydantic removals out is possibly a good idea. There are also some test failures that do not seem related to the Pydantic removal (another reason why separating is likely a good idea).

potiuk avatar Aug 29 '24 12:08 potiuk

And yes, since Pydantic is quite popular and has potential for conflicts, adding a one-line note/newsfragment is likely a good idea.

potiuk avatar Aug 29 '24 12:08 potiuk

Rebased on top of https://github.com/apache/airflow/pull/41857, which needs to be merged first.

pierrejeambrun avatar Aug 29 '24 13:08 pierrejeambrun

https://github.com/apache/airflow/pull/41857 merged - merging this one then

potiuk avatar Aug 29 '24 16:08 potiuk

(feel free to merge it @pierrejeambrun if ready)

potiuk avatar Aug 29 '24 16:08 potiuk

Yes ready ty

pierrejeambrun avatar Aug 29 '24 17:08 pierrejeambrun

Ah, late to the party, did not see the PR prior to merge... Will the new API with FastAPI always be another hosted process next to the webserver? Or will FastAPI (later) move in as a new endpoint below it? I fear that adding an additional process makes deployment more complex, and I'd prefer to host only one web-facing server endpoint. (Otherwise I have two host names, or need an ingress/application gateway to redirect sub-paths to the right process serving the content.)

jscheffl avatar Aug 29 '24 21:08 jscheffl

I think that nothing is set in stone. I went with a separate process because I think this is what was originally mentioned and it is easier to iterate on and develop, but this is far from production-ready and from what it will actually look like when Airflow 3 comes out.

At some point that might come under the umbrella of a single process with multiple apps running, as you mentioned. I know that Ash also has a new API coming for the task interface work.

I think it is up to us to decide, considering the other APIs and AIPs. (I just didn't want this one to get blocked until we find a consensus. This is not specific to this AIP in particular, but more generally about how we handle the deployment of our multiple APIs; there is also the AIP-44 internal API too.)
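The "single process with multiple apps" idea can be pictured with a small path-prefix dispatcher. The sketch below uses two plain WSGI callables purely for illustration. It is not how Airflow wires its servers, and the names webserver and ui-api and the /ui prefix are stand-ins. FastAPI is an ASGI framework with its own mounting mechanism, so a real setup would look different.

```python
"""One process, several apps: a path-prefix dispatcher (illustrative sketch)."""


def make_app(name):
    """Create a trivial WSGI app that reports which app handled the request."""

    def app(environ, start_response):
        body = f"{name} handled {environ.get('PATH_INFO', '')}".encode("utf-8")
        headers = [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))]
        start_response("200 OK", headers)
        return [body]

    return app


class PrefixDispatcher:
    """Route requests to a mounted app by URL prefix, else to a default app."""

    def __init__(self, default_app, mounts):
        self.default_app = default_app
        # Longest prefix first, so a more specific mount wins over a shorter one.
        self.mounts = sorted(mounts.items(), key=lambda item: len(item[0]), reverse=True)

    def __call__(self, environ, start_response):
        path = environ.get("PATH_INFO", "")
        for prefix, app in self.mounts:
            if path == prefix or path.startswith(prefix + "/"):
                # Move the prefix from PATH_INFO to SCRIPT_NAME for the child app.
                child = dict(environ)
                child["SCRIPT_NAME"] = environ.get("SCRIPT_NAME", "") + prefix
                child["PATH_INFO"] = path[len(prefix):]
                return app(child, start_response)
        return self.default_app(environ, start_response)


# Hypothetical names: "webserver" stands in for the existing UI,
# "ui-api" for the new API served under the /ui prefix.
application = PrefixDispatcher(make_app("webserver"), {"/ui": make_app("ui-api")})
```

With a layout like this, only one host name and port are exposed, which is the deployment concern raised above; the trade-off is that both apps then share one process and its dependencies.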

pierrejeambrun avatar Aug 29 '24 23:08 pierrejeambrun