Build ERD to be searchable in the browser
What do you see as an issue?
The current ERD is a static image, and elements can't be searched using ctrl/cmd+f.
Solving the problem
We can use PlantUML to build the ERD as an SVG which supports search.
The only downside is that enabling PlantUML with Sphinx requires including a jar file. I want to make sure this is okay before proceeding with a PR.
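For illustration, here is a rough sketch of what the Sphinx side might look like, assuming the `sphinxcontrib-plantuml` extension is used. The jar path is hypothetical, and whether the `svg` output format is the right choice for in-page search would need to be checked against the rendered docs.

```python
# Sketch of a Sphinx conf.py fragment (assumption: sphinxcontrib-plantuml is installed).
# In an existing conf.py the extension would be appended to the current list instead.
extensions = ["sphinxcontrib.plantuml"]

# Command Sphinx runs to render diagrams.
# The jar location below is hypothetical; it depends on where the image installs it.
plantuml = "java -jar /opt/plantuml/plantuml.jar"

# Emit SVG rather than the default PNG so diagram text stays text.
plantuml_output_format = "svg"
```

With this in place, a `.. uml::` directive in the docs would be rendered through the configured command at build time.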
Anything else
Example screenshot of the search functionality:
Are you willing to submit PR?
- [X] Yes I am willing to submit a PR!
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
Ideally, we should make it part of the Breeze CI image (so add a step to the image build process alongside the other system dependencies, likely in https://github.com/apache/airflow/blob/main/scripts/docker/install_os_dependencies.sh#L69 ). Documentation building happens inside the image, so having it there is how you can include it.
I checked and it's GPL-licensed, so we cannot redistribute it, but we can use it as part of our build tooling:
https://www.apache.org/legal/resolved.html#prohibited
They May Not Be Distributed
Apache projects may not distribute Category X licensed components, in source or binary form; in ASF source code or in convenience binaries. As with the previous question on platforms, you can rely on the component if its license terms do not affect the Apache product's licensing. For example, using a GPL'ed tool during the build is okay, but including GPL'ed source code is not.
So there is fundamentally nothing "blocking" it.
Hi @potiuk, I would like to explore this and try to generate a PlantUML-based ERD. Please can you assign this to me as my first issue?
Assigned.
Hi @RNHTTR, @potiuk,
I started working on this PR after my holidays only to realize that the generated image is already available in an SVG format (you see that when you right-click and open the image in a new tab).
I checked how the diagram is generated and it's quite straightforward. On every pre-commit run, the script update_er_diagram.py calls run_prepare_er_diagram.py, which is responsible for generating the diagram. Under the hood, it uses the eralchemy2 library to render the diagram.
Now, am I correct in understanding that this is rather a frontend/docs fix and does not require any work with plantuml?
PS - The eralchemy2 library is deprecated; there have been no updates in the last 6 months.
Using plantuml would make the ERD searchable (that is, using ctrl+f / cmd+f) in the Airflow docs HTML without needing to open the SVG file. Whether it's worth it to update is another question...
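As a minimal sketch of what a PlantUML-based ERD source could look like, the snippet below emits PlantUML entity syntax from a plain dictionary. The table and column names are a simplified, hypothetical stand-in for the real schema, and `to_plantuml` is an illustrative helper, not an existing Airflow function; a real implementation would presumably read the SQLAlchemy metadata instead.

```python
# Hypothetical, simplified schema: table name -> list of (column, type, is_primary_key).
TABLES = {
    "dag": [("dag_id", "VARCHAR", True)],
    "dag_run": [("id", "INTEGER", True), ("dag_id", "VARCHAR", False)],
}

# (parent, child) pairs, each treated as a one-to-many relationship.
RELATIONS = [("dag", "dag_run")]


def to_plantuml(tables, relations):
    """Return PlantUML entity-relationship source for the given tables."""
    lines = ["@startuml"]
    for name, columns in tables.items():
        lines.append(f"entity {name} {{")
        for col, col_type, is_pk in columns:
            # PlantUML marks mandatory attributes with "*"; used here for primary keys.
            marker = "* " if is_pk else ""
            lines.append(f"  {marker}{col} : {col_type}")
        lines.append("}")
    for parent, child in relations:
        # "||--o{" is PlantUML's crow's-foot notation for one-to-many.
        lines.append(f"{parent} ||--o{{ {child}")
    lines.append("@enduml")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_plantuml(TABLES, RELATIONS))
```

Because the table and column names end up as text elements in the rendered SVG, they are what a browser search would match against.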
I think it's worthwhile, especially if it might get rid of some dependencies. Eralchemy2 brings some problematic dependencies (Graphviz), which is notoriously difficult to install on various OSes (macOS is one). PlantUML seems to have a few possible engines, and Graphviz is only one of them; maybe other engines are easier to get "working" on more platforms, which would be an even bigger benefit of switching.
Closing in favor of #46057