airflow-site

Adding information about Open Data Discovery (ODD) integration

Open • RamanDamayeu opened this issue 1 year ago • 3 comments

This adds Open Data Discovery (ODD) to the list of "Tools integrating with Airflow". The integration uses Airflow's Listeners capability and is implemented in https://github.com/opendatadiscovery/odd-airflow-2

RamanDamayeu avatar Aug 29 '23 18:08 RamanDamayeu

Could you replace it with a link to the GitHub project? The link you proposed does not refer to Airflow in any way, and it's unclear why it would be mentioned on Airflow's ecosystem page.

potiuk avatar Aug 29 '23 19:08 potiuk

Thanks, very valid point!

I did leave a link to the implementation of the ODD integration with Airflow in a comment, but I agree that it will not be easy for users to understand how exactly this platform is connected to Airflow.

Please advise how best to proceed here. The ODD application consists of several components, the main one being the platform itself. Here is a link to its repository.

Integrations with the platform, that is, the components that mediate between external systems and the platform itself, are implemented either as collectors or as adapters.

Collectors implement the pull approach: the components themselves connect to the systems and collect metadata periodically. For example, collectors for a set of databases are implemented here, there is one for collecting metadata from some AWS services, and there are separate ones for GCP, Azure, etc., each with its own repository.

Adapters implement the push approach, which we use when we want to extend the functionality of an external system. This is the case for Airflow: this is the repository for versions up to 1.10.15, and here is the new repository for versions >= 2.5.1 (it uses Listeners for integration). The same approach is used to integrate Great Expectations, dbt, and Spark. The push/pull approach is described here.
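To make the pull/push distinction concrete, here is a minimal sketch in plain Python. It is not ODD's or Airflow's actual API: `Platform`, `PullCollector`, `PushAdapter`, and `on_task_success` are hypothetical names chosen only for illustration, and the metadata payloads are made up.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Platform:
    """Hypothetical stand-in for a metadata platform; it just stores what it receives."""
    received: List[Dict[str, str]] = field(default_factory=list)

    def ingest(self, metadata: Dict[str, str]) -> None:
        self.received.append(metadata)


@dataclass
class PullCollector:
    """Pull approach: the collector itself goes to the source system for metadata."""
    platform: Platform
    fetch_metadata: Callable[[], List[Dict[str, str]]]

    def run_once(self) -> None:
        # A real collector would call this on a schedule; here it runs a single pass.
        for item in self.fetch_metadata():
            self.platform.ingest(item)


@dataclass
class PushAdapter:
    """Push approach: the external system calls the adapter when an event happens."""
    platform: Platform

    def on_task_success(self, dag_id: str, task_id: str) -> None:
        # Hypothetical hook, loosely modelled on the kind of event a listener receives.
        self.platform.ingest(
            {"source": "airflow", "dag_id": dag_id, "task_id": task_id, "state": "success"}
        )


platform = Platform()

# Pull: the collector asks a (fake) database for its metadata.
collector = PullCollector(platform, lambda: [{"source": "database", "table": "orders"}])
collector.run_once()

# Push: the external system notifies the adapter as soon as a task finishes.
adapter = PushAdapter(platform)
adapter.on_task_success("daily_etl", "load_orders")
```

In this sketch the only difference is who initiates the transfer: the collector decides when to fetch, while the adapter is driven by events from the system it extends.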

An overview of the architecture can be found here: https://docs.opendatadiscovery.org/architecture

Having said all this, there is a whole set of repositories, and integration with Airflow needs at least two of them: the platform itself and the adapter. Since we need to leave a single link on the page, should I change it to the adapter for Airflow versions >= 2.5.1, to the platform, or, to keep it simple, to the ODD GitHub organization?

RamanDamayeu avatar Aug 30 '23 10:08 RamanDamayeu

I think it's best to link to one of your GitHub repos and have a README there explaining how to integrate with Airflow.

potiuk avatar Aug 30 '23 12:08 potiuk