Make Airflow plugin fully compatible with Airflow >=2.7
Modern Airflow versions have moved OpenLineage support into the OpenLineage provider. According to the requirements (https://github.com/datahub-project/datahub/blob/cbe0334fb002253d8f366c32eb88db57af0e6baf/metadata-ingestion-modules/airflow-plugin/setup.py#L47), the current Airflow plugin v2 still relies on the old openlineage plugin. This triggers the following deprecation warning in the task logs:
For Airflow 2.7 and later, use the native Airflow Openlineage provider package. Documentation can be found at https://airflow.apache.org/docs/apache-airflow-providers-openlineage
DataHub version: 1.0.0 Airflow version: 2.9.2 Airflow plugin version: acryl-datahub-airflow-plugin==1.0.0.2
Worth adding that Airflow 3 is already out
May I take this issue and submit a PR?
@harishkesavarao go for it
Note that you might end up with complex dependency issues. I'm worried that we may need to drop support for Airflow <2.7 in order to make this work, although that might be acceptable since those versions see pretty low usage at this point.
Sounds good @hsheth2. Agree with the dependency issues, and with dropping support for Airflow < 2.7. I plan on testing as extensively as possible once the change is made. Is there anything else that you recommend in addition to it?
@harishkesavarao it might make sense to do two separate PRs - one dropping support, and the other improving compat with Airflow 2.7+
Sounds good, let me open the PRs and get back. Thanks for the comment!
Sorry for the delay, will pick this up shortly. Edit: Working on it actively.
closing this since https://github.com/datahub-project/datahub/issues/13357 is merged, thanks!
@yoonhyejin hello. It looks like your comment links to this issue itself. Also, if I understand @harishkesavarao correctly, https://github.com/datahub-project/datahub/pull/13619 is only the first part of the work needed to resolve this issue. It made no actual changes to the plugin codebase and only bumped the minimum supported version of Airflow.
That’s correct - this is still open.
@Linux-oiD @hsheth2 @ms32035
Would the second part of the fix be to change:
"openlineage-airflow>=1.2.0,<=1.25.0" -> "openlineage-airflow>=1.7.1,<=2.5.0" here?
Or should we keep a more restrictive upper version limit?
@harishkesavarao the next step would be to remove the dep on openlineage-airflow and replace it with apache-airflow-providers-openlineage https://airflow.apache.org/docs/apache-airflow-providers-openlineage/stable/index.html
Thank you @hsheth2, I will work on a PR in the coming days. Are there any items to watch out for (dependencies, tests as well as the approach)?
@harishkesavarao the main thing to watch out for is that all of the tests continue to pass and that the metadata produced by the integration does not change.
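One way to check the "no changes to the metadata" requirement is to compare the events emitted before and after the dependency swap while ignoring fields that legitimately vary between runs. The following is only a minimal sketch under stated assumptions: the helper names (`normalize`, `metadata_diff`) and the volatile field names (`timestamp`, `runId`) are hypothetical, and the plugin's own test suite may use a different mechanism.

```python
from typing import Any, FrozenSet, List, Tuple

# Hypothetical field names that are expected to differ between two runs.
VOLATILE_KEYS: FrozenSet[str] = frozenset({"timestamp", "runId"})


def normalize(event: Any, ignore: FrozenSet[str] = VOLATILE_KEYS) -> Any:
    """Recursively drop volatile keys so that two runs can be compared."""
    if isinstance(event, dict):
        return {k: normalize(v, ignore) for k, v in event.items() if k not in ignore}
    if isinstance(event, list):
        return [normalize(v, ignore) for v in event]
    return event


def metadata_diff(before: List[dict], after: List[dict]) -> List[Tuple[int, Any, Any]]:
    """Return (index, before, after) for every position where the events differ."""
    left = [normalize(e) for e in before]
    right = [normalize(e) for e in after]
    diffs = []
    for i in range(max(len(left), len(right))):
        a = left[i] if i < len(left) else None
        b = right[i] if i < len(right) else None
        if a != b:
            diffs.append((i, a, b))
    return diffs
```

An empty result from `metadata_diff(old_events, new_events)` would indicate that the two runs produced the same metadata apart from the ignored fields.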
@harishkesavarao @hsheth2 are there still plans to roll out a change for this soon? The openlineage dependencies block us from migrating to Airflow 3, and we would like the DataHub plugins to work.
@yuefeng-zhu, sorry about the delay.
I am working on it and will send out a PR soon. What timeline are we looking at?
Ideally within the next 2 weeks! There is a lot of need for Airflow 3 features, including event based capabilities. We also need Datahub for metadata purposes for our data! Thank you very much @harishkesavarao.
@harishkesavarao the next step would be to remove the dep on openlineage-airflow and replace it with apache-airflow-providers-openlineage https://airflow.apache.org/docs/apache-airflow-providers-openlineage/stable/index.html
@hsheth2 As I am implementing the changes, I want to clarify a couple items, since you have done most of the implementation for the existing plugin:
- I have changed "openlineage-airflow>=1.2.0,<=1.30.1" -> "apache-airflow-providers-openlineage>=1.1.0,<2.5.0" in setup.py
- Now, I see that the plugin is being used in: _extractors.py, datahub_listener.py. I believe these imports need to begin using the apache-airflow-providers-openlineage plugin. Can you please confirm if this looks like the right direction?
Specifically,
from openlineage.airflow.listener import TaskHolder
from openlineage.airflow.utils import redact_with_exclusions
from openlineage.client.serde import Serde
and
from openlineage.airflow.extractors import (
    BaseExtractor,
    ExtractorManager as OLExtractorManager,
    TaskMetadata,
)
from openlineage.airflow.extractors.snowflake_extractor import SnowflakeExtractor
from openlineage.airflow.extractors.sql_extractor import SqlExtractor
from openlineage.airflow.utils import get_operator_class, try_import_from_string
from openlineage.client.facet import (
    ExtractionError,
    ExtractionErrorRunFacet,
    SqlJobFacet,
)
Yup that looks about right. The other thing to be careful of is to make sure everything ports over properly and all the tests continue to work as expected.
@hsheth2 a question on the datahub_listener, I may be completely off track here, but looking at the apache-airflow-providers-openlineage plugin, my understanding is that we do not have to keep track of tasks within the datahub_listener.
I am looking at the apache_airflow_providers_openlineage-2.5.0 code and it looks like we can just call the methods within the OpenLineageListener class.
@yuefeng-zhu just letting you know that I am working through this and will try and make as much progress as possible at the earliest.
Not quite - our listener class is very similar to the OpenLineageListener class, but does a bit more. If you look through the code you'll see the differences (different extractor, format messages for DataHub instead of OL, etc.). Those differences / additional capabilities are important to preserve, but we should be integrating the other OL changes into our listener. However, you are probably correct that we can remove the TaskHolder.
@harishkesavarao @hsheth2 do we have an ETA on this change?
@yuefeng-zhu, not at the moment. I am actively working on it. The changes are quite extensive, given we are handling the listener for Airflow 3 and beyond but also preserving backward compatibility. I understand that you are waiting for the change and I will keep you posted.
You can follow the updates here: https://github.com/harishkesavarao/datahub/blob/fix/airflow-plugin/metadata-ingestion-modules/airflow-plugin/src/datahub_airflow_plugin/datahub_listener.py
@yuefeng-zhu I will try my best to submit a PR by the weekend.
@hsheth2 I believe the conversion to the Airflow plugin for openlineage presents an opportunity to use the Airflow extractors as well. The Airflow extractor manager does not have child classes for Snowflake, Bigquery or any custom extractor.
What are your thoughts on implementing them in the datahub extractor?
Options:
- Continue to use the OL extractors for specific operators while moving away from OL to Airflow extractors for generic use.
- Write wrappers for Snowflake and Bigquery to allow custom extractors to use the Airflow extractors (this is a larger effort, potentially the long term option to completely move from the OL extractor)
Let's keep it simple - we can continue inheriting from and extending openlineage's ExtractorManager
All of our imports should probably change from openlineage.airflow.<something> to airflow.providers.openlineage.<something>
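While both packages may still be present during the transition, the prefix switch described above could be done with a small import fallback. This is a rough sketch only: `import_first_available` and the candidate list are hypothetical helpers, not part of the plugin, and submodule and class names below the two prefixes do not necessarily map one-to-one between the packages, so each import would need to be checked individually.

```python
import importlib
from types import ModuleType
from typing import Optional, Sequence


def import_first_available(candidates: Sequence[str]) -> Optional[ModuleType]:
    """Return the first module in `candidates` that imports cleanly, else None."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            # Covers ModuleNotFoundError as well; move on to the next candidate.
            continue
    return None


# Assumed preference order: the Airflow provider first, the legacy package second.
EXTRACTOR_MODULE_CANDIDATES = (
    "airflow.providers.openlineage.extractors",
    "openlineage.airflow.extractors",
)
```

Calling `import_first_available(EXTRACTOR_MODULE_CANDIDATES)` would return the provider module when it is installed and fall back to the legacy one otherwise. If support for the legacy package is dropped entirely, plain imports from the provider are simpler than a fallback.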
@yuefeng-zhu I will continue to work on this and will finalize the PR. Have been having a time crunch personally, apologies.
I'm eager to get this patch. Let me know if I can help in any way.
@sorenarchibald, thank you for offering. I would appreciate that. Especially, I need some help with testing the refactoring of the extractors to use this instead of this.
@sorenarchibald @yuefeng-zhu @ne1r0n I am working on this actively and making progress, just wanted to give an update.