airflow-provider-datarobot

Elyra Pipelines Integration failing with wheel file

Open shalberd opened this issue 2 years ago • 4 comments

I am using Airflow 2.4.1. In addition to the basic Airflow operators

https://airflow.apache.org/docs/apache-airflow/2.4.1/_api/airflow/operators/index.html#

I would also like to use this provider with all its components in Elyra.

In Elyra Pipelines, there is the possibility to add operators via a concept named "Airflow Provider Package Catalog Connector".

https://medium.com/ibm-data-ai/getting-started-with-apache-airflow-operators-in-elyra-aae882f80c4a

However, when I add the wheel file download URL to the Elyra config, I get the following error in the JupyterLab Elyra container:

[E 2023-01-06 16:41:25.198 ElyraApp] Error. Airflow provider package connector 'DataRobot Operator Components for Airflow' is not configured properly. The archive '/tmp/tmpij9e8m_j/airflow_provider_datarobot-0.0.3-py3-none-any.whl' contains 0 file(s) named 'get_provider_info.py'.
[I 2023-01-06 16:41:27.003 SingleUserLabApp log:189] 200 GET /user/kube%3Aadmin/api/kernels?1673023286993 (kube:[email protected]) 1.77ms 
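
The message above says the connector looked for members named `get_provider_info.py` inside the wheel and found none. Since a wheel is a zip archive, the same check can be reproduced locally. The following is a minimal sketch using only the Python standard library; the helper name `find_members_named` is hypothetical and this is not Elyra's actual implementation.

```python
import zipfile
from pathlib import PurePosixPath


def find_members_named(archive_path: str, filename: str) -> list[str]:
    """Return every archive member whose base name equals `filename`.

    Hypothetical helper for illustration; it only mirrors the kind of
    check the error message describes, not Elyra's own code.
    """
    with zipfile.ZipFile(archive_path) as wheel:
        return [
            name
            for name in wheel.namelist()
            # Zip member names always use forward slashes.
            if PurePosixPath(name).name == filename
        ]
```

Calling `find_members_named("airflow_provider_datarobot-0.0.3-py3-none-any.whl", "get_provider_info.py")` on a locally downloaded copy of the wheel (the local path is an assumption) should return an empty list if the wheel really lacks that file, which would match the "contains 0 file(s)" message.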

@ptitzler @kiersten-stokes

Can you maybe provide the developer with a hint as to what would need to be changed for integration to work?

It would be great to have all those operators available via Elyra Pipelines.

https://pypi.org/project/airflow-provider-datarobot/#description

shalberd avatar Jan 06 '23 16:01 shalberd

Then again, it is possible that this does not yet work with providers made for Airflow 2.x, such as the DataRobot provider. For example, the Apache Airflow Providers Amazon wheel file, which does contain the file get_provider_info.py in its directories, throws yet another error in the Elyra container:

[E 2023-01-06 17:10:10.314 SingleUserLabApp component_parser_airflow:56] Content associated with identifier '{'provider_package': 'apache_airflow_providers_amazon-7.0.0-py3-none-any.whl', 'provider': 'apache_airflow_providers_amazon', 'file': 'airflow/providers/amazon/aws/operators/ecs.py'}' could not be parsed: 'Attribute' object has no attribute 'id'. Skipping...
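
The wording `'Attribute' object has no attribute 'id'` is what Python's `ast` module produces when code reads `.id` from an `ast.Attribute` node, for example a dotted expression such as `airflow.models.BaseOperator`, while expecting a plain `ast.Name`. Whether this is the exact spot where Elyra's parser fails on `ecs.py` is an assumption; the sketch below only shows one common way to hit this message and a tolerant way to handle both node types. The sample source and the helper names `base_name` and `class_bases` are made up for illustration.

```python
import ast

# Made-up sample source; ast.parse() does not import or execute it.
SOURCE = '''
import airflow.models
from airflow.models import BaseOperator

class PlainOperator(BaseOperator):
    pass

class DottedOperator(airflow.models.BaseOperator):
    pass
'''


def base_name(node: ast.expr) -> str:
    """Return a dotted name for a base-class expression."""
    if isinstance(node, ast.Name):
        return node.id  # plain name, e.g. BaseOperator
    if isinstance(node, ast.Attribute):
        # Dotted name: resolve the left-hand side recursively.
        return f"{base_name(node.value)}.{node.attr}"
    return ast.unparse(node)  # fallback for other expressions (Python 3.9+)


def class_bases(source: str) -> dict[str, list[str]]:
    """Map each class defined in `source` to the names of its bases."""
    tree = ast.parse(source)
    return {
        node.name: [base_name(base) for base in node.bases]
        for node in ast.walk(tree)
        if isinstance(node, ast.ClassDef)
    }
```

A parser that instead reads `base.id` directly for every base would raise `AttributeError` on `DottedOperator`, whereas `class_bases(SOURCE)` handles both class definitions.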

shalberd avatar Jan 06 '23 17:01 shalberd

Hi @shalberd,

There have been many Apache Airflow changes since Elyra introduced the provider package connector in version 3.6. The changes to provider package implementations are incompatible and our connector is unable to locate the operator classes in the archive. Any provider created for Airflow releases > 2.2 will therefore likely not work.

ptitzler avatar Jan 19 '23 19:01 ptitzler

@ptitzler are there any plans to change that situation in Elyra, or, in other words, how important within IBM is Airflow still at this point?

shalberd avatar May 04 '23 09:05 shalberd

@shalberd I am no longer involved with the Elyra project and therefore don't have any insights into future plans. Please reach out to the remaining maintainers via the project's public channels.

ptitzler avatar May 04 '23 14:05 ptitzler

Hi @shalberd - I work at DataRobot and we're taking a renewed interest in this repository/package and have actually just released a new version.

We plan on having a relatively frequent cadence of releasing changes in the near future.

Could you try again with the latest version, currently 0.1.0 (released Feb. 17th, 2025), and let us know if we can be of any assistance with your reported issue?

c-h-russell-walker avatar Feb 18 '25 14:02 c-h-russell-walker

@shalberd I'm going to close this issue but if you need or want more assistance please let us know and/or create a new issue.

Thanks!

c-h-russell-walker avatar Mar 14 '25 20:03 c-h-russell-walker

@c-h-russell-walker Thank you for letting me know about 0.1.0. I will have to make changes to the parsing in the Elyra JupyterLab plugin anyway, so that is good to know. It will be a while until I can give feedback, though. Cool to know you keep developing that provider. I am sure that, apart from being triggered from within Elyra, the Airflow community appreciates it a lot, whether for Airflow 2.8.x, Airflow 2.10.x, or Airflow in general. As mentioned, I will keep in touch and reach out this year. The aim is validated Airflow 2.x support. And then, later this semester, I guess Airflow 3 is on the horizon, too.

shalberd avatar Mar 14 '25 20:03 shalberd

Amazing! And yes, not only do we have 0.2.0 as of March 4th, but we also now have an early access version that currently releases the latest code on Tuesdays, if you ever want to check that out.

https://pypi.org/project/airflow-provider-datarobot-early-access/

c-h-russell-walker avatar Mar 14 '25 20:03 c-h-russell-walker

@c-h-russell-walker What makes my environment special from an architectural perspective: we use Airflow 2.x, Open Data Hub Kubeflow notebooks, and DataRobot (version 9, I believe). So yes, as mentioned, it is very neat to see Airflow support continuing in the form of a custom provider.

shalberd avatar Mar 14 '25 20:03 shalberd