The new 'Airflow Pipeline Editor' feature is not working properly. DAGs created with this function will be recognized as errors by 'Airflow'.

Open nanaones opened this issue 3 years ago • 3 comments

Describe the issue Airflow Pipeline Generator generates malformed DAG files.

To Reproduce Steps to reproduce the behavior:

  1. Go to 'Airflow pipeline Editor on Launcher'
  2. Drag 'BashOperator on template'
  3. Fill bash_command on Node properties 'echo hello-world'
  4. Click Run pipeline on Apache Airflow
  5. Fill 'Pipeline Name' to 'hello-world'

Screenshots or log output If applicable, add screenshots or log output to help explain your problem.

Broken DAG: [/opt/airflow/dags/repo/hello-world-0909015623.py] Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/utils/decorators.py", line 94, in wrapper
    result = func(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/models/baseoperator.py", line 414, in __init__
    "arguments were:\n**kwargs: {k}".format(c=self.__class__.__name__, k=kwargs, t=task_id),
airflow.exceptions.AirflowException: Invalid arguments were passed to BashOperator (task_id: BashOperator). Invalid arguments were:
**kwargs: {'namespace': 'airflow', 'xcom_push': False, 'inputs': [], 'outputs': [], 'secrets': [Secret(env, AWS_ACCESS_KEY_ID, elyra, AWS_ACCESS_KEY_ID), Secret(env, AWS_SECRET_ACCESS_KEY, elyra, AWS_SECRET_ACCESS_KEY)], 'in_cluster': True, 'config_file': 'None'}

Expected behavior The hello-world DAG runs the bash command 'echo hello-world'.

Deployment information Describe what you've deployed and how:

  • Elyra version: 3.0.1
  • Operating system: windows 10
  • Installation source: Pip install
  • Deployment type: EKS(Kubernetes v 1.21)

Pipeline runtime environment If the issue is related to pipeline execution, identify the environment where the pipeline is executed

  • Apache Airflow Version 2.0.2

The same goes for EmailOperator. I checked: the arguments that the generated DAGs pass are based on the Kubernetes operator (k8soperator).
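The failure mode reported above (Kubernetes-specific keyword arguments handed to an operator that does not know them) can be sketched without Airflow installed. This is a minimal illustration, not Elyra's or Airflow's code: `FakeBashOperator` and `split_supported_kwargs` are hypothetical names, and the stand-in class only mimics an operator with a fixed set of constructor parameters.

```python
import inspect


class FakeBashOperator:
    """Hypothetical stand-in for an operator; NOT the real Airflow BashOperator."""

    def __init__(self, task_id, bash_command, env=None):
        self.task_id = task_id
        self.bash_command = bash_command
        self.env = env


def split_supported_kwargs(operator_cls, kwargs):
    """Split kwargs into those named in operator_cls.__init__ and the rest."""
    params = inspect.signature(operator_cls.__init__).parameters
    accepted = {k: v for k, v in kwargs.items() if k in params}
    rejected = {k: v for k, v in kwargs.items() if k not in params}
    return accepted, rejected


# A subset of the arguments from the traceback, plus the two BashOperator needs.
generated_kwargs = {
    "task_id": "BashOperator",
    "bash_command": "echo hello-world",
    "namespace": "airflow",
    "in_cluster": True,
    "config_file": "None",
}

accepted, rejected = split_supported_kwargs(FakeBashOperator, generated_kwargs)
task = FakeBashOperator(**accepted)  # succeeds once the extra arguments are dropped
print("rejected:", sorted(rejected))
```

Real Airflow operators usually forward `**kwargs` to `BaseOperator`, so a check against a single `__init__` signature like this one would not be sufficient there; the sketch only shows why pod-specific arguments such as `namespace` or `in_cluster` do not belong in a BashOperator call.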

nanaones avatar Sep 10 '21 00:09 nanaones

Unfortunately Airflow 2.x is not yet supported. https://elyra.readthedocs.io/en/stable/recipes/configure-airflow-as-a-runtime.html

ptitzler avatar Sep 11 '21 17:09 ptitzler

@nanaones I am unsure whether the Airflow package catalog connector was already a feature in 2021; it is now. Still, even on initial setup and import with the package catalog connector, i.e. from

https://archive.apache.org/dist/airflow/2.6.2/apache_airflow-2.6.2-py3-none-any.whl

the message is as follows in Elyra:

[I 2024-01-05 10:52:38.508 ElyraApp] Analysis of 'https://archive.apache.org/dist/airflow/2.6.2/apache_airflow-2.6.2-py3-none-any.whl' completed. Located 9 operator classes in 4 Python scripts.
[W 2024-01-05 10:52:38.521 ServerApp] Operator 'BranchPythonOperator' associated with identifier '{'airflow_package': 'apache_airflow-2.6.2-py3-none-any.whl', 'file': 'airflow/operators/python.py'}' does not have an __init__ function. Skipping...

The package catalog connector code needs fixes. I will make a PR based on a fork by @ianonavy for the package catalog connector, so that BashOperator and the provided base Airflow operators in general work.

That work was done in a fork outside community Elyra, but it has never been tested or discussed here so far.

It should actually be OK for the most part, but BashOperator does not show up.

[Screenshot 2024-01-12 at 17:48:41]

@lresende @romeokienzler I got an error message when importing the Airflow wheel file through the package catalog connector, which leads to two operators not being imported (one of them is BashOperator). @nanaones I will investigate and make an additional PR for https://github.com/elyra-ai/elyra/blob/main/elyra/pipeline/airflow/package_catalog_connector/airflow_package_catalog_connector.py#L191 to bring in that change from @ianonavy's fork. I am quite sure that will fix this issue. The BaseOperator reference needs to be compatible with Airflow 2.x.

https://airflow.apache.org/docs/apache-airflow/2.6.2/operators-and-hooks-ref.html

After I make the changes, I get a different log on Elyra start when the wheel file is evaluated. It looks much better: more operator classes are detected (16 instead of 9).

[I 2024-01-12 22:25:00.524 ElyraApp] Analysis of ''https://archive.apache.org/dist/airflow/2.6.2/apache_airflow-2.6.2-py3-none-any.whl'' completed. Located 16 operator classes in 11 Python scripts.
[W 2024-01-12 22:25:00.568 ServerApp] Operator 'BaseBranchOperator' associated with identifier '{'airflow_package': 'apache_airflow-2.6.2-py3-none-any.whl', 'file': 'airflow/operators/branch.py'}' does not have an __init__ function. Skipping...
[W 2024-01-12 22:25:00.571 ServerApp] Operator 'EmptyOperator' associated with identifier '{'airflow_package': 'apache_airflow-2.6.2-py3-none-any.whl', 'file': 'airflow/operators/empty.py'}' does not have an __init__ function. Skipping...
[W 2024-01-12 22:25:00.587 ServerApp] Operator 'BranchPythonOperator' associated with identifier '{'airflow_package': 'apache_airflow-2.6.2-py3-none-any.whl', 'file': 'airflow/operators/python.py'}' does not have an __init__ function. Skipping...
[W 2024-01-12 22:25:00.592 ServerApp] Operator 'LatestOnlyOperator' associated with identifier '{'airflow_package': 'apache_airflow-2.6.2-py3-none-any.whl', 'file': 'airflow/operators/latest_only.py'}' does not have an __init__ function. Skipping...
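The "does not have an __init__ function" warnings are what one would expect from a parser that inspects only a class's own body and ignores inherited constructors. The following is a minimal sketch of that idea using Python's `ast` module. It is an assumption about the general approach, not Elyra's actual connector code, and the `SOURCE` string is a made-up stand-in rather than real Airflow source.

```python
import ast

# Made-up stand-in source: one operator defines __init__, its subclass does not.
SOURCE = '''
class BaseOperator:
    def __init__(self, task_id):
        self.task_id = task_id

class PythonOperator(BaseOperator):
    def __init__(self, *, python_callable, **kwargs):
        super().__init__(**kwargs)
        self.python_callable = python_callable

class BranchPythonOperator(PythonOperator):
    def execute(self, context):
        return None
'''


def classes_with_own_init(source):
    """Map each class name to True if its body defines __init__ directly."""
    result = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            result[node.name] = any(
                isinstance(item, ast.FunctionDef) and item.name == "__init__"
                for item in node.body
            )
    return result


own_init = classes_with_own_init(SOURCE)
for name, has_init in own_init.items():
    print(name, "ok" if has_init else "no own __init__, would be skipped")
```

A static check like this cannot see the constructor a subclass inherits, which would explain why operators that only override `execute` are reported and skipped even though they are usable in Airflow.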

Let's check the GUI:

For example, I can now see and use the BashOperator with Airflow 2.x.

[Screenshot 2024-01-12 at 23:31:57]

Looking good. Now I need to look into sensors (airflow.hooks.base, airflow.sensors.base). However, the palette listing on the left is only for operators, not sensors, so this should be fine.

shalberd avatar Jan 12 '24 19:01 shalberd

@MR-GOYAL @lresende @ianonavy @thesuperzapper For the Airflow 2.x package catalog connector / wheel file, it is not enough to just change the BaseOperator class for correct operator import, as in the linked WIP PR here. When the properties of Airflow components are added, the parsing of the properties and their init fields needs to change, too, at https://github.com/elyra-ai/elyra/blob/main/elyra/pipeline/airflow/component_parser_airflow.py#L203, among others.

Why? Because the format changed (example bash operator) between

Airflow 1.10, for example

https://github.com/apache/airflow/blob/v1-10-stable/airflow/operators/bash_operator.py#L43 https://github.com/apache/airflow/blob/v1-10-stable/airflow/operators/bash_operator.py#L93

and Airflow current

https://github.com/apache/airflow/blob/main/airflow/operators/bash.py#L48 https://github.com/apache/airflow/blob/main/airflow/operators/bash.py#L138
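To illustrate why the signature format matters to a parser, here is a minimal sketch that extracts constructor parameters from both styles with `ast`. The two source strings are abbreviated stand-ins written for this example (older style: plain positional parameters with defaults; newer style: keyword-only, type-annotated parameters), not copies of the linked Airflow files, and `init_params` is a hypothetical helper, not part of Elyra.

```python
import ast

# Abbreviated stand-ins for the two signature styles; not the real Airflow source.
OLD_STYLE = '''
class BashOperator:
    def __init__(self, bash_command, xcom_push=False, env=None, *args, **kwargs):
        pass
'''

NEW_STYLE = '''
class BashOperator:
    def __init__(self, *, bash_command: str, env: dict | None = None, cwd: str | None = None, **kwargs) -> None:
        pass
'''


def init_params(source):
    """Return {name: (required, default)} for the first __init__ in source."""
    init = next(
        node
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.FunctionDef) and node.name == "__init__"
    )
    args = init.args
    params = {}
    # Positional parameters: defaults line up with the tail of args.args.
    first_default = len(args.args) - len(args.defaults)
    for i, arg in enumerate(args.args):
        if arg.arg == "self":
            continue
        if i >= first_default:
            params[arg.arg] = (False, ast.literal_eval(args.defaults[i - first_default]))
        else:
            params[arg.arg] = (True, None)
    # Keyword-only parameters: kw_defaults holds None where no default is given.
    for arg, default in zip(args.kwonlyargs, args.kw_defaults):
        if default is None:
            params[arg.arg] = (True, None)
        else:
            params[arg.arg] = (False, ast.literal_eval(default))
    return params


old_params = init_params(OLD_STYLE)
new_params = init_params(NEW_STYLE)
print(old_params)
print(new_params)
```

A parser that reads only `args.args` finds nothing but `self` in the keyword-only style, so both lists have to be handled. The `required` flag is kept separate from the default value so that an optional parameter defaulting to `None` is not mistaken for a required one.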

Things like None handling, required properties in the operators, etc. I'll make sure to cover and test this in my PR. I found out, for example, that the cwd field of BashOperator is currently filled with an empty string when nothing is entered in the Elyra node editor text box for that property, when in fact it should be None ...

raise AirflowException(f"Can not find the cwd: {self.cwd}")
airflow.exceptions.AirflowException: Can not find the cwd:

https://github.com/apache/airflow/blob/main/airflow/operators/bash.py#L216
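The cwd problem described above comes down to a blank text box being passed on as `''` instead of `None`. A minimal sketch of the normalization step, assuming node properties arrive as a plain dict (the dict layout and the helper name `normalize_optional` are hypothetical, not Elyra's data model):

```python
def normalize_optional(value):
    """Turn blank text-box input into None; leave everything else untouched."""
    if isinstance(value, str) and value.strip() == "":
        return None
    return value


# Hypothetical node properties as they might come from the editor form.
node_properties = {"bash_command": "echo hello-world", "cwd": "", "env": None}

operator_kwargs = {k: normalize_optional(v) for k, v in node_properties.items()}

# Render the call the way a DAG generator might write it into the DAG file.
rendered = ", ".join(f"{k}={v!r}" for k, v in operator_kwargs.items())
print(f"BashOperator({rendered})")
```

This only makes sense for properties where an empty value means "not set". A required string property should probably be rejected when blank rather than silently turned into `None`.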

This gives the general direction (WIP). For example, with the changes from PR 3167 and the recognition fix already in the PR linked to this issue, BashOperator gets executed correctly. Property assembly and parsing need some more work, though, as hinted at by the cwd property of BashOperator.

[Screenshot 2024-01-16 at 19:18:22]

[Screenshot 2024-01-16 at 19:21:25]

shalberd avatar Jan 17 '24 08:01 shalberd