elyra
The new 'Airflow Pipeline Editor' feature is not working properly. DAGs created with it are reported as broken by Airflow.
Describe the issue
Airflow Pipeline Generator generates malformed DAG files.
To Reproduce
Steps to reproduce the behavior:
- Open 'Airflow Pipeline Editor' from the Launcher
- Drag a 'BashOperator' node into the pipeline
- Set bash_command in the node properties to 'echo hello-world'
- Click 'Run pipeline' to run on Apache Airflow
- Set 'Pipeline Name' to 'hello-world'
Screenshots or log output
Broken DAG: [/opt/airflow/dags/repo/hello-world-0909015623.py] Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/utils/decorators.py", line 94, in wrapper
    result = func(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/models/baseoperator.py", line 414, in __init__
    "arguments were:\n**kwargs: {k}".format(c=self.__class__.__name__, k=kwargs, t=task_id),
airflow.exceptions.AirflowException: Invalid arguments were passed to BashOperator (task_id: BashOperator). Invalid arguments were:
**kwargs: {'namespace': 'airflow', 'xcom_push': False, 'inputs': [], 'outputs': [], 'secrets': [Secret(env, AWS_ACCESS_KEY_ID, elyra, AWS_ACCESS_KEY_ID), Secret(env, AWS_SECRET_ACCESS_KEY, elyra, AWS_SECRET_ACCESS_KEY)], 'in_cluster': True, 'config_file': 'None'}
Expected behavior
The hello-world DAG loads without errors and runs the 'echo hello-world' command.
Deployment information Describe what you've deployed and how:
- Elyra version: 3.0.1
- Operating system: Windows 10
- Installation source: pip install
- Deployment type: EKS(Kubernetes v 1.21)
Pipeline runtime environment If the issue is related to pipeline execution, identify the environment where the pipeline is executed
- Apache Airflow Version 2.0.2
The same goes for EmailOperators. I checked, and the cause appears to be that the generated DAGs pass arguments that are based on the Kubernetes operator (k8soperator) to every operator.
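To make the mismatch concrete, here is a minimal sketch of how one could check which generated keyword arguments an operator constructor would reject. It does not import Airflow: `FakeBashOperator` and `split_kwargs` are hypothetical names made up for illustration, and the stand-in class has a strict constructor. Real Airflow operators forward `**kwargs` to `BaseOperator`, which is where the rejection in the traceback above happens, so a signature check like this is only an illustration of the idea, not how Airflow or Elyra does it.

```python
import inspect


class FakeBashOperator:
    """Hypothetical stand-in for an operator with a strict constructor."""

    def __init__(self, task_id, bash_command, env=None):
        self.task_id = task_id
        self.bash_command = bash_command
        self.env = env


def split_kwargs(operator_cls, kwargs):
    """Split kwargs into (accepted, rejected) based on the constructor signature."""
    params = inspect.signature(operator_cls.__init__).parameters
    # A constructor that takes **kwargs accepts anything as far as the
    # signature can tell, so nothing can be rejected at this level.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs), {}
    accepted = {k: v for k, v in kwargs.items() if k in params and k != "self"}
    rejected = {k: v for k, v in kwargs.items() if k not in accepted}
    return accepted, rejected


# Arguments similar to those in the reported traceback (shortened).
generated = {
    "task_id": "BashOperator",
    "bash_command": "echo hello-world",
    "namespace": "airflow",
    "in_cluster": True,
}
accepted, rejected = split_kwargs(FakeBashOperator, generated)
print("accepted:", accepted)
print("rejected:", rejected)
```

With the stand-in class, the Kubernetes-specific keys (`namespace`, `in_cluster`) end up in `rejected`, which mirrors the "Invalid arguments were" part of the error.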
Unfortunately, Airflow 2.x is not yet supported. See https://elyra.readthedocs.io/en/stable/recipes/configure-airflow-as-a-runtime.html
@nanaones I'm unsure whether the Airflow package catalog connector already existed as a feature in 2021, but it does now. Still, even on initial setup and import with the package catalog connector, e.g. from
https://archive.apache.org/dist/airflow/2.6.2/apache_airflow-2.6.2-py3-none-any.whl
the message is as follows in Elyra:
[I 2024-01-05 10:52:38.508 ElyraApp] Analysis of 'https://archive.apache.org/dist/airflow/2.6.2/apache_airflow-2.6.2-py3-none-any.whl' completed. Located 9 operator classes in 4 Python scripts.
[W 2024-01-05 10:52:38.521 ServerApp] Operator 'BranchPythonOperator' associated with identifier '{'airflow_package': 'apache_airflow-2.6.2-py3-none-any.whl', 'file': 'airflow/operators/python.py'}' does not have an __init__ function. Skipping...
The package catalog connector code needs fixes. I will make a PR based on @ianonavy's fork of the package catalog connector, so that BashOperator and the provided base Airflow operators in general work.
That work was done in a fork outside community Elyra, but it has never been tested or discussed here so far.
It should be OK for the most part, but BashOperator does not show up.
@lresende @romeokienzler I got an error message when importing the Airflow wheel file through the package catalog connector, which led to two operators (one of them the bash operator) not being imported. @nanaones I will investigate and make an additional PR for https://github.com/elyra-ai/elyra/blob/main/elyra/pipeline/airflow/package_catalog_connector/airflow_package_catalog_connector.py#L191 to bring in that change from @ianonavy's fork. I am fairly sure that will fix this issue: the BaseOperator reference needs to be compatible with Airflow 2.x.
https://airflow.apache.org/docs/apache-airflow/2.6.2/operators-and-hooks-ref.html
After making the changes, I get a different log on Elyra start when it evaluates the wheel file. It looks much better: more operator classes are detected (16 instead of 9).
[I 2024-01-12 22:25:00.524 ElyraApp] Analysis of ''https://archive.apache.org/dist/airflow/2.6.2/apache_airflow-2.6.2-py3-none-any.whl'' completed. Located 16 operator classes in 11 Python scripts.
[W 2024-01-12 22:25:00.568 ServerApp] Operator 'BaseBranchOperator' associated with identifier '{'airflow_package': 'apache_airflow-2.6.2-py3-none-any.whl', 'file': 'airflow/operators/branch.py'}' does not have an __init__ function. Skipping...
[W 2024-01-12 22:25:00.571 ServerApp] Operator 'EmptyOperator' associated with identifier '{'airflow_package': 'apache_airflow-2.6.2-py3-none-any.whl', 'file': 'airflow/operators/empty.py'}' does not have an __init__ function. Skipping...
[W 2024-01-12 22:25:00.587 ServerApp] Operator 'BranchPythonOperator' associated with identifier '{'airflow_package': 'apache_airflow-2.6.2-py3-none-any.whl', 'file': 'airflow/operators/python.py'}' does not have an __init__ function. Skipping...
[W 2024-01-12 22:25:00.592 ServerApp] Operator 'LatestOnlyOperator' associated with identifier '{'airflow_package': 'apache_airflow-2.6.2-py3-none-any.whl', 'file': 'airflow/operators/latest_only.py'}' does not have an __init__ function. Skipping...
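The "does not have an __init__ function. Skipping..." warnings above can be illustrated with a small sketch. It is not Elyra's actual connector code: `find_operator_classes` and the sample class names are hypothetical, and the approach shown (walking the syntax tree with Python's `ast` module) is an assumption chosen for clarity. It only looks at direct subclasses of the given base names, so an operator that inherits from another operator would need extra handling.

```python
import ast


def find_operator_classes(source, base_names=("BaseOperator",)):
    """Map each class that directly subclasses one of base_names to whether
    it defines its own __init__ (True) or only inherits one (False)."""
    wanted = set(base_names)
    result = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ClassDef):
            continue
        bases = set()
        for base in node.bases:
            if isinstance(base, ast.Name):  # class X(BaseOperator)
                bases.add(base.id)
            elif isinstance(base, ast.Attribute):  # class X(models.BaseOperator)
                bases.add(base.attr)
        if bases & wanted:
            result[node.name] = any(
                isinstance(item, ast.FunctionDef) and item.name == "__init__"
                for item in node.body
            )
    return result


# Hypothetical operator module; the source is only parsed, never imported,
# so Airflow does not need to be installed.
SAMPLE = '''
from airflow.models.baseoperator import BaseOperator

class WithInit(BaseOperator):
    def __init__(self, *, command, **kwargs):
        super().__init__(**kwargs)
        self.command = command

class WithoutInit(BaseOperator):
    def execute(self, context):
        pass
'''

for name, has_init in find_operator_classes(SAMPLE).items():
    print(name, "defines __init__" if has_init else "has no __init__ of its own")
```

A class like `WithoutInit` is a valid operator, since it inherits its constructor from the base class, but a parser that requires an explicit `__init__` to discover properties would skip it, as the warnings show.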
Let's check the GUI:
I can now, for example, see and use the BashOperator with Airflow 2.x.
Looking good. Next I need to look at sensors (airflow.hooks.base, airflow.sensors.base). However, the palette listing on the left only shows operators, not sensors, so this should be fine.
@MR-GOYAL @lresende @ianonavy @thesuperzapper for Airflow 2.x package catalog connector / wheel file, it is not enough to just change the BaseOperator class for correct operator import as in the linked WIP PR here. When the properties of airflow components are added, the parsing of the properties and its init fields needs to be changed, too, at https://github.com/elyra-ai/elyra/blob/main/elyra/pipeline/airflow/component_parser_airflow.py#L203, among others.
Why? Because the format of the operator definitions changed between versions. Taking the bash operator as an example, compare
Airflow 1.10
https://github.com/apache/airflow/blob/v1-10-stable/airflow/operators/bash_operator.py#L43 https://github.com/apache/airflow/blob/v1-10-stable/airflow/operators/bash_operator.py#L93
and current Airflow
https://github.com/apache/airflow/blob/main/airflow/operators/bash.py#L48 https://github.com/apache/airflow/blob/main/airflow/operators/bash.py#L138
The differences include things like None handling and required properties in the operators. I'll make sure to cover and test this in my PR. I found out, for example, that the cwd field of BashOperator is currently filled with an empty string when nothing is entered in the textbox for that property in the Elyra node editor, when in fact it should be None. This results in:
raise AirflowException(f"Can not find the cwd: {self.cwd}")
airflow.exceptions.AirflowException: Can not find the cwd:
https://github.com/apache/airflow/blob/main/airflow/operators/bash.py#L216
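One possible way to handle the cwd case described above is to normalize empty text inputs to None before the operator arguments are assembled. The sketch below is an assumption about how that could look, not the fix in the PR: `normalize_optional_strings` and its `optional_keys` parameter are hypothetical names, and the property dict is a simplified stand-in for what the node editor produces.

```python
def normalize_optional_strings(properties, optional_keys):
    """Return a copy of properties where empty or whitespace-only strings
    under optional_keys are replaced with None."""
    normalized = dict(properties)
    for key in optional_keys:
        value = normalized.get(key)
        if isinstance(value, str) and value.strip() == "":
            normalized[key] = None
    return normalized


# Simplified stand-in for node properties with an untouched cwd textbox.
node_properties = {"bash_command": "echo hello-world", "cwd": ""}
print(normalize_optional_strings(node_properties, ["cwd"]))
```

Only the keys listed as optional are touched, so a required property such as bash_command keeps whatever the user entered, and an intentionally set cwd is passed through unchanged.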
Getting the general direction (WIP): with the changes from PR 3167 and the recognition fix already in the PR linked to this issue, BashOperator gets executed correctly. Property assembly and parsing need some more work, though, as hinted at by the cwd property of BashOperator.