Permission denied: 'git' is raised when running a node in apache/airflow Docker image

Open vilozio opened this issue 2 years ago • 0 comments

Description

An Airflow DAG generated from a Kedro pipeline with the Kedro-Airflow plugin, running inside Kubernetes on the official apache/airflow image, fails with the error PermissionError: [Errno 13] Permission denied: 'git'.

Context

I'd like to run a Kedro pipeline in Airflow. I used the Kedro-Airflow plugin to generate an Airflow DAG from the Kedro pipeline. My Airflow instance runs in a Kubernetes cluster on the official apache/airflow image.

When I run a task of the DAG, it fails inside the KedroSession.create method with the error PermissionError: [Errno 13] Permission denied: 'git'.

I know this happens because the Airflow image does not have git installed. What really surprised me is the type of the error, and that Kedro does not handle it. I searched for similar cases and found an issue about the subprocess.check_output function: https://github.com/python/cpython/issues/69667. According to it, the function raises PermissionError instead of FileNotFoundError (which the _describe_git function in the Kedro session module does handle) when the command runs as a different user. That is the case in the Airflow image, where commands are executed as the airflow user.
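One way the uncovered case could be handled is to treat any OSError from the subprocess call as "git is unavailable", since FileNotFoundError, NotADirectoryError and PermissionError are all subclasses of OSError. The sketch below is not Kedro's implementation: describe_git_safely and its git_executable parameter are hypothetical names, and the two git commands are an assumption about what a session might want to record.

```python
import logging
import subprocess
from pathlib import Path
from typing import Any, Dict, Union

logger = logging.getLogger(__name__)


def describe_git_safely(
    project_path: Union[str, Path], git_executable: str = "git"
) -> Dict[str, Dict[str, Any]]:
    """Return {"git": {...}} for project_path, or {} if git cannot be run.

    Hypothetical helper for illustration; not part of Kedro.
    """
    cwd = str(project_path)
    try:
        sha = subprocess.check_output(
            [git_executable, "rev-parse", "--short", "HEAD"],
            cwd=cwd,
            stderr=subprocess.DEVNULL,
        )
        status = subprocess.check_output(
            [git_executable, "status", "--short"],
            cwd=cwd,
            stderr=subprocess.DEVNULL,
        )
    except (subprocess.CalledProcessError, OSError) as exc:
        # OSError covers FileNotFoundError, NotADirectoryError and
        # PermissionError, so a missing or non-executable git is tolerated.
        logger.warning("Unable to git describe %s: %r", cwd, exc)
        return {}
    return {
        "git": {
            "commit_sha": sha.decode().strip(),
            "dirty": bool(status.decode().strip()),
        }
    }
```

With this approach a session could still be created when git is absent; the git metadata would simply be left out and a warning logged.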

Steps to Reproduce

Prerequisite: Docker, or another tool that can build and run Docker images, is installed.

  1. Create a generic Kedro project with a simple pipeline. The pipeline needs at least one node, and the node doesn't have to do anything. Name the root package my_kedro_dag and the node my_kedro_node.
  2. Generate an Airflow DAG from the pipeline using the Kedro-Airflow plugin.
  3. Build a new Docker image based on the apache/airflow image. Copy the Kedro project, together with the generated DAG, into the default DAG folder, /opt/airflow/dags, and install the project's Python package inside the image. Name the image airflow-with-kedro-pipeline.
  4. Run the image as the airflow user with the command docker run --user 50000 -it airflow-with-kedro-pipeline airflow tasks run my_kedro_dag my_kedro_node my_run_id123.

Expected Result

The pipeline finishes without errors.

Actual Result

Traceback of the error:

Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.8/site-packages/ds_tools/airflow/operators/kedro.py", line 38, in execute
    with KedroSession.create(
  File "/home/airflow/.local/lib/python3.8/site-packages/kedro/framework/session/session.py", line 152, in create
    **_describe_git(session._project_path),
  File "/home/airflow/.local/lib/python3.8/site-packages/kedro/framework/session/session.py", line 34, in _describe_git
    res = subprocess.check_output(
  File "/usr/local/lib/python3.8/subprocess.py", line 415, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/usr/local/lib/python3.8/subprocess.py", line 493, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/usr/local/lib/python3.8/subprocess.py", line 858, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/local/lib/python3.8/subprocess.py", line 1704, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
PermissionError: [Errno 13] Permission denied: 'git'
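
To check which exception a given environment produces when launching an executable, a small probe like the following could be run inside the container as the airflow user. This is only a diagnostic sketch; probe_executable is a hypothetical helper, and the use of a --version argument is an assumption that the target program accepts it.

```python
import subprocess


def probe_executable(name: str) -> str:
    """Classify what happens when `name --version` is launched."""
    try:
        subprocess.run(
            [name, "--version"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,  # only the launch matters here, not the exit code
        )
    except FileNotFoundError:
        return "missing"
    except PermissionError:
        return "permission-denied"
    return "launched"
```

Calling probe_executable("git") shows which branch the environment hits, which helps distinguish a plainly missing binary from the permission case reported in the traceback above.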

Your Environment

  • Kedro version used (pip show kedro or kedro -V): 0.18.2
  • Python version used (python -V): 3.8
  • Operating system and version: Debian GNU/Linux 11 (bullseye)
  • Airflow image: apache/airflow:2.3.4-python3.8

vilozio, Sep 02 '22 18:09