kedro
Permission denied: 'git' is raised when running a node in apache/airflow Docker image
Description
An Airflow DAG generated from a Kedro pipeline with the Kedro-Airflow plugin, running inside Kubernetes on the official apache/airflow image, fails with the error PermissionError: [Errno 13] Permission denied: 'git'.
Context
I'd like to run a Kedro pipeline in Airflow. I used the Kedro-Airflow plugin to generate an Airflow DAG from the Kedro pipeline. My Airflow instance runs in a Kubernetes cluster on the official apache/airflow image.
When I run the DAG's task, it fails inside the KedroSession create method with the error PermissionError: [Errno 13] Permission denied: 'git'.
I know that this happens because the Airflow image doesn't have git installed.
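A quick way to confirm the missing executable from inside the container is sketched below. The helper names git_available and unsearchable_path_dirs are hypothetical and are not part of Kedro or Airflow; the sketch relies only on the standard library and assumes it is run as the same user that runs the Airflow task.

```python
import os
import shutil
from typing import List


def git_available(executable: str = "git") -> bool:
    """Return True if `executable` can be found on PATH and is executable."""
    return shutil.which(executable) is not None


def unsearchable_path_dirs() -> List[str]:
    """List PATH directories that exist but cannot be searched by the current user."""
    dirs = os.environ.get("PATH", "").split(os.pathsep)
    return [d for d in dirs if d and os.path.isdir(d) and not os.access(d, os.X_OK)]


if __name__ == "__main__":
    print("git found on PATH:", git_available())
    print("PATH directories the current user cannot search:", unsearchable_path_dirs())
```

If the first check is false, git is missing for that user. If the second list is non-empty, it may help explain why the lookup ends in a permission error rather than a not-found error.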
What really surprised me is the type of the error, and that it is not handled.
I searched for similar cases and found an issue about the subprocess.check_output method here: https://github.com/python/cpython/issues/69667. It says that the function raises PermissionError instead of FileNotFoundError (which is handled inside the _describe_git function in the Kedro session) when the command is run by a different user.
This is the case with the Airflow image, where commands are executed as the airflow user.
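One possible way to tolerate both error types is sketched below. This is not Kedro's actual _describe_git implementation: the function name describe_git_safely, its executable parameter, and the shape of the returned dictionary are assumptions made for illustration. The idea is simply to treat PermissionError the same way as FileNotFoundError and fall back to an empty result.

```python
import logging
import subprocess
from pathlib import Path
from typing import Any, Dict

logger = logging.getLogger(__name__)


def describe_git_safely(
    project_path: Path, executable: str = "git"
) -> Dict[str, Dict[str, Any]]:
    """Return the short commit SHA of `project_path`, or {} if git cannot be run.

    Hypothetical helper: `executable` is a parameter only so that the
    failure paths are easy to exercise.
    """
    try:
        sha = subprocess.check_output(
            [executable, "rev-parse", "--short", "HEAD"],
            cwd=str(project_path),
            stderr=subprocess.DEVNULL,
        )
    except (subprocess.CalledProcessError, FileNotFoundError, PermissionError) as exc:
        # CalledProcessError: git ran but failed (e.g. not a repository).
        # FileNotFoundError: the executable was not found.
        # PermissionError: the executable could not be run by this user.
        logger.warning("Unable to git describe %s: %s", project_path, exc)
        return {}
    return {"git": {"commit_sha": sha.decode().strip()}}
```

With a handler like this, a missing or inaccessible git would only produce a warning, and session creation could continue without git metadata.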
Steps to Reproduce
Prerequisite: Docker, or another tool that can build and run Docker images, is installed.
- Create a generic Kedro project with a simple Kedro pipeline. It should have at least one node, and the node doesn't need to do anything. Name the root package my_kedro_dag and name the node my_kedro_node.
- Generate an Airflow DAG from the pipeline using the Kedro-Airflow plugin.
- Create a new Docker image based on the apache/airflow image. Copy the Kedro project with the generated DAG into the default DAG folder, /opt/airflow/dags. Also install the project's Python package inside the image. Name the image airflow-with-kedro-pipeline.
- Run the Docker image as the airflow user with the command docker run --user 50000 -it airflow-with-kedro-pipeline airflow tasks run my_kedro_dag my_kedro_node my_run_id123
Expected Result
The pipeline finishes without errors.
Actual Result
Traceback of the error:
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.8/site-packages/ds_tools/airflow/operators/kedro.py", line 38, in execute
    with KedroSession.create(
  File "/home/airflow/.local/lib/python3.8/site-packages/kedro/framework/session/session.py", line 152, in create
    **_describe_git(session._project_path),
  File "/home/airflow/.local/lib/python3.8/site-packages/kedro/framework/session/session.py", line 34, in _describe_git
    res = subprocess.check_output(
  File "/usr/local/lib/python3.8/subprocess.py", line 415, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/usr/local/lib/python3.8/subprocess.py", line 493, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/usr/local/lib/python3.8/subprocess.py", line 858, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/local/lib/python3.8/subprocess.py", line 1704, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
PermissionError: [Errno 13] Permission denied: 'git'
Your Environment
- Kedro version used (pip show kedro or kedro -V): 0.18.2
- Python version used (python -V): 3.8
- Operating system and version: Debian GNU/Linux 11 (bullseye)
- Airflow image: apache/airflow:2.3.4-python3.8