docker-airflow
Unable to import papermill module
I wasn't able to import the papermill module. Any idea how to fix this?
Error log:
webserver_1 | from airflow.operators.papermill_operator import PapermillOperator
webserver_1 | ModuleNotFoundError: No module named 'airflow.operators.papermill_operator'
same issue
Same issue
Same here, please help!
pip install 'apache-airflow[papermill]'
Requirement already satisfied (use --upgrade to upgrade): apache-airflow[papermill] in /usr/lib/python2.7/site-packages
apache-airflow 1.10.0 does not provide the extra 'papermill'
same here :(
I get ModuleNotFoundError: No module named 'airflow.operators.papermill_operator'
after running: pip install 'apache-airflow[papermill]'
and creating a DAG that uses PapermillOperator.
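The pip message above ("does not provide the extra 'papermill'") can be double-checked from Python. The following is a minimal sketch, assuming Python 3.8 or newer (for `importlib.metadata`, which is newer than the 2.7 and 3.7 interpreters mentioned in this thread); the helper name `provided_extras` is made up for illustration.

```python
from importlib import metadata  # standard library, Python 3.8+ (assumption)


def provided_extras(dist_name):
    """Return the set of extras an installed distribution declares.

    Returns None when the distribution is not installed at all.
    `provided_extras` is a hypothetical helper, not part of Airflow.
    """
    try:
        meta = metadata.metadata(dist_name)
    except metadata.PackageNotFoundError:
        return None
    # "Provides-Extra" appears once per declared extra; it may be absent.
    return set(meta.get_all("Provides-Extra") or [])


extras = provided_extras("apache-airflow")
if extras is None:
    print("apache-airflow is not installed in this environment")
else:
    print("papermill extra available:", "papermill" in extras)
```

If the extra is not listed, pip installs Airflow without complaint but skips the papermill dependencies, which matches the output quoted above.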
Here's what I did to solve my problem: go to the source code repository and download the papermill operator: https://github.com/apache/airflow/tree/master/airflow/operators Depending on your version of Python, you may need to adjust it. I'm on 2.7, so I had to change the __init__ like so:
def __init__(self, input_nb=None, output_nb=None, parameters=None, *args, **kwargs):
    super(PapermillOperator, self).__init__(*args, **kwargs)
Now move this file to your airflow operators folder.
Oh, and don't forget to pip install papermill
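After copying the file and installing papermill, it may help to confirm that both modules are visible to the interpreter that actually runs Airflow. This is a rough sketch using only the standard library; `importable` is a hypothetical helper name, and note that looking up a dotted name imports its parent package (here `airflow`) as a side effect.

```python
import importlib.util


def importable(module_name):
    """Return True if `module_name` can be located by this interpreter.

    Hypothetical helper for illustration. For dotted names, find_spec
    imports the parent package first, so a missing parent raises
    ImportError, which is treated as "not importable" here.
    """
    try:
        return importlib.util.find_spec(module_name) is not None
    except ImportError:
        return False


for name in ("papermill", "airflow.operators.papermill_operator"):
    print(name, "->", "found" if importable(name) else "missing")
```

Run it with the same Python that the webserver and scheduler use; a package installed for a different interpreter or in a different image will still show up as missing.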
The latest version of Airflow (1.10.5) does not have papermill as an extra yet.
Assuming the next version will be 1.10.6, you will be able to build a Docker image locally with papermill using the following (this will also require additional changes to the entrypoint script):
docker build --rm --build-arg AIRFLOW_DEPS="papermill" \
--build-arg PYTHON_DEPS="papermill==1.1.0" \
--build-arg AIRFLOW_VERSION=1.10.6 -t puckel/docker-airflow:some_tag .
A temporary solution is to replace (in the Dockerfile)
&& pip install apache-airflow[crypto,celery,postgres,hive,jdbc,mysql,ssh${AIRFLOW_DEPS:+,}${AIRFLOW_DEPS}]==${AIRFLOW_VERSION} \
with
&& git config --global user.name "John Doe" \
&& git config --global user.email [email protected] \
&& git clone --branch 1.10.4 https://github.com/apache/airflow.git \
&& cd airflow; git cherry-pick 0e2b02c; pip install .[crypto,celery,postgres,hive,jdbc,mysql,ssh${AIRFLOW_DEPS:+,}${AIRFLOW_DEPS}] \
EDIT: Updated cherry-pick solution.
@SeanBE Can you be more specific about the fix above? I am not able to make it run just by tweaking the Dockerfile.
@balijepalli my bad! I rushed with the cherry-pick solution. Apply the following diff patch to the Dockerfile
diff --git Dockerfile Dockerfile
index f1bf033..97dcd23 100644
--- Dockerfile
+++ Dockerfile
@@ -56,7 +56,10 @@ RUN set -ex \
&& pip install pyOpenSSL \
&& pip install ndg-httpsclient \
&& pip install pyasn1 \
- && pip install apache-airflow[crypto,celery,postgres,hive,jdbc,mysql,ssh${AIRFLOW_DEPS:+,}${AIRFLOW_DEPS}]==${AIRFLOW_VERSION} \
+ && git config --global user.name "John Doe" \
+ && git config --global user.email [email protected] \
+ && git clone --branch 1.10.4 https://github.com/apache/airflow.git \
+ && cd airflow; git cherry-pick 0e2b02c; pip install .[crypto,celery,postgres,hive,jdbc,mysql,ssh${AIRFLOW_DEPS:+,}${AIRFLOW_DEPS}] \
&& pip install 'redis==3.2' \
&& if [ -n "${PYTHON_DEPS}" ]; then pip install ${PYTHON_DEPS}; fi \
&& apt-get purge --auto-remove -yqq $buildDeps \
and then run
docker build --rm --build-arg AIRFLOW_DEPS="papermill" --build-arg PYTHON_DEPS="papermill==1.1.0" -t puckel/docker-airflow:papermill .
Voila!
I'm getting the same issue with version 1.10.7. After adding a notebook to the required folder, I get Broken DAG: ....No module named 'papermill'
I tested with both LocalExecutor and CeleryExecutor.
My configuration is:
Dockerfile - Add papermill package to airflow install line
....
&& pip install pyasn1 \
&& pip install apache-airflow[crypto,celery,postgres,hive,jdbc,mysql,papermill,aws,ssh${AIRFLOW_DEPS:+,}${AIRFLOW_DEPS}]==${AIRFLOW_VERSION} \
&& pip install 'redis==3.2' \
&& if [ -n "${PYTHON_DEPS}" ]; then pip install ${PYTHON_DEPS}; fi \
COPY script/entrypoint.sh /entrypoint.sh
COPY config/airflow.cfg ${AIRFLOW_USER_HOME}/airflow.cfg
RUN chown -R airflow: ${AIRFLOW_USER_HOME}
EXPOSE 8080 5555 8793
USER airflow
WORKDIR ${AIRFLOW_USER_HOME}
ENTRYPOINT ["/entrypoint.sh"]
CMD ["webserver"]
docker-compose-LocalExecutor.yml (I only added a "files" volume, where the required notebook files are)
volumes:
    - ./dags:/usr/local/airflow/dags:cached
    - ./plugins:/usr/local/airflow/plugins:cached
    - ./files:/usr/local/airflow/files:cached
Script to build everything:
#!/bin/bash
echo "Pulling puckel/docker-airflow image"
docker pull puckel/docker-airflow
echo "==================================="
echo "Building docker"
echo "==================================="
docker build --rm --file Dockerfile --build-arg AIRFLOW_DEPS="datadog,dask,papermill" --build-arg PYTHON_DEPS="flask_oauthlib>=0.9" -t puckel/docker-airflow .
echo "==================================="
echo "Docker compose"
echo "==================================="
docker-compose -f docker-compose-LocalExecutor.yml up -d --remove-orphans
I have checked the docker build output, and it does install the papermill package:
+ pip install pyasn1
Requirement already satisfied: pyasn1 in /usr/local/lib/python3.7/site-packages (0.4.8)
+ pip install apache-airflow[crypto,celery,postgres,hive,jdbc,mysql,papermill,aws,ssh,datadog,dask,papermill]==1.10.7
Collecting apache-airflow[aws,celery,crypto,dask,datadog,hive,jdbc,mysql,papermill,postgres,ssh]==1.10.7
......
Collecting papermill[all]>=1.0.0; extra == "papermill"
Downloading papermill-1.2.1-py2.py3-none-any.whl (31 kB)
I tested the notebook locally and it works fine.
DAG code
from datetime import datetime, timedelta

from airflow.models import DAG
from airflow.operators.papermill_operator import PapermillOperator

default_args = {
    "owner": "airflow",
    "depends_on_past": False,
    "start_date": datetime(2015, 6, 1),
    "email": ["[email protected]"],
    "email_on_failure": False,
    "email_on_retry": False,
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="run_example_notebook_papermill_operator",
    default_args=default_args,
    schedule_interval=timedelta(1),
) as dag:
    run_this = PapermillOperator(
        task_id="run_example_notebook",
        input_nb="/files/code.ipynb",
        output_nb="/files/out-{{ execution_date }}.ipynb",
        parameters={"msgs": "Ran from Airflow at {{ execution_date }}!"},
        dag=dag,
    )
After refreshing the Airflow page at http://localhost:8080/admin/ I get: Broken DAG: [/usr/local/airflow/dags/test_notebook.py] No module named 'papermill'
It looks like I'm missing or misunderstanding something. If someone has found a solution, it would help me a lot.
I have found the solution: I needed to use the ":latest" tag, i.e. "puckel/docker-airflow:latest", as the image in the docker-compose file.
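One plausible reading of this fix is that the compose file pointed at a different image tag than the one the build script produced (the script above tags the image as plain puckel/docker-airflow, which Docker treats as ":latest"). Here is a small sketch for listing the tags a compose file refers to. It uses a simple regex rather than a YAML parser, ignores digest references, and the sample compose text, including the 1.10.7 tag, is hypothetical.

```python
import re


def compose_images(compose_text):
    """Return the image references found on `image:` lines of a compose file."""
    pattern = re.compile(r"^\s*image:\s*['\"]?([^'\"\s#]+)", re.MULTILINE)
    return pattern.findall(compose_text)


def image_tag(image_ref):
    """Return the tag of an image reference; Docker assumes 'latest' if none is given."""
    # Only look after the last "/", so a registry port (host:5000/...) is not
    # mistaken for a tag.
    last_part = image_ref.rsplit("/", 1)[-1]
    if ":" in last_part:
        return last_part.split(":", 1)[1]
    return "latest"


# Hypothetical compose snippet, for illustration only.
sample = """
services:
    webserver:
        image: puckel/docker-airflow:1.10.7
    postgres:
        image: postgres:9.6
"""

for ref in compose_images(sample):
    print(ref, "->", image_tag(ref))
```

If the tag printed for the Airflow service differs from the tag passed to `docker build -t`, the containers run an image that never had papermill installed, even though the local build log shows it being installed.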
@bllamasy How did you solve "No module named 'airflow provider'" by using ":latest" in "puckel/docker-airflow:latest" in the docker-compose file? Can you explain how to use it?