docker-airflow
Unable to import papermill module
I wasn't able to import the papermill module. Any idea how to fix this?
Error log:
webserver_1 | from airflow.operators.papermill_operator import PapermillOperator
webserver_1 | ModuleNotFoundError: No module named 'airflow.operators.papermill_operator'
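A quick way to tell whether the installed Airflow ships this module at all is to ask Python's import machinery for it without running the failing DAG. This is a minimal sketch using only the standard library; the helper name `module_available` is hypothetical. Note that checking a submodule imports its parent package (`airflow`) as a side effect.

```python
import importlib.util


def module_available(dotted_name: str) -> bool:
    """Return True if `dotted_name` can be found on the current sys.path.

    find_spec() imports parent packages first, so a missing parent
    (e.g. no `airflow` at all) raises ModuleNotFoundError, which is
    treated here as "not available".
    """
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ModuleNotFoundError:
        return False


if __name__ == "__main__":
    name = "airflow.operators.papermill_operator"
    print(name, "available:", module_available(name))
```

Run it with the same interpreter the webserver uses; if it reports False, no amount of DAG editing will help until the Airflow installation itself changes.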
same issue
Same issue
Same here, please help!
pip install 'apache-airflow[papermill]'
Requirement already satisfied (use --upgrade to upgrade): apache-airflow[papermill] in /usr/lib/python2.7/site-packages
apache-airflow 1.10.0 does not provide the extra 'papermill'
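The last line of that pip output is the real clue: the installed apache-airflow release does not declare a `papermill` extra, so asking for it installs nothing extra (pip only warns). A minimal sketch for listing the extras an installed distribution declares, assuming Python 3.8+ (`importlib.metadata` does not exist on the Python 2.7 shown above); the helper name `declared_extras` is hypothetical.

```python
from importlib import metadata


def declared_extras(dist_name: str) -> set:
    """Return the extras a distribution declares via its Provides-Extra headers.

    Returns an empty set if the distribution is not installed.
    """
    try:
        meta = metadata.metadata(dist_name)
    except metadata.PackageNotFoundError:
        return set()
    # get_all() returns None when the header is absent.
    return set(meta.get_all("Provides-Extra") or [])


if __name__ == "__main__":
    extras = declared_extras("apache-airflow")
    print("papermill extra declared:", "papermill" in extras)
```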
same here :(
ModuleNotFoundError: No module named 'airflow.operators.papermill_operator'
This happens after running pip install 'apache-airflow[papermill]' and creating a DAG that uses PapermillOperator.
Here's what I did to solve my problem: go to the source code repository and download the papermill operator from https://github.com/apache/airflow/tree/master/airflow/operators. Depending on your version of Python you may need to adapt it; I'm on 2.7, so I had to change __init__ to the Python 2 style super() call like so:
def __init__(self, input_nb=None, output_nb=None, parameters=None, *args, **kwargs):
    super(PapermillOperator, self).__init__(*args, **kwargs)
Now move this file to your airflow operators folder.
Oh, and don't forget to pip install papermill
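For reference, the core of what the operator does at run time is a call to `papermill.execute_notebook` with the input path, output path and parameters. The sketch below is not the Airflow operator itself, just a hypothetical stand-alone function `run_notebook` showing that call under the assumption of Python 3; the `execute` argument lets you swap in a fake for testing, and `progress_bar=False` is an assumed choice to keep worker logs clean.

```python
from typing import Any, Callable, Dict, Optional


def run_notebook(
    input_nb: str,
    output_nb: str,
    parameters: Optional[Dict[str, Any]] = None,
    execute: Optional[Callable[..., Any]] = None,
) -> Any:
    """Execute `input_nb` with papermill and write the executed copy to `output_nb`."""
    if execute is None:
        # Imported lazily: this module loads even where papermill is missing,
        # and the ImportError surfaces only when a run is attempted.
        import papermill as pm
        execute = pm.execute_notebook
    return execute(input_nb, output_nb, parameters=parameters or {}, progress_bar=False)
```

Because papermill is imported lazily here, a missing package would show up as a task failure. Judging by the "Broken DAG: No module named 'papermill'" error reported later in this thread, the real operator module imports papermill at load time, so the package must be installed wherever DAGs are parsed.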
The latest version of Airflow (1.10.5) does not have papermill as an extra yet.
Assuming the next version will be 1.10.6, you will be able to build a Docker image with papermill locally using the following command (this will also require additional changes to the entrypoint script):
docker build --rm --build-arg AIRFLOW_DEPS="papermill" \
--build-arg PYTHON_DEPS="papermill==1.1.0" \
--build-arg AIRFLOW_VERSION=1.10.6 -t puckel/docker-airflow:some_tag .
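If you script these builds (as the build script later in this thread does), passing each build arg as a separate argv element avoids shell quoting problems with values such as `papermill==1.1.0` or `flask_oauthlib>=0.9`, where an unquoted `>` would be treated as a redirect. A minimal sketch; `docker_build_argv` is a hypothetical helper and the flags are only the ones already used above.

```python
import shlex
from typing import List, Optional


def docker_build_argv(
    tag: str,
    airflow_deps: str = "",
    python_deps: str = "",
    airflow_version: Optional[str] = None,
    context: str = ".",
) -> List[str]:
    """Build the `docker build` argument list, one element per argument."""
    argv = ["docker", "build", "--rm"]
    if airflow_deps:
        argv += ["--build-arg", f"AIRFLOW_DEPS={airflow_deps}"]
    if python_deps:
        argv += ["--build-arg", f"PYTHON_DEPS={python_deps}"]
    if airflow_version:
        argv += ["--build-arg", f"AIRFLOW_VERSION={airflow_version}"]
    argv += ["-t", tag, context]
    return argv


if __name__ == "__main__":
    argv = docker_build_argv(
        "puckel/docker-airflow:some_tag",
        airflow_deps="papermill",
        python_deps="papermill==1.1.0",
        airflow_version="1.10.6",
    )
    # Pass argv to subprocess.run(argv, check=True) to actually build.
    print(shlex.join(argv))
```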
A temporary solution is to replace this line in the Dockerfile:
&& pip install apache-airflow[crypto,celery,postgres,hive,jdbc,mysql,ssh${AIRFLOW_DEPS:+,}${AIRFLOW_DEPS}]==${AIRFLOW_VERSION} \
with
&& git config --global user.name "John Doe" \
&& git config --global user.email [email protected] \
&& git clone --branch 1.10.4 https://github.com/apache/airflow.git \
&& cd airflow; git cherry-pick 0e2b02c; pip install .[crypto,celery,postgres,hive,jdbc,mysql,ssh${AIRFLOW_DEPS:+,}${AIRFLOW_DEPS}] \
EDIT: Updated cherry-pick solution.
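The `${AIRFLOW_DEPS:+,}${AIRFLOW_DEPS}` part of that install line is POSIX shell parameter expansion: `${VAR:+word}` expands to `word` only when VAR is set and non-empty, so the comma is added only if extra dependencies were passed with `--build-arg AIRFLOW_DEPS=...`. A small Python sketch of the resulting requirement string; `airflow_requirement` and `BASE_EXTRAS` are hypothetical names, with the base list copied from the Dockerfile line above.

```python
from typing import Optional

# Extras hard-coded in the Dockerfile install line quoted above.
BASE_EXTRAS = "crypto,celery,postgres,hive,jdbc,mysql,ssh"


def airflow_requirement(airflow_deps: Optional[str], version: Optional[str] = None) -> str:
    """Mimic apache-airflow[BASE${AIRFLOW_DEPS:+,}${AIRFLOW_DEPS}]==VERSION."""
    # ${AIRFLOW_DEPS:+,} yields "," only when AIRFLOW_DEPS is set and non-empty.
    sep = "," if airflow_deps else ""
    extras = BASE_EXTRAS + sep + (airflow_deps or "")
    pin = f"=={version}" if version else ""
    return f"apache-airflow[{extras}]{pin}"
```

In the cherry-pick variant there is no version pin, because pip installs from the local checkout (`pip install .[...]`). Also note that if the chosen release has no `papermill` extra, pip only warns about it (as in the 1.10.0 output earlier in this thread) rather than failing the build.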
@SeanBE Can you be more specific about the fix above? I am not able to make it work just by tweaking the Dockerfile.
@balijepalli my bad! I rushed with the cherry-pick solution. Apply the following patch to the Dockerfile:
diff --git Dockerfile Dockerfile
index f1bf033..97dcd23 100644
--- Dockerfile
+++ Dockerfile
@@ -56,7 +56,10 @@ RUN set -ex \
&& pip install pyOpenSSL \
&& pip install ndg-httpsclient \
&& pip install pyasn1 \
- && pip install apache-airflow[crypto,celery,postgres,hive,jdbc,mysql,ssh${AIRFLOW_DEPS:+,}${AIRFLOW_DEPS}]==${AIRFLOW_VERSION} \
+ && git config --global user.name "John Doe" \
+ && git config --global user.email [email protected] \
+ && git clone --branch 1.10.4 https://github.com/apache/airflow.git \
+ && cd airflow; git cherry-pick 0e2b02c; pip install .[crypto,celery,postgres,hive,jdbc,mysql,ssh${AIRFLOW_DEPS:+,}${AIRFLOW_DEPS}] \
&& pip install 'redis==3.2' \
&& if [ -n "${PYTHON_DEPS}" ]; then pip install ${PYTHON_DEPS}; fi \
&& apt-get purge --auto-remove -yqq $buildDeps \
and then run
docker build --rm --build-arg AIRFLOW_DEPS="papermill" --build-arg PYTHON_DEPS="papermill==1.1.0" -t puckel/docker-airflow:papermill .
Voila!
I'm getting the same issue with version 1.10.7. After adding a notebook to the required folder, I get Broken DAG: .... No module named 'papermill'
I tested with both LocalExecutor and CeleryExecutor.
My configuration is:
Dockerfile: added the papermill extra to the Airflow install line
....
&& pip install pyasn1 \
&& pip install apache-airflow[crypto,celery,postgres,hive,jdbc,mysql,papermill,aws,ssh${AIRFLOW_DEPS:+,}${AIRFLOW_DEPS}]==${AIRFLOW_VERSION} \
&& pip install 'redis==3.2' \
&& if [ -n "${PYTHON_DEPS}" ]; then pip install ${PYTHON_DEPS}; fi \
COPY script/entrypoint.sh /entrypoint.sh
COPY config/airflow.cfg ${AIRFLOW_USER_HOME}/airflow.cfg
RUN chown -R airflow: ${AIRFLOW_USER_HOME}
EXPOSE 8080 5555 8793
USER airflow
WORKDIR ${AIRFLOW_USER_HOME}
ENTRYPOINT ["/entrypoint.sh"]
CMD ["webserver"]
docker-compose-LocalExecutor.yml (I only added a volume for the files folder, where the required notebook files are):
volumes:
- ./dags:/usr/local/airflow/dags:cached
- ./plugins:/usr/local/airflow/plugins:cached
- ./files:/usr/local/airflow/files:cached
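With these volumes, anything under `./files` on the host appears inside the container under `/usr/local/airflow/files`, so notebook paths in a DAG have to use the container-side path. A minimal sketch that maps a container path back to the host path it is mounted from, assuming short-syntax volume strings `HOST:CONTAINER[:MODE]` with POSIX-style host paths as above; `host_path_for` is a hypothetical helper.

```python
import posixpath
from typing import Iterable, Optional


def host_path_for(container_path: str, volumes: Iterable[str]) -> Optional[str]:
    """Map a path inside the container to the host path it is mounted from.

    Returns None if no volume covers `container_path`.
    """
    target = posixpath.normpath(container_path)
    best = None
    for spec in volumes:
        parts = spec.split(":")
        if len(parts) < 2:
            continue  # anonymous volume, nothing mounted from the host
        host, mount = parts[0], posixpath.normpath(parts[1])
        if target == mount or target.startswith(mount + "/"):
            # Prefer the most specific (longest) mount point.
            if best is None or len(mount) > len(best[1]):
                best = (host, mount)
    if best is None:
        return None
    host, mount = best
    rel = posixpath.relpath(target, mount)
    return host if rel == "." else posixpath.join(host, rel)
```

With the three volumes above, `host_path_for("/files/code.ipynb", volumes)` returns None, which is worth comparing against the `input_nb` path used in the DAG further down.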
Script to build everything:
#!/bin/bash
echo "Pulling puckel/docker-airflow image"
docker pull puckel/docker-airflow
echo "==================================="
echo "Building docker"
echo "==================================="
docker build --rm --file Dockerfile --build-arg AIRFLOW_DEPS="datadog,dask,papermill" --build-arg PYTHON_DEPS="flask_oauthlib>=0.9" -t puckel/docker-airflow .
echo "==================================="
echo "Docker compose"
echo "==================================="
docker-compose -f docker-compose-LocalExecutor.yml up -d --remove-orphans
I have checked the docker build output, and it does install the papermill package:
+ pip install pyasn1
Requirement already satisfied: pyasn1 in /usr/local/lib/python3.7/site-packages (0.4.8)
+ pip install apache-airflow[crypto,celery,postgres,hive,jdbc,mysql,papermill,aws,ssh,datadog,dask,papermill]==1.10.7
Collecting apache-airflow[aws,celery,crypto,dask,datadog,hive,jdbc,mysql,papermill,postgres,ssh]==1.10.7
......
Collecting papermill[all]>=1.0.0; extra == "papermill"
Downloading papermill-1.2.1-py2.py3-none-any.whl (31 kB)
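Note that `papermill` ends up in the extras list twice (once hard-coded in the Dockerfile, once from `AIRFLOW_DEPS`). pip tolerates this, and its `Collecting` line shows the extras de-duplicated and sorted. A sketch of the same normalization, in case you generate the extras string yourself; `merge_extras` is a hypothetical helper, and lowercasing reflects that extras names compare case-insensitively.

```python
def merge_extras(*groups: str) -> str:
    """Combine comma-separated extras strings, dropping blanks and duplicates."""
    seen = set()
    for group in groups:
        for item in group.split(","):
            item = item.strip().lower()  # extras names are case-insensitive
            if item:
                seen.add(item)
    # Sorted, matching the order pip prints in its "Collecting" line.
    return ",".join(sorted(seen))
```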
I tested the notebook locally and it works fine.
DAG code:
from datetime import datetime, timedelta
from airflow.models import DAG
from airflow.operators.papermill_operator import PapermillOperator

default_args = {
    "owner": "airflow",
    "depends_on_past": False,
    "start_date": datetime(2015, 6, 1),
    "email": ["[email protected]"],
    "email_on_failure": False,
    "email_on_retry": False,
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id='run_example_notebook_papermill_operator',
    default_args=default_args,
    schedule_interval=timedelta(1)
) as dag:
    run_this = PapermillOperator(
        task_id="run_example_notebook",
        input_nb="/files/code.ipynb",
        output_nb="/files/out-{{ execution_date }}.ipynb",
        parameters={"msgs": "Ran from Airflow at {{ execution_date }}!"},
        dag=dag,
    )
When I refresh the Airflow page at http://localhost:8080/admin/, I get: Broken DAG: [/usr/local/airflow/dags/test_notebook.py] No module named 'papermill'
It looks like I'm missing or misunderstanding something. If anyone has found a solution, it would help me a lot.
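Airflow shows "Broken DAG" when importing the DAG file raises an exception, so the same error can be reproduced outside the web UI by importing the file with the container's Python. A minimal sketch using only importlib (this is a plain import, not Airflow's DagBag); `dag_import_error` and the module name `dag_under_test` are hypothetical.

```python
import importlib.util
import sys
from typing import Optional


def dag_import_error(path: str) -> Optional[str]:
    """Import the Python file at `path`; return the error text, or None on success."""
    spec = importlib.util.spec_from_file_location("dag_under_test", path)
    if spec is None or spec.loader is None:
        return f"cannot load {path}"
    module = importlib.util.module_from_spec(spec)
    try:
        spec.loader.exec_module(module)
    except Exception as exc:  # any import-time exception breaks a DAG file
        return f"{type(exc).__name__}: {exc}"
    return None


if __name__ == "__main__":
    error = dag_import_error(sys.argv[1])
    print(error or "imported OK")
```

For example, assuming the sketch is saved as check_dag.py in the mounted ./files folder: `docker-compose -f docker-compose-LocalExecutor.yml exec webserver python /usr/local/airflow/files/check_dag.py /usr/local/airflow/dags/test_notebook.py`. If it prints `No module named 'papermill'`, the running container is not using an image that has papermill installed.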
I found the solution: I needed to add the ":latest" tag, i.e. use "puckel/docker-airflow:latest" as the image in the docker-compose file.
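Why this works: when an image reference has no tag, Docker uses `latest`, so `docker build ... -t puckel/docker-airflow .` in the build script above produces `puckel/docker-airflow:latest`. If the compose file names any other tag, docker-compose runs that image instead of the one just built, which may not include papermill. A minimal sketch of the tag-defaulting rule; `split_image_ref` and `same_image` are hypothetical helpers, and digest references are deliberately not handled.

```python
from typing import Tuple


def split_image_ref(ref: str) -> Tuple[str, str]:
    """Split an image reference into (repository, tag), defaulting the tag to 'latest'."""
    if "@" in ref:
        # Digest references (repo@sha256:...) are out of scope for this sketch.
        raise ValueError(f"digest references are not handled: {ref}")
    last_slash = ref.rfind("/")
    last_colon = ref.rfind(":")
    # A colon before the last slash belongs to a registry port (host:5000/repo),
    # not to a tag.
    if last_colon > last_slash:
        return ref[:last_colon], ref[last_colon + 1:]
    return ref, "latest"


def same_image(a: str, b: str) -> bool:
    """True if two references name the same repository and tag."""
    return split_image_ref(a) == split_image_ref(b)
```

For instance, `same_image("puckel/docker-airflow", "puckel/docker-airflow:latest")` is True, while comparing against any other tag is False. The sketch ignores the implicit `docker.io/` registry prefix, so `docker.io/puckel/docker-airflow` would not compare equal here.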
@bllamasy How did you solve "No module named 'airflow provider'" by adding ":latest" to "puckel/docker-airflow:latest" in the docker-compose file? Can you explain how to use "puckel/docker-airflow:latest"?