docker-airflow

Unable to import papermill module

Open weichea opened this issue 4 years ago • 11 comments

I wasn't able to import the papermill module. Any idea how to fix this?

Error log:

webserver_1 | from airflow.operators.papermill_operator import PapermillOperator
webserver_1 | ModuleNotFoundError: No module named 'airflow.operators.papermill_operator'

weichea avatar Aug 10 '19 09:08 weichea

same issue

sudeshgit avatar Aug 29 '19 05:08 sudeshgit

Same issue

bwhitby avatar Sep 03 '19 23:09 bwhitby

same here, please help!

pip install 'apache-airflow[papermill]'
Requirement already satisfied (use --upgrade to upgrade): apache-airflow[papermill] in /usr/lib/python2.7/site-packages
apache-airflow 1.10.0 does not provide the extra 'papermill'

johanwasserman avatar Sep 04 '19 13:09 johanwasserman
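
The pip message above says the installed Airflow release does not declare a 'papermill' extra. One way to confirm that from Python is to read the installed distribution's metadata. This is a minimal sketch, assuming Python 3.8+ where `importlib.metadata` is in the standard library (older interpreters, including the 2.7 install shown above, would need the `importlib_metadata` backport instead). The helper name `provides_extra` is hypothetical.

```python
from importlib import metadata  # standard library from Python 3.8 onwards


def provides_extra(dist_name, extra):
    """Return True/False if the installed distribution declares `extra`,
    or None if the distribution is not installed in this environment."""
    try:
        meta = metadata.metadata(dist_name)
    except metadata.PackageNotFoundError:
        return None
    # Each declared extra appears as its own "Provides-Extra" metadata field.
    declared = meta.get_all("Provides-Extra") or []
    return extra.lower() in {name.lower() for name in declared}


if __name__ == "__main__":
    # Prints None when apache-airflow is not installed for this interpreter.
    print(provides_extra("apache-airflow", "papermill"))
```

A result of False would match the pip warning: the extra is simply not part of that release, so requesting it installs nothing additional.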

same here :(

ModuleNotFoundError: No module named 'airflow.operators.papermill_operator' after running pip install 'apache-airflow[papermill]' and creating a DAG that uses PapermillOperator

lfpelison avatar Sep 05 '19 16:09 lfpelison

Here's what I did to solve my problem: go to the source code repository and download the papermill operator from https://github.com/apache/airflow/tree/master/airflow/operators. Depending on your version of Python you may need to adjust it. I'm on 2.7, so I had to change the __init__ like so:

def __init__(self, input_nb=None, output_nb=None, parameters=None, *args, **kwargs):
    super(PapermillOperator, self).__init__(*args, **kwargs)

Now move this file to your airflow operators folder.

Oh, and don't forget to pip install papermill

johanwasserman avatar Sep 05 '19 19:09 johanwasserman
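
After copying the operator file and installing papermill as described above, a quick check can confirm that both modules resolve for the interpreter Airflow runs under. This is only a sketch: the helper name `module_available` is hypothetical, and it needs to run with the same Python that the webserver and scheduler use (inside the container, for example) to be meaningful.

```python
import importlib.util


def module_available(name):
    """Return True if `name` can be located by the import system."""
    try:
        # For dotted names, find_spec imports the parent packages first,
        # so a missing parent raises ModuleNotFoundError instead of returning None.
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


# The two modules named in the errors reported in this thread.
for name in ("papermill", "airflow.operators.papermill_operator"):
    print(name, "->", "found" if module_available(name) else "MISSING")
```

If papermill is found but the operator module is missing, the Airflow install lacks the operator; the reverse points at the papermill package itself not being installed.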

The latest version of Airflow (1.10.5) does not have papermill as an extra yet.

Assuming the next version will be 1.10.6, you will be able to build a Docker image with papermill locally using the following (this will require additional changes to the entrypoint script as well):

docker build --rm --build-arg AIRFLOW_DEPS="papermill" \
     --build-arg PYTHON_DEPS="papermill==1.1.0" \
     --build-arg AIRFLOW_VERSION=1.10.6 -t puckel/docker-airflow:some_tag .

A temporary solution is to replace (in the Dockerfile)

&& pip install apache-airflow[crypto,celery,postgres,hive,jdbc,mysql,ssh${AIRFLOW_DEPS:+,}${AIRFLOW_DEPS}]==${AIRFLOW_VERSION} \

with

&& git config --global user.name "John Doe" \
&& git config --global user.email [email protected] \
&& git clone --branch 1.10.4 https://github.com/apache/airflow.git \
&& cd airflow; git cherry-pick 0e2b02c; pip install .[crypto,celery,postgres,hive,jdbc,mysql,ssh${AIRFLOW_DEPS:+,}${AIRFLOW_DEPS}] \

EDIT: Updated cherry-pick solution.

SeanBE avatar Sep 18 '19 15:09 SeanBE

@SeanBE Can you be more specific about the fix above? I am not able to make it run by just tweaking the Dockerfile.

balijepalli avatar Sep 25 '19 10:09 balijepalli

@balijepalli my bad! I rushed with the cherry-pick solution. Apply the following diff patch to the Dockerfile:

diff --git Dockerfile Dockerfile
index f1bf033..97dcd23 100644
--- Dockerfile
+++ Dockerfile
@@ -56,7 +56,10 @@ RUN set -ex \
     && pip install pyOpenSSL \
     && pip install ndg-httpsclient \
     && pip install pyasn1 \
-    && pip install apache-airflow[crypto,celery,postgres,hive,jdbc,mysql,ssh${AIRFLOW_DEPS:+,}${AIRFLOW_DEPS}]==${AIRFLOW_VERSION} \
+    && git config --global user.name "John Doe" \
+    && git config --global user.email [email protected] \
+    && git clone --branch 1.10.4 https://github.com/apache/airflow.git \
+    && cd airflow; git cherry-pick 0e2b02c; pip install .[crypto,celery,postgres,hive,jdbc,mysql,ssh${AIRFLOW_DEPS:+,}${AIRFLOW_DEPS}] \
     && pip install 'redis==3.2' \
     && if [ -n "${PYTHON_DEPS}" ]; then pip install ${PYTHON_DEPS}; fi \
     && apt-get purge --auto-remove -yqq $buildDeps \

and then run

docker build --rm --build-arg AIRFLOW_DEPS="papermill" --build-arg PYTHON_DEPS="papermill==1.1.0" -t puckel/docker-airflow:papermill .

Voila!

SeanBE avatar Sep 26 '19 10:09 SeanBE

I'm getting the same issue with version 1.10.7. After adding a notebook to the required folder, I get: Broken DAG: ....No module named 'papermill'

I tested with both LocalExecutor and CeleryExecutor.

My configuration is:

Dockerfile - added the papermill extra to the Airflow install line

....
    && pip install pyasn1 \
    && pip install apache-airflow[crypto,celery,postgres,hive,jdbc,mysql,papermill,aws,ssh${AIRFLOW_DEPS:+,}${AIRFLOW_DEPS}]==${AIRFLOW_VERSION} \
    && pip install 'redis==3.2' \
    && if [ -n "${PYTHON_DEPS}" ]; then pip install ${PYTHON_DEPS}; fi \


COPY script/entrypoint.sh /entrypoint.sh
COPY config/airflow.cfg ${AIRFLOW_USER_HOME}/airflow.cfg

RUN chown -R airflow: ${AIRFLOW_USER_HOME}

EXPOSE 8080 5555 8793

USER airflow
WORKDIR ${AIRFLOW_USER_HOME}
ENTRYPOINT ["/entrypoint.sh"]
CMD ["webserver"]

docker-compose-LocalExecutor.yml (I only added a "files" volume, which is where the required notebook files are)


        volumes:
            - ./dags:/usr/local/airflow/dags:cached
            - ./plugins:/usr/local/airflow/plugins:cached
            - ./files:/usr/local/airflow/files:cached

Script to build everything:

#!/bin/bash
echo "Pulling puckel/docker-airflow image"
docker pull puckel/docker-airflow
echo "==================================="
echo "Building docker"
echo "==================================="
docker build --rm --file Dockerfile --build-arg AIRFLOW_DEPS="datadog,dask,papermill" --build-arg PYTHON_DEPS="flask_oauthlib>=0.9" -t puckel/docker-airflow .
echo "==================================="
echo "Docker compose"
echo "==================================="
docker-compose -f docker-compose-LocalExecutor.yml up -d --remove-orphans

I have checked the docker build output, and it does install the papermill package:

+ pip install pyasn1
Requirement already satisfied: pyasn1 in /usr/local/lib/python3.7/site-packages (0.4.8)
+ pip install apache-airflow[crypto,celery,postgres,hive,jdbc,mysql,papermill,aws,ssh,datadog,dask,papermill]==1.10.7
Collecting apache-airflow[aws,celery,crypto,dask,datadog,hive,jdbc,mysql,papermill,postgres,ssh]==1.10.7
......
Collecting papermill[all]>=1.0.0; extra == "papermill"
  Downloading papermill-1.2.1-py2.py3-none-any.whl (31 kB)

I tested the notebook locally and it works fine.

DAG code

from datetime import datetime, timedelta

from airflow.models import DAG
from airflow.operators.papermill_operator import PapermillOperator

default_args = {
    "owner": "airflow",
    "depends_on_past": False,
    "start_date": datetime(2015, 6, 1),
    "email": ["[email protected]"],
    "email_on_failure": False,
    "email_on_retry": False,
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="run_example_notebook_papermill_operator",
    default_args=default_args,
    schedule_interval=timedelta(1),
) as dag:
    run_this = PapermillOperator(
        task_id="run_example_notebook",
        input_nb="/files/code.ipynb",
        output_nb="/files/out-{{ execution_date }}.ipynb",
        parameters={"msgs": "Ran from Airflow at {{ execution_date }}!"},
        dag=dag,
    )

When I refresh the Airflow page at http://localhost:8080/admin/ I get: Broken DAG: [/usr/local/airflow/dags/test_notebook.py] No module named 'papermill'

It looks like I'm missing or misunderstanding something. If someone has found a solution, it would help me a lot.

bllamasy avatar Jan 31 '20 13:01 bllamasy

I have found the solution: I needed to add ":latest" to the image name in the docker compose file, so that it reads "puckel/docker-airflow:latest".

bllamasy avatar Feb 03 '20 09:02 bllamasy
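
The fix above comes down to which image reference the compose file names. The sketch below lists the `image:` entries in a compose file so they can be compared with the tag passed to `docker build -t`. It assumes a simple one-line `image:` syntax and uses a regular expression rather than a YAML parser; the helper names `compose_images` and `split_tag` are hypothetical, and digest references (`@sha256:...`) are not handled.

```python
import re

# Matches lines such as "    image: puckel/docker-airflow:latest"
IMAGE_LINE = re.compile(r"^\s*image:\s*['\"]?([^'\"\s#]+)")


def compose_images(compose_text):
    """Return the image references found on `image:` lines."""
    images = []
    for line in compose_text.splitlines():
        match = IMAGE_LINE.match(line)
        if match:
            images.append(match.group(1))
    return images


def split_tag(image):
    """Split 'repo:tag' into (repo, tag); Docker treats an untagged
    reference as 'latest'."""
    last_component = image.rsplit("/", 1)[-1]
    if ":" in last_component:
        repo, tag = image.rsplit(":", 1)
        return repo, tag
    return image, "latest"


if __name__ == "__main__":
    sample = "services:\n  webserver:\n    image: puckel/docker-airflow:1.10.7\n"
    for ref in compose_images(sample):
        print(split_tag(ref))
```

If the tag printed for the compose file differs from the tag the build script produced, the containers are running a different image from the one that was just built.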

@bllamasy How did you solve "No module named 'airflow provider'" by adding ":latest" in the docker compose file? Can you show me how to use "puckel/docker-airflow:latest" there?

vandanafs avatar Apr 11 '22 00:04 vandanafs