
Dockerfile change

Open wilberh opened this issue 2 years ago • 3 comments

I had to make the two local changes listed below in the Dockerfile to get it to work. Only the first build took a long time, because it was downloading the jupyter/pyspark-notebook base image(s) and all the spaCy packages. I could be wrong about this, but I noticed the process used at least 40GB of my local drive (including my attempts to find the correct tag for the base image) to produce a 12.9GB Docker image.

Dockerfile changes:

  • pinned the base image to a specific Python 3.8 tag (python-3.8)
  • added an ENTRYPOINT that runs "jupyter-lab"

I also created a docker-compose file so that a single CLI command [ docker compose up -d --build ] builds and (re)deploys/runs the image.

# Based on the Dockerfiles from the Jupyter Development Team which 
# are Copyright (c) Jupyter Development Team and distributed under 
# the terms of the Modified BSD License.
ARG OWNER=jupyter
ARG BASE_CONTAINER=$OWNER/pyspark-notebook:python-3.8
FROM $BASE_CONTAINER

LABEL maintainer="Paul Deitel <[email protected]>"

# Fix: https://github.com/hadolint/hadolint/wiki/DL4006
# Fix: https://github.com/koalaman/shellcheck/wiki/SC3014
SHELL ["/bin/bash", "-o", "pipefail", "-c"]

RUN mamba install --yes \
    'dnspython' \
    'folium' \
    'geopy' \
    'imageio' \
    'nltk' \
    'pymongo' \
    'scikit-learn' \
    'spacy' \
    'tweepy' 
     
RUN pip install --upgrade \
    'tensorflow' \
    'openai' \
    'beautifulsoup4' \
    'deepl' \
    'mastodon.py' \
    'better_profanity'  \
    'tweet-preprocessor' \
    'ibm-watson' \
    'pubnub' \
    'textblob' \
    'wordcloud' \
    'dweepy' \
    'sounddevice'
    

# download data required by textblob and spacy
RUN python -m textblob.download_corpora && \
    python -m spacy download en_core_web_sm && \
    python -m spacy download en_core_web_md && \
    python -m spacy download en_core_web_lg 

# clean up
RUN mamba clean --all -f -y && \
    fix-permissions "${CONDA_DIR}" && \
    fix-permissions "/home/${NB_USER}"

ENTRYPOINT ["start.sh", "jupyter-lab"]

Docker compose file:

version: "3"

services:
  deitelpydsft:
    container_name: deitelpydsft
    user: root
    volumes:
      - .:/home/jovyan/work
    build: .
    restart: always
    # env_file: .env
    ports:
      - "8888:8888"
      - "4040:4040"

wilberh, Sep 27 '23 16:09

wilberh: the version tag in your docker-compose.yml file is no longer needed. It has been deprecated.
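
As a minimal sketch of that suggestion, the compose file posted above could be written without the top-level version key, leaving everything else unchanged. The port comments are assumptions based on the usual defaults (8888 for Jupyter, 4040 for the Spark application UI), not something stated in the original file.

```yaml
# Same service definition as the posted compose file,
# with the top-level "version" key removed.
services:
  deitelpydsft:
    container_name: deitelpydsft
    user: root
    volumes:
      - .:/home/jovyan/work
    build: .
    restart: always
    # env_file: .env
    ports:
      - "8888:8888"   # assumed: JupyterLab
      - "4040:4040"   # assumed: Spark application UI
```

The same command [ docker compose up -d --build ] should still build and start the service with this file.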

oppiet30, May 25 '24 18:05

I get this error.

C:\Users\Administrator\Desktop\Python\PythonDataScienceFullThrottle>docker build -t deitelpydsft
ERROR: "docker buildx build" requires exactly 1 argument.
See 'docker buildx build --help'.

Usage: docker buildx build [OPTIONS] PATH | URL | -

Start a build

C:\Users\Administrator\Desktop\Python\PythonDataScienceFullThrottle>

oppiet30, May 25 '24 19:05

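oppiet30: add a period at the end. It is the PATH argument the error is asking for and tells Docker to use the current directory as the build context ===>> docker build -t deitelpydsft .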

wilberh, May 29 '24 15:05