[ZEPPELIN-5461] Shell script to generate the Dockerfile of zeppelin interpreter docker image
What is this PR for?
Currently, users can build a smaller Docker image for the Zeppelin interpreter by modifying the Dockerfile in scripts/docker/zeppelin-interpreter. I think a better approach is to let users run a shell script (something like docker-image-tool.sh in Apache Spark) that generates the Dockerfile for the image, and to let them specify the interpreters they want.
Example usage:
Usage: ./Dockerfile_gen.sh [options] [command]
This script outputs a Dockerfile for building the zeppelin image that contains some specific interpreters.
Options:
-i interpreter         (Optional) A comma-separated list of interpreter directory names (under /path/to/zeppelin/interpreter/)
                       to add to the docker image.
                       By default, it will add the spark interpreter.
-c conda yaml file     (Optional) Specify the conda yaml file that manages python and R packages. By default, it will not install
                       python and R packages through conda.
-v zeppelin version    (Optional) Specify the version of zeppelin. By default, the version is "0.9.0".
Examples:
- Output the Dockerfile for building the zeppelin image that contains the spark and python interpreters.
    ./Dockerfile_gen.sh -i spark,python
- Output the Dockerfile for building the zeppelin image that contains the spark and python interpreters, and specify the
  conda yaml file "python3.yaml".
    ./Dockerfile_gen.sh -i spark,python -c python3.yaml
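The option handling described in the usage text could be sketched roughly as follows. This is a minimal illustration, not the script from this PR: the function name `parse_args` and the variable names `INTERPRETERS`, `CONDA_FILE`, and `ZEPPELIN_VERSION` are assumptions made for the example, while the flags and defaults come from the usage text above.

```shell
#!/bin/sh
# Hypothetical sketch of the -i / -c / -v handling; not the actual Dockerfile_gen.sh.

# Defaults as documented in the usage text.
INTERPRETERS="spark"
CONDA_FILE=""
ZEPPELIN_VERSION="0.9.0"

parse_args() {
  OPTIND=1  # reset so the function can be called more than once
  while getopts "i:c:v:" opt; do
    case "$opt" in
      i) INTERPRETERS="$OPTARG" ;;      # comma-separated interpreter directory names
      c) CONDA_FILE="$OPTARG" ;;        # conda yaml file; empty means no conda packages
      v) ZEPPELIN_VERSION="$OPTARG" ;;  # zeppelin version
      *) echo "Usage: ./Dockerfile_gen.sh [options] [command]" >&2; return 1 ;;
    esac
  done
}

parse_args -i spark,python -c python3.yaml
echo "interpreters=${INTERPRETERS} conda=${CONDA_FILE} version=${ZEPPELIN_VERSION}"
```

Using `getopts` keeps the sketch portable across POSIX shells; options that are not passed simply keep their defaults.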
Example generated Dockerfile:
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
FROM apache/zeppelin:0.9.0 AS zeppelin-distribution
FROM ubuntu:20.04
LABEL maintainer="Apache Software Foundation <dev@zeppelin.apache.org>"
ARG version="0.9.0"
ENV VERSION="${version}" \
    ZEPPELIN_HOME="/opt/zeppelin"
RUN set -ex && \
    apt-get -y update && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y openjdk-8-jre-headless wget tini && \
    # Cleanup
    rm -rf /var/lib/apt/lists/* && \
    apt-get autoclean && \
    apt-get clean
COPY --from=zeppelin-distribution /opt/zeppelin/bin ${ZEPPELIN_HOME}/bin
COPY log4j.properties ${ZEPPELIN_HOME}/conf/
COPY log4j_yarn_cluster.properties ${ZEPPELIN_HOME}/conf/
# Copy interpreter-shaded JAR, needed for all interpreters
COPY --from=zeppelin-distribution /opt/zeppelin/interpreter/zeppelin-interpreter-shaded-${VERSION}.jar ${ZEPPELIN_HOME}/interpreter/zeppelin-interpreter-shaded-${VERSION}.jar
# copy interpreter spark
COPY --from=zeppelin-distribution /opt/zeppelin/interpreter/spark ${ZEPPELIN_HOME}/interpreter/spark
RUN mkdir -p "${ZEPPELIN_HOME}/logs" "${ZEPPELIN_HOME}/run" "${ZEPPELIN_HOME}/local-repo" && \
    # Allow process to edit /etc/passwd, to create a user entry for zeppelin
    chgrp root /etc/passwd && chmod ug+rw /etc/passwd && \
    # Give access to some specific folders
    chmod -R 775 "${ZEPPELIN_HOME}/logs" "${ZEPPELIN_HOME}/run" "${ZEPPELIN_HOME}/local-repo"
USER 1000
ENTRYPOINT [ "/usr/bin/tini", "--" ]
WORKDIR ${ZEPPELIN_HOME}
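The per-interpreter section of the generated Dockerfile above ("# copy interpreter spark" followed by a COPY line) could be produced by a loop along these lines. This is only a sketch under assumptions: the helper name `emit_interpreter_copies` is hypothetical and the PR's script may be structured differently; the emitted text mirrors the COPY line shown in the example Dockerfile.

```shell
#!/bin/sh
# Hypothetical helper: prints a comment and a COPY instruction for each
# interpreter in a comma-separated list. Not the actual code from this PR.
emit_interpreter_copies() {
  # Turn "spark,python" into one name per line, then emit the lines for each name.
  printf '%s\n' "$1" | tr ',' '\n' | while IFS= read -r name; do
    [ -n "$name" ] || continue  # skip empty entries such as a trailing comma
    printf '# copy interpreter %s\n' "$name"
    # ${ZEPPELIN_HOME} stays literal here; Docker expands it at build time.
    printf 'COPY --from=zeppelin-distribution /opt/zeppelin/interpreter/%s ${ZEPPELIN_HOME}/interpreter/%s\n' "$name" "$name"
  done
}

emit_interpreter_copies "spark,python"
```

Because the helper only writes to standard output, its result can be appended to the fixed parts of the Dockerfile (license header, base image, and the final RUN, USER, ENTRYPOINT, and WORKDIR instructions).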
What type of PR is it?
[Improvement]
Todos
- [ ] - Task
What is the Jira issue?
- ZEPPELIN-5461
How should this be tested?
- CI passes; also tested manually
Screenshots (if appropriate)
Questions:
- Do the license files need updating? No
- Are there breaking changes for older versions? No
- Does this need documentation? No