spark-operator
Add support for Spark 3.3.0
Spark 3.3.0 has been released. I have a custom 3.2.1 image running with the operator without any issues; I will experiment with 3.3.0 this week and post the results here.
@josecsotomorales Awesome. Looking forward to this.
Hi, it seems that in Spark 3.3.0 a validation was added to check that the executor pod name prefix is no more than 47 characters.
We've seen that for scheduled applications, the operator adds a long timestamp plus an ID before the "exec-id", and the validation then fails pod creation. For example, "some-application-1657634168668185553-300e8181f2b26a1a-exec-1" results in this exception:
java.lang.IllegalArgumentException: 'some-application-1657634168668185553-300e8181f2b26a1a' in spark.kubernetes.executor.podNamePrefix is invalid. must conform https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#dns-label-names and the value length <= 47
    at org.apache.spark.internal.config.TypedConfigBuilder.$anonfun$checkValue$1(ConfigBuilder.scala:108) ~[spark-core_2.12-3.3.0.jar:3.3.0]
    at org.apache.spark.internal.config.TypedConfigBuilder.$anonfun$transform$1(ConfigBuilder.scala:101) ~[spark-core_2.12-3.3.0.jar:3.3.0]
    at scala.Option.map(Option.scala:230) ~[scala-library-2.12.12.jar:?]
Hi @josecsotomorales! Are you able to use spark 3.3.0 without any issues?
@devender-yadav It all worked fine except for the issue pointed out above with scheduled Spark apps and pod creation with long names.
@josecsotomorales how did you create the Spark 3.3.0 image so that it is compatible with gcr.io/spark-operator/spark:v3.1.1?
Something like this should work; you just need to adjust this Dockerfile to add your Spark application jar (if applicable) and any required libraries:
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
ARG java_image_tag=11-jre-slim
FROM openjdk:${java_image_tag} as base
# ARG spark_uid=185
# Before building the docker image, first build and make a Spark distribution following
# the instructions in http://spark.apache.org/docs/latest/building-spark.html.
# If this docker file is being used in the context of building your images from a Spark
# distribution, the docker build command should be invoked from the top level directory
# of the Spark distribution. E.g.:
# docker build -t spark:latest -f kubernetes/dockerfiles/spark/Dockerfile .
RUN set -ex && \
sed -i 's/http:\/\/deb.\(.*\)/https:\/\/deb.\1/g' /etc/apt/sources.list && \
apt-get update && \
ln -s /lib /lib64 && \
apt install -y bash tini libc6 libpam-modules krb5-user libnss3 wget && \
rm /bin/sh && \
ln -sv /bin/bash /bin/sh && \
echo "auth required pam_wheel.so use_uid" >> /etc/pam.d/su && \
chgrp root /etc/passwd && chmod ug+rw /etc/passwd && \
rm -rf /var/cache/apt/*
FROM base as spark
### Download Spark Distribution ###
WORKDIR /opt
RUN wget https://archive.apache.org/dist/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz
RUN tar xvf spark-3.3.0-bin-hadoop3.tgz
FROM spark as build
### Create target directories ###
RUN mkdir -p /opt/spark/jars
### Set Spark dir ARG for use Docker build context on root project dir ###
FROM base as final
ARG spark_dir=/opt/spark-3.3.0-bin-hadoop3
WORKDIR /opt/spark/work-dir
ENV SPARK_HOME /opt/spark
WORKDIR /opt/spark/work-dir
RUN chmod g+w /opt/spark/work-dir
RUN chmod a+x /opt/decom.sh
ENTRYPOINT [ "/opt/entrypoint.sh" ]
# Specify the User that the actual main process will run as
# USER ${spark_uid}
Hi,
Does anybody know whether this pod-name issue can be managed by any spark-operator parameter, or is the name template hardcoded? https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/docs/api-docs.md
There seems to be a podName property within the DriverSpec, but it doesn't appear to be a template, just the whole name.
@josecsotomorales I built the image according to this Dockerfile and it fails at this step. What script is this?
chmod: cannot access '/opt/decom.sh': No such file or directory
@jiangjian0920 try this one:
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
ARG java_image_tag=11-jre-slim
FROM openjdk:${java_image_tag} as base
# ARG spark_uid=185
# Before building the docker image, first build and make a Spark distribution following
# the instructions in http://spark.apache.org/docs/latest/building-spark.html.
# If this docker file is being used in the context of building your images from a Spark
# distribution, the docker build command should be invoked from the top level directory
# of the Spark distribution. E.g.:
# docker build -t spark:latest -f kubernetes/dockerfiles/spark/Dockerfile .
RUN set -ex && \
sed -i 's/http:\/\/deb.\(.*\)/https:\/\/deb.\1/g' /etc/apt/sources.list && \
apt-get update && \
ln -s /lib /lib64 && \
apt install -y bash tini libc6 libpam-modules krb5-user libnss3 wget && \
rm /bin/sh && \
ln -sv /bin/bash /bin/sh && \
echo "auth required pam_wheel.so use_uid" >> /etc/pam.d/su && \
chgrp root /etc/passwd && chmod ug+rw /etc/passwd && \
rm -rf /var/cache/apt/*
FROM base as spark
### Download Spark Distribution ###
WORKDIR /opt
RUN wget https://archive.apache.org/dist/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz
RUN tar xvf spark-3.3.0-bin-hadoop3.tgz
FROM spark as build
### Create target directories ###
RUN mkdir -p /opt/spark/jars
### Set Spark dir ARG for use Docker build context on root project dir ###
FROM base as final
ARG spark_dir=/opt/spark-3.3.0-bin-hadoop3
### Copy files from the build image ###
COPY --from=build ${spark_dir}/jars /opt/spark/jars
COPY --from=build ${spark_dir}/bin /opt/spark/bin
COPY --from=build ${spark_dir}/sbin /opt/spark/sbin
COPY --from=build ${spark_dir}/kubernetes/dockerfiles/spark/entrypoint.sh /opt/
COPY --from=build ${spark_dir}/kubernetes/dockerfiles/spark/decom.sh /opt/
COPY --from=build ${spark_dir}/examples /opt/spark/examples
COPY --from=build ${spark_dir}/kubernetes/tests /opt/spark/tests
COPY --from=build ${spark_dir}/data /opt/spark/data
WORKDIR /opt/spark/work-dir
ENV SPARK_HOME /opt/spark
WORKDIR /opt/spark/work-dir
RUN chmod g+w /opt/spark/work-dir
RUN chmod a+x /opt/decom.sh
ENTRYPOINT [ "/opt/entrypoint.sh" ]
# Specify the User that the actual main process will run as
# USER ${spark_uid}
Note: You can also use the original one as a base to build your custom image: https://github.com/apache/spark/blob/branch-3.3/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile
@josecsotomorales Thanks for that. I want to build an image that supports version 3.1.2; I'll try again.
@jiangjian0920 Just use the one I provided and change the Spark version; you should be good to go 👍🏻
@josecsotomorales Have you ever tried changing the user of the Spark image? I changed the user of the 2.4.0 image and got the following error. Have you ever encountered it?
spark-error.log:
Exception in thread "main" java.lang.NoClassDefFoundError: io/fabric8/kubernetes/client/KubernetesClient
@shlomi-beinhoren Hi,
I want to know how to set the executor pod name prefix via the Spark Operator. https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/docs/api-docs.md#executorspec I didn't find anything in the ExecutorSpec docs related to the executor pod name prefix. Or did you use Spark native properties? I want to match my driver and executors but am having trouble. Thanks!
Hi @harryzhang2016, I did it with Spark native properties under the sparkConf section described here - https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/docs/user-guide.md#specifying-spark-configuration - by adding the property "spark.kubernetes.executor.podNamePrefix".
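For anyone landing here, a minimal sketch of what that looks like in a SparkApplication manifest (YAML); the prefix value below is a placeholder, and on Spark 3.3.0 the prefix has to be a valid DNS label of at most 47 characters:
spec:
  sparkConf:
    # Hypothetical prefix; keep it <= 47 characters so Spark 3.3.0's validation passes.
    # Executor pods are then named <prefix>-exec-<n>.
    "spark.kubernetes.executor.podNamePrefix": "my-scheduled-app"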
Spark 3.3.0 testing update:
I built a custom Spark 3.3 image and I'm using it with the latest operator image; no issues so far except for the 47-character limit on the executor pod name prefix.
This is the Dockerfile for the custom Spark 3.3 image if you folks want to use it:
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
ARG java_image_tag=17-jre
FROM eclipse-temurin:${java_image_tag} as base
# ARG spark_uid=185
# Before building the docker image, first build and make a Spark distribution following
# the instructions in http://spark.apache.org/docs/latest/building-spark.html.
# If this docker file is being used in the context of building your images from a Spark
# distribution, the docker build command should be invoked from the top level directory
# of the Spark distribution. E.g.:
# docker build -t spark:latest -f kubernetes/dockerfiles/spark/Dockerfile .
RUN set -ex && \
apt-get update && \
ln -s /lib /lib64 && \
apt install -y bash tini libc6 libpam-modules krb5-user libnss3 procps net-tools && \
rm /bin/sh && \
ln -sv /bin/bash /bin/sh && \
echo "auth required pam_wheel.so use_uid" >> /etc/pam.d/su && \
chgrp root /etc/passwd && chmod ug+rw /etc/passwd && \
rm -rf /var/cache/apt/* && rm -rf /var/lib/apt/lists/*
FROM base as spark
### Download Spark Distribution ###
WORKDIR /opt
RUN wget https://archive.apache.org/dist/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz
RUN tar xvf spark-3.3.0-bin-hadoop3.tgz
FROM spark as build
### Create target directories ###
RUN mkdir -p /opt/spark/jars
### Set Spark dir ARG for use Docker build context on root project dir ###
FROM base as final
ARG spark_dir=/opt/spark-3.3.0-bin-hadoop3
### Copy files from the build image ###
COPY --from=build ${spark_dir}/jars /opt/spark/jars
COPY --from=build /opt/spark/jars /opt/spark/jars
COPY --from=build ${spark_dir}/bin /opt/spark/bin
COPY --from=build ${spark_dir}/sbin /opt/spark/sbin
COPY --from=build ${spark_dir}/kubernetes/dockerfiles/spark/entrypoint.sh /opt/
COPY --from=build ${spark_dir}/kubernetes/dockerfiles/spark/decom.sh /opt/
COPY --from=build ${spark_dir}/examples /opt/spark/examples
COPY --from=build ${spark_dir}/kubernetes/tests /opt/spark/tests
COPY --from=build ${spark_dir}/data /opt/spark/data
WORKDIR /opt/spark/work-dir
ENV SPARK_HOME /opt/spark
WORKDIR /opt/spark/work-dir
RUN chmod g+w /opt/spark/work-dir
RUN chmod a+x /opt/decom.sh
ENTRYPOINT [ "/opt/entrypoint.sh" ]
# Specify the User that the actual main process will run as
# USER ${spark_uid}
@josecsotomorales What would be the steps to make it work? Just build this image and set it in the SparkApplication YAML?
@renanxx1 that's it :)
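If it helps, here is a rough sketch (not a definitive template) of a SparkApplication manifest that uses such a custom image; the image name/tag, namespace, and service account are placeholders, and the examples jar path assumes the Dockerfile above, which copies the distribution's examples into /opt/spark/examples:
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi                  # placeholder name
  namespace: default              # placeholder namespace
spec:
  type: Scala
  mode: cluster
  image: my-registry/spark:3.3.0  # hypothetical tag for the image built from the Dockerfile above
  imagePullPolicy: IfNotPresent
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.3.0.jar
  sparkVersion: "3.3.0"
  driver:
    cores: 1
    memory: 512m
    serviceAccount: spark         # placeholder service account with permissions to manage executor pods
  executor:
    cores: 1
    instances: 1
    memory: 512m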
@josecsotomorales I used the image gcr.io/spark-operator/spark-py:v3.1.1-hadoop3 as a base, built the image, and used it in the Spark YAML. But when I try to run the Spark job I get this error:
from pyspark.sql import SparkSession
ModuleNotFoundError: No module named 'pyspark'
What could I be missing?
Any reason there hasn't been an official image with an updated Spark version for over a year now? Is the spark operator properly maintained?
As far as I can tell, this repo is not actively maintained by the GoogleCloudPlatform team. I pinged them on LinkedIn once to suggest that they collaborate with, or donate it to, the Apache Software Foundation, because the source code has also started to diverge from the upstream apache/spark Kubernetes resource manager, from what I saw in the PR I opened half a year ago (still no review on that PR).
I'm considering forking this repo and maintaining it; I've already asked on the spark-operator Slack channel about next steps.
Sorry to bother, but after building the custom Spark 3.3 image above and using it with the spark-operator, the pod keeps restarting with the following message:
How could I solve this? I changed Spark to 3.3.1.
Hey @matheus-rossi, how's it going? I'm using this one, you can try it:
FROM apache/spark:v3.3.1
USER 185
WORKDIR /
USER 0
RUN apt-get update && apt install -y python3 python3-pip && pip3 install --upgrade pip setuptools && rm -r /root/.cache && rm -rf /var/cache/apt/*
RUN pip3 install pyspark==3.3.1
WORKDIR /opt/spark/work-dir
ENTRYPOINT ["/opt/entrypoint.sh"]
ARG spark_uid=185
USER 185
Thanks Renan, that works for me
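For completeness, a rough sketch of a PySpark SparkApplication pointing at an image built from that Dockerfile; the name, image tag, and service account are placeholders, and the pi.py path assumes the examples shipped in the apache/spark base image:
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: pyspark-pi                   # placeholder name
spec:
  type: Python
  pythonVersion: "3"
  mode: cluster
  image: my-registry/spark-py:3.3.1  # hypothetical tag for the image built from the Dockerfile above
  mainApplicationFile: local:///opt/spark/examples/src/main/python/pi.py
  sparkVersion: "3.3.1"
  driver:
    cores: 1
    memory: 512m
    serviceAccount: spark            # placeholder service account
  executor:
    cores: 1
    instances: 1
    memory: 512m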
@josecsotomorales any news on the fork? It's hilarious that Google can't update Spark to even 3.2, despite having working code from contributors, a year after the 3.2 release...
Also looking for an update on this. @josecsotomorales did you decide to fork?
+1 on this issue
+1