spark-operator

Add support for Spark 3.3.0

Open josecsotomorales opened this issue 2 years ago • 11 comments

Spark 3.3.0 was released, I have a custom image of 3.2.1 running with the operator without any issues, and will experiment with 3.3.0 this week, will post the results here.

josecsotomorales avatar Jun 28 '22 01:06 josecsotomorales

@josecsotomorales Awesome. Looking forward to this.

dcoliversun avatar Jun 29 '22 09:06 dcoliversun

Hi, it seems that in Spark 3.3.0 a validation was added to check that the executor pod name prefix is no longer than 47 characters.

We've seen that for scheduled applications the operator adds a long timestamp plus an ID before the "exec-id", so the prefix ends up longer than 47 characters and the validation fails pod creation. For example, "some-application-1657634168668185553-300e8181f2b26a1a-exec-1" results in this exception:

java.lang.IllegalArgumentException: 'some-application-1657634168668185553-300e8181f2b26a1a' in spark.kubernetes.executor.podNamePrefix is invalid. must conform https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#dns-label-names and the value length <= 47
    at org.apache.spark.internal.config.TypedConfigBuilder.$anonfun$checkValue$1(ConfigBuilder.scala:108) ~[spark-core_2.12-3.3.0.jar:3.3.0]
    at org.apache.spark.internal.config.TypedConfigBuilder.$anonfun$transform$1(ConfigBuilder.scala:101) ~[spark-core_2.12-3.3.0.jar:3.3.0]
    at scala.Option.map(Option.scala:230) ~[scala-library-2.12.12.jar:?]

shlomi-beinhoren avatar Jul 12 '22 17:07 shlomi-beinhoren

Hi @josecsotomorales! Are you able to use spark 3.3.0 without any issues?

devender-yadav avatar Aug 30 '22 12:08 devender-yadav

@devender-yadav everything worked fine, except for the issue noted above with scheduled Spark apps and pod creation failing on long names.

josecsotomorales avatar Aug 30 '22 13:08 josecsotomorales

@josecsotomorales how did you create the Spark 3.3.0 image so that it is compatible with gcr.io/spark-operator/spark:v3.1.1?

devender-yadav avatar Aug 30 '22 14:08 devender-yadav

Something like this would work; you just need to adjust this Dockerfile to add your Spark application jar (if applicable) and any required libraries:

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
ARG java_image_tag=11-jre-slim

FROM openjdk:${java_image_tag} as base

# ARG spark_uid=185

# Before building the docker image, first build and make a Spark distribution following
# the instructions in http://spark.apache.org/docs/latest/building-spark.html.
# If this docker file is being used in the context of building your images from a Spark
# distribution, the docker build command should be invoked from the top level directory
# of the Spark distribution. E.g.:
# docker build -t spark:latest -f kubernetes/dockerfiles/spark/Dockerfile .

RUN set -ex && \
    sed -i 's/http:\/\/deb.\(.*\)/https:\/\/deb.\1/g' /etc/apt/sources.list && \
    apt-get update && \
    ln -s /lib /lib64 && \
    apt install -y bash tini libc6 libpam-modules krb5-user libnss3 wget && \
    rm /bin/sh && \
    ln -sv /bin/bash /bin/sh && \
    echo "auth required pam_wheel.so use_uid" >> /etc/pam.d/su && \
    chgrp root /etc/passwd && chmod ug+rw /etc/passwd && \
    rm -rf /var/cache/apt/*

FROM base as spark
### Download Spark Distribution ###
WORKDIR /opt
RUN wget https://downloads.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.2.tgz
RUN tar xvf spark-3.3.0-bin-hadoop3.2.tgz

FROM spark as build
### Create target directories ###
RUN mkdir -p /opt/spark/jars

### Set Spark dir ARG for use Docker build context on root project dir ###
FROM base as final
ARG spark_dir=/opt/spark-3.3.0-bin-hadoop3.2

WORKDIR /opt/spark/work-dir
ENV SPARK_HOME /opt/spark

WORKDIR /opt/spark/work-dir
RUN chmod g+w /opt/spark/work-dir
RUN chmod a+x /opt/decom.sh

ENTRYPOINT [ "/opt/entrypoint.sh" ]

# Specify the User that the actual main process will run as
# USER ${spark_uid}

josecsotomorales avatar Sep 01 '22 16:09 josecsotomorales

hi,

Does anybody know if this pod name issue can be managed by any spark-operator parameter, or is the name template hardcoded? https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/docs/api-docs.md

There seems to be a podName property within the DriverSpec, but it looks like it is just the whole name, not a template.

jmoyano-koa avatar Sep 05 '22 15:09 jmoyano-koa

@josecsotomorales I built the image according to this Dockerfile and it fails at this step. What script is this?

chmod: cannot access '/opt/decom.sh': No such file or directory

jiangjian0920 avatar Sep 21 '22 08:09 jiangjian0920

@jiangjian0920 try this one:

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
ARG java_image_tag=11-jre-slim

FROM openjdk:${java_image_tag} as base

# ARG spark_uid=185

# Before building the docker image, first build and make a Spark distribution following
# the instructions in http://spark.apache.org/docs/latest/building-spark.html.
# If this docker file is being used in the context of building your images from a Spark
# distribution, the docker build command should be invoked from the top level directory
# of the Spark distribution. E.g.:
# docker build -t spark:latest -f kubernetes/dockerfiles/spark/Dockerfile .

RUN set -ex && \
    sed -i 's/http:\/\/deb.\(.*\)/https:\/\/deb.\1/g' /etc/apt/sources.list && \
    apt-get update && \
    ln -s /lib /lib64 && \
    apt install -y bash tini libc6 libpam-modules krb5-user libnss3 wget && \
    rm /bin/sh && \
    ln -sv /bin/bash /bin/sh && \
    echo "auth required pam_wheel.so use_uid" >> /etc/pam.d/su && \
    chgrp root /etc/passwd && chmod ug+rw /etc/passwd && \
    rm -rf /var/cache/apt/*

FROM base as spark
### Download Spark Distribution ###
WORKDIR /opt
RUN wget https://archive.apache.org/dist/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz
RUN tar xvf spark-3.3.0-bin-hadoop3.tgz

FROM spark as build
### Create target directories ###
RUN mkdir -p /opt/spark/jars

### Set Spark dir ARG for use Docker build context on root project dir ###
FROM base as final
ARG spark_dir=/opt/spark-3.3.0-bin-hadoop3

### Copy files from the build image ###
COPY --from=build ${spark_dir}/jars /opt/spark/jars
COPY --from=build ${spark_dir}/bin /opt/spark/bin
COPY --from=build ${spark_dir}/sbin /opt/spark/sbin
COPY --from=build ${spark_dir}/kubernetes/dockerfiles/spark/entrypoint.sh /opt/
COPY --from=build ${spark_dir}/kubernetes/dockerfiles/spark/decom.sh /opt/
COPY --from=build ${spark_dir}/examples /opt/spark/examples
COPY --from=build ${spark_dir}/kubernetes/tests /opt/spark/tests
COPY --from=build ${spark_dir}/data /opt/spark/data

WORKDIR /opt/spark/work-dir
ENV SPARK_HOME /opt/spark

WORKDIR /opt/spark/work-dir
RUN chmod g+w /opt/spark/work-dir
RUN chmod a+x /opt/decom.sh

ENTRYPOINT [ "/opt/entrypoint.sh" ]

# Specify the User that the actual main process will run as
# USER ${spark_uid}

Note: You can also use the original one as a base to build your custom image: https://github.com/apache/spark/blob/branch-3.3/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile

josecsotomorales avatar Sep 21 '22 20:09 josecsotomorales

@josecsotomorales thanks for that, I want to build an image that supports version 3.1.2, I'll try again.

jiangjian0920 avatar Sep 22 '22 01:09 jiangjian0920

@jiangjian0920 just use the one I provided and change the Spark version, and you should be good to go 👍🏻

josecsotomorales avatar Sep 22 '22 01:09 josecsotomorales

@josecsotomorales Have you ever tried changing the user of the Spark image? I changed the user of the 2.4.0 image and then the following error is reported. Have you ever encountered it?

spark-error.log:
Exception in thread "main" java.lang.NoClassDefFoundError: io/fabric8/kubernetes/client/KubernetesClient

jiangjian0920 avatar Oct 17 '22 00:10 jiangjian0920

@shlomi-beinhoren Hi,

I want to know how to set the executor pod name prefix via the Spark Operator. https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/docs/api-docs.md#executorspec I didn't find anything in the ExecutorSpec page related to the executor pod name prefix. Or did you use Spark native properties? I want to match my driver and executors but am having trouble. Thanks!

harryzhang2016 avatar Nov 02 '22 06:11 harryzhang2016

Hi @harryzhang2016, I did it with Spark native properties under the sparkConf section described here: https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/docs/user-guide.md#specifying-spark-configuration, adding the property "spark.kubernetes.executor.podNamePrefix".

shlomi-beinhoren avatar Nov 02 '22 07:11 shlomi-beinhoren
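
For reference, that workaround amounts to setting the property under the sparkConf section of the SparkApplication spec. A minimal sketch of the relevant fragment (the "my-app" value is a placeholder prefix and must itself stay within the 47-character limit):

spec:
  sparkConf:
    # Placeholder prefix; keep it short enough to satisfy Spark 3.3's length check
    "spark.kubernetes.executor.podNamePrefix": "my-app"

Entries under sparkConf are passed through to spark-submit as --conf options, so an explicit prefix like this should take precedence over the long generated one.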

Spark 3.3.0 testing updates:

I built a custom Spark 3.3 image and I'm using it with the latest operator image; no issues so far except for the 47-character pod name prefix limitation.

josecsotomorales avatar Nov 14 '22 13:11 josecsotomorales

This is the Dockerfile for the custom Spark 3.3 image, if you folks want to use it:

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
ARG java_image_tag=17-jre

FROM eclipse-temurin:${java_image_tag} as base

# ARG spark_uid=185

# Before building the docker image, first build and make a Spark distribution following
# the instructions in http://spark.apache.org/docs/latest/building-spark.html.
# If this docker file is being used in the context of building your images from a Spark
# distribution, the docker build command should be invoked from the top level directory
# of the Spark distribution. E.g.:
# docker build -t spark:latest -f kubernetes/dockerfiles/spark/Dockerfile .

RUN set -ex && \
    apt-get update && \
    ln -s /lib /lib64 && \
    apt install -y bash tini libc6 libpam-modules krb5-user libnss3 procps net-tools && \
    rm /bin/sh && \
    ln -sv /bin/bash /bin/sh && \
    echo "auth required pam_wheel.so use_uid" >> /etc/pam.d/su && \
    chgrp root /etc/passwd && chmod ug+rw /etc/passwd && \
    rm -rf /var/cache/apt/* && rm -rf /var/lib/apt/lists/*

FROM base as spark
### Download Spark Distribution ###
WORKDIR /opt
RUN wget https://archive.apache.org/dist/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz
RUN tar xvf spark-3.3.0-bin-hadoop3.tgz

FROM spark as build
### Create target directories ###
RUN mkdir -p /opt/spark/jars

### Set Spark dir ARG for use Docker build context on root project dir ###
FROM base as final
ARG spark_dir=/opt/spark-3.3.0-bin-hadoop3

### Copy files from the build image ###
COPY --from=build ${spark_dir}/jars /opt/spark/jars
COPY --from=build /opt/spark/jars /opt/spark/jars
COPY --from=build ${spark_dir}/bin /opt/spark/bin
COPY --from=build ${spark_dir}/sbin /opt/spark/sbin
COPY --from=build ${spark_dir}/kubernetes/dockerfiles/spark/entrypoint.sh /opt/
COPY --from=build ${spark_dir}/kubernetes/dockerfiles/spark/decom.sh /opt/
COPY --from=build ${spark_dir}/examples /opt/spark/examples
COPY --from=build ${spark_dir}/kubernetes/tests /opt/spark/tests
COPY --from=build ${spark_dir}/data /opt/spark/data

WORKDIR /opt/spark/work-dir
ENV SPARK_HOME /opt/spark

WORKDIR /opt/spark/work-dir
RUN chmod g+w /opt/spark/work-dir
RUN chmod a+x /opt/decom.sh

ENTRYPOINT [ "/opt/entrypoint.sh" ]

# Specify the User that the actual main process will run as
# USER ${spark_uid}

josecsotomorales avatar Nov 14 '22 13:11 josecsotomorales

@josecsotomorales What would be the steps to make it work? Just build this image and set it in the SparkApplication YAML?

renanxx1 avatar Nov 14 '22 20:11 renanxx1

@renanxx1 that's it :)

josecsotomorales avatar Nov 14 '22 21:11 josecsotomorales
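
For anyone following along, a minimal SparkApplication manifest along those lines might look roughly like this (the image name, namespace, and service account are placeholders for your own setup):

apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: default
spec:
  type: Scala
  mode: cluster
  # Custom Spark 3.3.0 image built from the Dockerfile above (placeholder registry)
  image: "my-registry/spark:3.3.0"
  imagePullPolicy: IfNotPresent
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.12-3.3.0.jar"
  sparkVersion: "3.3.0"
  driver:
    cores: 1
    memory: "512m"
    serviceAccount: spark
  executor:
    cores: 1
    instances: 2
    memory: "512m"

Apply it with kubectl apply -f and the operator should launch the driver and executors with the custom image.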

@josecsotomorales I used the image gcr.io/spark-operator/spark-py:v3.1.1-hadoop3 as a base, built the image, and used it in the Spark YAML. But when I try to run the Spark job I'm getting the error:

from pyspark.sql import SparkSession
ModuleNotFoundError: No module named 'pyspark'

What could I be missing?

renanxx1 avatar Nov 14 '22 22:11 renanxx1

Any reason there hasn't been an official image with an updated Spark version for over a year now? Is the spark operator properly maintained?

assaf-xm avatar Dec 08 '22 06:12 assaf-xm

As far as I can tell, this repo is not actively maintained by the GoogleCloudPlatform team. I pinged them on LinkedIn once to suggest that they collaborate with, or gift it to, the Apache Software Foundation, because the source code has also started to diverge from the upstream apache/spark k8s resource manager, judging from the PR I opened half a year ago (still no review on that PR).

tafaust avatar Dec 08 '22 07:12 tafaust

I'm considering forking this repo and maintaining it; I already asked on the Spark Operator Slack channel what the next steps are.

josecsotomorales avatar Dec 08 '22 13:12 josecsotomorales

Sorry to bother, but after building the custom Spark 3.3 image from the Dockerfile above and using it with the spark operator, the pod keeps restarting with the following message:

"[FATAL tini (16)] exec driver-py failed: No such file or directory" with spark-py:v3.0.0 container image

How could I solve this? I changed Spark to 3.3.1.

matheus-rossi avatar Feb 04 '23 01:02 matheus-rossi

Hey @matheus-rossi, how's it going? I'm using this one, you can try it.

FROM apache/spark:v3.3.1

USER 185
WORKDIR /
USER 0

RUN apt-get update && apt install -y python3 python3-pip && pip3 install --upgrade pip setuptools && rm -r /root/.cache && rm -rf /var/cache/apt/*
RUN pip3 install pyspark==3.3.1

WORKDIR /opt/spark/work-dir

ENTRYPOINT ["/opt/entrypoint.sh"]

ARG spark_uid=185
USER 185

renanxx1 avatar Feb 04 '23 01:02 renanxx1
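
With a pyspark-capable image like that, a Python application can reference it in the same way. A rough sketch (image name, script path, and service account are placeholders; the script is assumed to be baked into the image):

apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: pyspark-example
spec:
  type: Python
  pythonVersion: "3"
  mode: cluster
  # Image built from the Dockerfile above (placeholder registry)
  image: "my-registry/spark-py:3.3.1"
  # Hypothetical PySpark script copied into the image
  mainApplicationFile: "local:///opt/spark/work-dir/my_app.py"
  sparkVersion: "3.3.1"
  driver:
    cores: 1
    memory: "512m"
    serviceAccount: spark
  executor:
    cores: 1
    instances: 1
    memory: "512m"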

Thanks Renan, that works for me

matheus-rossi avatar Feb 09 '23 16:02 matheus-rossi

@josecsotomorales any news on the fork? It is hilarious that Google can't update Spark to even 3.2, despite having working code from contributors, a year after the 3.2 release...

j-adamczyk avatar Feb 14 '23 10:02 j-adamczyk

Also looking for an update on this. @josecsotomorales did you decide to fork?

csawtelle avatar May 13 '23 15:05 csawtelle

+1 on this issue

mrendi29 avatar May 17 '23 03:05 mrendi29

+1

andreyolv avatar Oct 22 '23 17:10 andreyolv