
[bitnami/spark] generate bitnami/spark with python 3.10.x


Name and Version

bitnami/spark:3.2.2

What is the problem this feature will solve?

Without Python 3.10.x in the image, submitting from a Python 3.10 driver raises the following error: "Python in worker has different version 3.8 than that in driver 3.10".

What is the feature you are proposing to solve the problem?

Add support for Python 3.10.x in the image, so the worker's interpreter matches a PySpark 3.10 driver.
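
A quick way to see the mismatch is to compare the two interpreters directly; a minimal sketch, assuming Docker is available locally (tag and output will vary):

# Driver side, e.g. a laptop with Python 3.10
python3 --version
# Worker side, inside the published container
docker run --rm bitnami/spark:3.2.2 python --version

PySpark refuses to run when the driver and worker major.minor versions differ, which is exactly the error quoted above.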

What alternatives have you considered?

First, build my own Docker image.

Or, publish a new bitnami/spark tag with Python 3.10.x support. For this, it is only necessary to change the following line in the Dockerfile:

# current version 3.2
RUN . /opt/bitnami/scripts/libcomponent.sh && component_unpack "python" "3.8.13-166" --checksum 9a5fba755f6c8d60eacc80f366f3fbaa57d003913e48c31ba337037bb69e37b3

# proposal for the new tag
RUN . /opt/bitnami/scripts/libcomponent.sh && component_unpack "python" "3.10.6-7" --checksum 02e5a66908664141ad80a1d40b390a71e8cec13771cb38600c322d36747fb298
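
For reference, a minimal sketch of the first alternative (a derived image) could reuse the same helper; this assumes the libcomponent.sh script shipped in the published image is still usable at build time and that the 3.10.6-7 component above is published for the image's distro, neither of which has been verified here:

FROM bitnami/spark:3.2.2

USER root
# Assumption: overwrite the bundled Python with the 3.10.x component
# (version and checksum taken from the proposal above)
RUN . /opt/bitnami/scripts/libcomponent.sh && component_unpack "python" "3.10.6-7" --checksum 02e5a66908664141ad80a1d40b390a71e8cec13771cb38600c322d36747fb298
USER 1001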

amenezes avatar Aug 29 '22 19:08 amenezes

Hi,

Without Python 3.10.x in the image, submitting from a Python 3.10 driver raises the following error: "Python in worker has different version 3.8 than that in driver 3.10".

Could you specify the steps that make this message appear?

javsalgar avatar Aug 31 '22 08:08 javsalgar

I'm facing the same issue when running spark-submit from a recent Python 3.10 environment (developer's laptop) against Spark running inside this container.
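
As a possible (untested) workaround, Spark lets you pin the interpreter on both sides through environment variables set on the submitting machine; the container path below is an assumption based on where Bitnami installs its Python component:

# Interpreter for the driver process on the laptop (must match the workers)
export PYSPARK_DRIVER_PYTHON=python3.8
# Interpreter the executors should use inside the container (assumed path)
export PYSPARK_PYTHON=/opt/bitnami/python/bin/python3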

chris-aeviator avatar Sep 03 '22 23:09 chris-aeviator

Could you provide an environment for us to consistently reproduce the issue?

javsalgar avatar Sep 05 '22 08:09 javsalgar

Could you provide an environment for us to consistently reproduce the issue?

conda create -n spark310 python=3.10 && conda activate spark310 && conda install pyspark && spark-submit …

EDIT: I'm using spark-submit from https://www.apache.org/dyn/closer.lua/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz instead of the conda-installed spark-submit

my complete command is

./bin/spark-submit --master spark://host-with-bitnami-spark:7077 --conf "spark.driver.extraJavaOptions=--add-exports java.base/jdk.internal.misc=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED" ~/Dev/my-project/my-spark-job.py
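
The same pinning can also be expressed as configuration on spark-submit itself (spark.pyspark.python and spark.pyspark.driver.python are the documented equivalents of the PYSPARK_* environment variables); the interpreter paths here are assumptions, and the extra JVM options are omitted for brevity:

./bin/spark-submit --master spark://host-with-bitnami-spark:7077 --conf spark.pyspark.driver.python=python3.8 --conf spark.pyspark.python=/opt/bitnami/python/bin/python3 ~/Dev/my-project/my-spark-job.py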

chris-aeviator avatar Sep 05 '22 10:09 chris-aeviator

Ok,

I will create a task for updating the Python version to 3.10. I cannot guarantee an ETA, but as soon as there is news, I will update the ticket.

javsalgar avatar Sep 06 '22 08:09 javsalgar

@javsalgar any news on this? I also get this error

j-adamczyk avatar May 18 '23 18:05 j-adamczyk

This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.

github-actions[bot] avatar Jun 03 '23 01:06 github-actions[bot]

Due to the lack of activity in the last 5 days since it was marked as "stale", we proceed to close this Issue. Do not hesitate to reopen it later if necessary.

github-actions[bot] avatar Jun 08 '23 01:06 github-actions[bot]

Requesting to reopen: Python 3.8 is quite outdated now, and building a 3.10 image ourselves is a considerable operational burden.

j-adamczyk avatar Jun 08 '23 07:06 j-adamczyk

Please note that at this moment the Python version used in the Bitnami Spark container is 3.9, not 3.8.

Regarding 3.10, we found some issues in some of the distros supported as part of the VMware Application Catalog (Debian 11, CentOS 7, PhotonOS 3 & 4, Ubuntu 18.04, 20.04 & 22.04, RedHat UBI 8 & 9). We'll review whether it's possible to bump the version, but note that none of the Python versions (3.8, 3.9, or 3.10) is close to reaching EOL.

carrodher avatar Jun 08 '23 14:06 carrodher