deploy spark in a server and connect to it locally
Name and Version
docker.io/bitnami/spark:3.5
What architecture are you using?
None
What steps will reproduce the bug?
Hi, I have deployed the following Docker Compose file on an Ubuntu VM:
services:
  spark:
    image: docker.io/bitnami/spark:3.5
    environment:
      - SPARK_MODE=master
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
      - SPARK_USER=spark
    ports:
      - '8080:8080'
      - '7077:7077'
  spark-worker:
    image: docker.io/bitnami/spark:3.5
    environment:
      - SPARK_MODE=worker
      - SPARK_MASTER_URL=spark://<ubuntu ip>:7077
      - SPARK_WORKER_MEMORY=1G
      - SPARK_WORKER_CORES=1
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
      - SPARK_USER=spark
I tested it and it works correctly on the server, but it does not work when I try to connect from my local computer.
Port 7077 is reachable from my local computer, as checked with:
nc -zv
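For anyone who wants to run the same reachability check from Python instead of nc, here is a minimal sketch. The helper name port_is_open and the 3-second default timeout are illustrative choices, not something from the setup above. A True result only shows that the port accepts TCP connections; it does not prove that a Spark session can be created.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures
        return False
```

It could be called as port_is_open("<ubuntu ip>", 7077), with the placeholder replaced by the address of the VM.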
What do you see instead?
from pyspark.sql import SparkSession
spark = SparkSession.builder \
    .appName("SparkTest") \
    .master("spark://<ubuntu ip>:7077") \
    .getOrCreate()
I receive the error:
Traceback (most recent call last):
File "<stdin>", line 4, in <module>
File "/Users/etru/anaconda3/lib/python3.11/site-packages/pyspark/sql/session.py", line 497, in getOrCreate
sc = SparkContext.getOrCreate(sparkConf)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/etru/anaconda3/lib/python3.11/site-packages/pyspark/context.py", line 515, in getOrCreate
SparkContext(conf=conf or SparkConf())
File "/Users/etru/anaconda3/lib/python3.11/site-packages/pyspark/context.py", line 203, in __init__
self._do_init(
File "/Users/etru/anaconda3/lib/python3.11/site-packages/pyspark/context.py", line 296, in _do_init
self._jsc = jsc or self._initialize_context(self._conf._jconf)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/etru/anaconda3/lib/python3.11/site-packages/pyspark/context.py", line 421, in _initialize_context
return self._jvm.JavaSparkContext(jconf)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/etru0005/anaconda3/lib/python3.11/site-packages/py4j/java_gateway.py", line 1587, in __call__
return_value = get_return_value(
^^^^^^^^^^^^^^^^^
File "/Users/etru/anaconda3/lib/python3.11/site-packages/py4j/protocol.py", line 326, in get_return_value
raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.UnsupportedOperationException: getSubject is supported only if a security manager is allowed
at java.base/javax.security.auth.Subject.getSubject(Subject.java:347)
at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:577)
at org.apache.spark.util.Utils$.$anonfun$getCurrentUserName$1(Utils.scala:2416)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2416)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:329)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:62)
at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:501)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:485)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
at py4j.Gateway.invoke(Gateway.java:238)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.base/java.lang.Thread.run(Thread.java:1575)
Hi, the issue may not be directly related to the Bitnami container image/Helm chart, but rather to how the application is being utilized, configured in your specific environment, or tied to a particular scenario that is not easy to reproduce on our side.
If you think that's not the case and want to contribute a solution, we welcome you to create a pull request. The Bitnami team is excited to review your submission and offer feedback. You can find the contributing guidelines here.
Your contribution will greatly benefit the community. Feel free to reach out if you have any questions or need assistance.
If you have any questions about the application, customizing its content, or technology and infrastructure usage, we highly recommend that you refer to the forums and user guides provided by the project responsible for the application or technology.
With that said, we'll keep this ticket open until the stale bot automatically closes it, in case someone from the community contributes valuable insights.
I am seeing exactly this error, with a similarly minimal example. docker-compose.yml (only change: port 7077 exposed from spark master) + images from bitnami, and in my case connecting to port 7077 on localhost for local development:
spark = SparkSession.builder.appName("HelloWorld").master("spark://localhost:7077").getOrCreate()
Note: "how the application is being utilized" -- the only use here is the single-line invocation above which causes the exception, the rest is bitnami images and docker-compose. Is there anything else we can try?
I am using docker desktop 4.34.2 (167172) on macOS 15.0.1 on an M1 Pro, pulled images are arm.
Looks like this could be because getSubject() was deprecated for removal in JDK 17, but the hadoop packaged with spark still makes use of this API.
See https://issues.apache.org/jira/browse/HADOOP-19212 and https://issues.apache.org/jira/browse/CALCITE-6590 and https://openjdk.org/jeps/411
I can see the bitnami image has jdk 17, so I'm not sure why it's raising the getSubject / security manager exception when it's supposed to only warn in that version of the jdk.
As a workaround, I've tried to pass -Djava.security.manager=allow through various environment variables and even the SparkSession builder config, but to no avail.
I would appreciate any tips here.
The problem was the JDK on my client. When I downgraded from 23 to 21, the security manager issue turned into a warning and not an exception, as expected.
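Since the fix came down to which JDK the client machine runs, a quick way to check it before creating a SparkSession might look like the sketch below. The function names parse_java_major and local_java_major are hypothetical helpers written for this thread, not part of PySpark. The sketch also assumes the JVM that PySpark launches is the same java binary found on PATH; if JAVA_HOME is set, pass that binary explicitly.

```python
import re
import subprocess


def parse_java_major(version_output: str) -> int:
    """Extract the major Java version from `java -version` output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("could not find a version string in the output")
    major = int(match.group(1))
    # Legacy scheme: "1.8.0_381" means Java 8
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def local_java_major(java_cmd: str = "java") -> int:
    """Run `java -version` and return the major version of that JVM."""
    # `java -version` writes its banner to stderr, not stdout
    result = subprocess.run(
        [java_cmd, "-version"], capture_output=True, text=True, check=True
    )
    return parse_java_major(result.stderr)
```

Going by the report above, a result of 23 from local_java_major() would match the failing client and 21 the working one. Other versions were not tested in this thread.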
This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.
Due to the lack of activity in the last 5 days since it was marked as "stale", we proceed to close this Issue. Do not hesitate to reopen it later if necessary.
The problem was the JDK on my client. When I downgraded from 23 to 21, the security manager issue turned into a warning and not an exception, as expected.
This helped me, thanks!