[SPARK-48960][CONNECT] Make spark-submit work with Spark Connect
What changes were proposed in this pull request?
This PR proposes adding support for --remote to bin/spark-submit so that it can be used with Spark Connect easily. This PR includes:
- Make bin/spark-submit work with the Scala Spark Connect client
- Pass --conf and loaded configurations to both the Scala and Python Spark Connect clients
Why are the changes needed?
bin/pyspark --remote already works. We should also make bin/spark-submit work so that end users can try Spark Connect out in a consistent way.
Does this PR introduce any user-facing change?
Yes,
- bin/spark-submit supports the --remote option in Scala.
- bin/spark-submit passes --conf and loaded Spark configurations to the clients in Scala and Python.
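To illustrate the user-facing behaviour, here is a minimal sketch of how an invocation might be composed from Python. It only builds the argument list and does not run anything. The helper name build_submit_command and the configuration key in the usage lines are illustrative assumptions, not part of this PR. The flags (--name, --remote, --class, --conf) are the ones used or described in this description.

```python
import shlex
from typing import List, Mapping, Optional


def build_submit_command(
    app: str,
    remote: str,
    name: Optional[str] = None,
    main_class: Optional[str] = None,
    conf: Optional[Mapping[str, str]] = None,
    spark_submit: str = "bin/spark-submit",
) -> List[str]:
    """Hypothetical helper: compose a spark-submit argv for Spark Connect."""
    cmd = [spark_submit]
    if name is not None:
        cmd += ["--name", name]
    # --remote takes a Spark Connect target such as "local" or "sc://localhost".
    cmd += ["--remote", remote]
    if main_class is not None:
        # Only needed for JVM (Scala) applications packaged as a jar.
        cmd += ["--class", main_class]
    # Each configuration entry becomes its own "--conf key=value" pair.
    for key, value in (conf or {}).items():
        cmd += ["--conf", f"{key}={value}"]
    # The application file (a .py script or a jar) goes last.
    cmd.append(app)
    return cmd


# Python application, mirroring the Python test command in this PR.
python_cmd = build_submit_command("test.py", remote="local", name="testApp")
print(shlex.join(python_cmd))

# Same application with one extra setting; the key is only an illustration.
conf_cmd = build_submit_command(
    "test.py",
    remote="sc://localhost",
    name="testApp",
    conf={"spark.sql.shuffle.partitions": "4"},
)
print(shlex.join(conf_cmd))
```

The resulting list can be handed to subprocess.run from the root of a Spark checkout. Building a list rather than a single string avoids shell quoting issues.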
How was this patch tested?
Python:
echo "from pyspark.sql import SparkSession;spark = SparkSession.builder.getOrCreate();assert 'connect' in str(type(spark));assert spark.range(1).first()[0] == 0" > test.py
./bin/spark-submit --name "testApp" --remote "local" test.py
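For readability, the one-line test.py above can also be written out in multi-line form. The sketch below keeps the same statements and assertions and writes them to a file. The helper write_test_script is a hypothetical convenience, not part of this PR, and the generated script still needs to be run through bin/spark-submit with --remote.

```python
import textwrap
from pathlib import Path

# Same statements as the one-liner passed to echo above, one per line.
TEST_SCRIPT = textwrap.dedent("""\
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    # With --remote, the session is expected to be a Spark Connect session.
    assert 'connect' in str(type(spark))
    # A trivial query to confirm the session can execute against the server.
    assert spark.range(1).first()[0] == 0
""")


def write_test_script(path: str = "test.py") -> Path:
    """Hypothetical helper: write the test application to the given path."""
    target = Path(path)
    target.write_text(TEST_SCRIPT)
    return target
```

After calling write_test_script(), the file can be submitted with the same spark-submit command shown above.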
Scala:
https://github.com/HyukjinKwon/spark-connect-example
git clone https://github.com/HyukjinKwon/spark-connect-example
cd spark-connect-example
build/sbt package
cd ..
git clone https://github.com/apache/spark.git
cd spark
build/sbt package
sbin/start-connect-server.sh
bin/spark-submit --name "testApp" --remote "sc://localhost" --class com.hyukjinkwon.SparkConnectExample ../spark-connect-example/target/scala-2.13/spark-connect-example_2.13-0.0.1.jar
Was this patch authored or co-authored using generative AI tooling?
No.