[SPARK-48960][CONNECT] Make spark-submit work with Spark Connect

Open • HyukjinKwon opened this pull request 7 months ago • 1 comment

What changes were proposed in this pull request?

This PR proposes adding support for --remote to bin/spark-submit so that it can be used with Spark Connect easily. This PR includes:

  • Make bin/spark-submit work with the Scala Spark Connect client
  • Pass --conf and loaded configurations to both Scala and Python Spark Connect clients

Why are the changes needed?

bin/pyspark --remote already works. bin/spark-submit should work the same way, so that end users can try out Spark Connect and the two entry points behave consistently.

Does this PR introduce any user-facing change?

Yes:

  • bin/spark-submit supports the --remote option for Scala applications.
  • bin/spark-submit passes --conf and loaded Spark configurations to the Scala and Python clients.
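As a rough illustration of how the options above could be combined, the following sketch assembles a spark-submit argument list in Python. The helper name `build_submit_command` is hypothetical and not part of Spark, and the configuration key shown is only an example value; the sketch builds and prints the command rather than running it.

```python
import shlex
from typing import List, Mapping, Optional


def build_submit_command(
    app: str,
    remote: str,
    name: Optional[str] = None,
    main_class: Optional[str] = None,
    conf: Optional[Mapping[str, str]] = None,
    spark_submit: str = "bin/spark-submit",
) -> List[str]:
    """Assemble a spark-submit argument list.

    Hypothetical helper for illustration only; it is not part of Spark.
    """
    cmd = [spark_submit]
    if name is not None:
        cmd += ["--name", name]
    # --remote selects the Spark Connect endpoint, e.g. "local" or "sc://localhost".
    cmd += ["--remote", remote]
    if main_class is not None:
        # Only needed for JVM (Scala/Java) applications packaged as a jar.
        cmd += ["--class", main_class]
    # Each entry becomes one "--conf key=value" pair.
    for key, value in (conf or {}).items():
        cmd += ["--conf", f"{key}={value}"]
    # The application file (a .py script or a jar) goes last.
    cmd.append(app)
    return cmd


if __name__ == "__main__":
    # Example configuration value chosen for illustration.
    cmd = build_submit_command(
        "test.py",
        remote="local",
        name="testApp",
        conf={"spark.sql.session.timeZone": "UTC"},
    )
    # Print a shell-quoted form that could be pasted into a terminal.
    print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run` from the Spark source directory; for a Scala application, pass the jar path as `app` and set `main_class`.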

How was this patch tested?

Python:

echo "from pyspark.sql import SparkSession;spark = SparkSession.builder.getOrCreate();assert 'connect' in str(type(spark));assert spark.range(1).first()[0] == 0" > test.py
./bin/spark-submit --name "testApp" --remote "local" test.py

Scala:

https://github.com/HyukjinKwon/spark-connect-example

git clone https://github.com/HyukjinKwon/spark-connect-example
cd spark-connect-example
build/sbt package
cd ..
git clone https://github.com/apache/spark.git
cd spark
build/sbt package
sbin/start-connect-server.sh
bin/spark-submit --name "testApp" --remote "sc://localhost" --class com.hyukjinkwon.SparkConnectExample ../spark-connect-example/target/scala-2.13/spark-connect-example_2.13-0.0.1.jar

Was this patch authored or co-authored using generative AI tooling?

No.

HyukjinKwon, Jul 22 '24 01:07