Launch dotnet backend JVM bridge process on-demand
We are excited to review your PR.
So we can do the best job, please check:
- [x] There's a descriptive title that will make sense to other developers some time from now.
- [x] There are associated issues. All PRs should have issue(s) associated, unless the change is trivially self-evident, such as fixing a typo. You can use the format `Fixes #nnnn` in your description to cause GitHub to automatically close the issue(s) when your PR is merged.
- [x] Your change description explains what the change does, why you chose your approach, and anything else that reviewers should know.
- [x] You have included any necessary tests in the same PR.
ref https://github.com/dotnet/spark/issues/539
The SparkSession builder pattern can detect whether the backend JVM bridge is running, and launch the JVM bridge process on demand.
In the old way, spark-submit launches the JVM bridge, and the JVM bridge launches the .NET process. But that is inconvenient for debugging.
With the SparkSession builder pattern, the JVM bridge does not need to be launched first.
To launch a default SparkSession:

`var spark = SparkSession.Builder().GetOrCreate();`
Then `dotnet run` can run the Spark job.
To launch a local SparkSession:

`var spark = SparkSession.Builder().Master("local").GetOrCreate();`
There is no need to set up that configuration through spark-submit.
The old method is still supported, because we can detect whether the JVM bridge has already been started.
Thanks @laneser for the PR. However, I am not fully convinced by this approach:
- This requires another set of documentation: e.g., it will not work if `SPARK_HOME` is not set.
- There are objects you can create other than `SparkSession` which involve getting the `JvmBridge`, e.g., `SparkConf`.
Is there anything we can do to update the current docs to make the experience better, instead of introducing another option?
I am open to further discussion to make the debugging experience seamless. cc: @suhsteve
Thanks for your information!
I will try to set up the .NET Spark JVM bridge through `spark.driver.extraClassPath` and let `buildSparkSubmitCommand` do the same things for .NET Spark.
I think the `JVMBridgeHelper` works like PySpark's launch-gateway method:
https://github.com/apache/spark/blob/feeca63198466640ac461a2a34922493fa6162a8/python/pyspark/context.py#L319
https://github.com/apache/spark/blob/95aec091e4d8a45e648ce84d32d912f585eeb151/python/pyspark/java_gateway.py#L40
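To illustrate the idea behind that PySpark flow: reuse a gateway that is already listening, otherwise spawn the backend process and wait for its port to open. Below is a minimal, hypothetical sketch of that detection logic (the helper names and the `launch_cmd` parameter are mine for illustration; the actual PySpark `launch_gateway` and the .NET `JVMBridgeHelper` differ in detail):

```python
import socket
import subprocess
import time


def is_backend_running(port, host="127.0.0.1"):
    """Return True if something is already listening on the bridge port."""
    try:
        with socket.create_connection((host, port), timeout=1.0):
            return True
    except OSError:
        return False


def ensure_backend(port, launch_cmd, timeout=30.0):
    """Reuse a running bridge, or launch one and wait for its port to open.

    Mirrors the launch-gateway idea: when spark-submit already started the
    bridge (the old flow), just connect to it; otherwise spawn the backend
    process ourselves (the new on-demand flow).
    """
    if is_backend_running(port):
        return None  # an existing bridge is listening; nothing to launch
    proc = subprocess.Popen(launch_cmd)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if is_backend_running(port):
            return proc
        time.sleep(0.1)
    proc.kill()
    raise RuntimeError("backend did not start listening on port %d" % port)
```

This is only a sketch of the detection idea discussed above, not the actual dotnet/spark or PySpark implementation.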
`SparkConf` cannot work if the JVM bridge does not exist.
I will try to resolve the E2E test failure; hopefully I can figure out why `JVMBridgeHelper` does not work with `SparkFixture` in the unit tests.
I also agree that the helper should try to find the Spark home if the `SPARK_HOME` environment variable is not defined.
Hopefully .NET for Spark can be listed in https://spark.apache.org/docs/latest/sql-getting-started.html#starting-point-sparksession so we can have the same experience as the other Spark tools.
The checks have passed now, though I have no idea why they were failing before... 😿