
Launch dotnet backend JVM bridge process on demand

Open laneser opened this issue 5 years ago • 4 comments

We are excited to review your PR.

So we can do the best job, please check:

  • [x] There's a descriptive title that will make sense to other developers some time from now.
  • [x] There's associated issues. All PR's should have issue(s) associated - unless a trivial self-evident change such as fixing a typo. You can use the format Fixes #nnnn in your description to cause GitHub to automatically close the issue(s) when your PR is merged.
  • [x] Your change description explains what the change does, why you chose your approach, and anything else that reviewers should know.
  • [x] You have included any necessary tests in the same PR.

ref https://github.com/dotnet/spark/issues/539

The SparkSession builder can detect whether the backend JVM bridge is running and, if it is not, launch the JVM bridge process on demand.

In the old way, spark-submit launches the JVM bridge, and the JVM bridge launches the dotnet process. That is inconvenient to debug.

With the SparkSession builder pattern, the JVM bridge does not need to be launched first.

To launch a default SparkSession:

var spark = SparkSession.Builder().GetOrCreate();

Then dotnet run can run the Spark job.

To launch a local SparkSession:

var spark = SparkSession.Builder().Master("local").GetOrCreate();

There is no need to set up that configuration through spark-submit.

The old method is still supported, because we can detect whether the JVM bridge has already been started.
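A minimal sketch of the detect-then-launch idea described above, not the actual JVMBridgeHelper from this PR. It assumes the bridge is reachable on a local TCP port (5567 is, as far as I recall, the default debug port for dotnet spark), that SPARK_HOME is set, and that the bridge is started through spark-submit with the DotnetRunner class in debug mode. The class name BridgeLauncherSketch, its method names, and the jarPath parameter are hypothetical.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Net.Sockets;
using System.Runtime.InteropServices;

// Illustrative only: the names here are hypothetical, not the PR's API.
public static class BridgeLauncherSketch
{
    // Returns true if something accepts TCP connections on localhost:port.
    public static bool IsBackendListening(int port)
    {
        try
        {
            using (var client = new TcpClient())
            {
                client.Connect("127.0.0.1", port);
                return true;
            }
        }
        catch (SocketException)
        {
            return false;
        }
    }

    // Starts the JVM bridge only when nothing is listening yet.
    // Returns the started process, or null if a bridge was already running.
    public static Process EnsureBackend(int port, string jarPath)
    {
        if (IsBackendListening(port))
        {
            // The old spark-submit flow (or a debugger session) already started it.
            return null;
        }

        string sparkHome = Environment.GetEnvironmentVariable("SPARK_HOME");
        if (string.IsNullOrEmpty(sparkHome))
        {
            throw new InvalidOperationException("SPARK_HOME is not set.");
        }

        string script = RuntimeInformation.IsOSPlatform(OSPlatform.Windows)
            ? "spark-submit.cmd"
            : "spark-submit";

        var startInfo = new ProcessStartInfo
        {
            FileName = Path.Combine(sparkHome, "bin", script),
            // Assumed command line: DotnetRunner in debug mode waits for a dotnet client.
            Arguments = "--class org.apache.spark.deploy.dotnet.DotnetRunner " +
                        "--master local \"" + jarPath + "\" debug",
            UseShellExecute = false,
        };
        return Process.Start(startInfo);
    }
}
```

A caller would still have to wait until IsBackendListening(port) returns true before creating the SparkSession, because the JVM takes a moment to start. This sketch leaves that retry loop out.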

laneser avatar Jun 22 '20 09:06 laneser

Thanks @laneser for the PR. However, I am not fully convinced by this approach:

  • This requires another set of documentation: e.g., it will not work if SPARK_HOME is not set.
  • There are objects other than SparkSession whose creation invokes getting the JvmBridge, e.g., SparkConf.

Is there anything we can do to update the current doc to make the experience better instead of introducing another option?

I am open to further discussion to make the debugging experience seamless. cc: @suhsteve

imback82 avatar Jun 22 '20 19:06 imback82

Thanks for the information! I will try to set up the dotnet spark JVM bridge through spark.driver.extraClassPath and let buildSparkSubmitCommand do the same thing for dotnet spark.

laneser avatar Jun 22 '20 21:06 laneser

I think the JVMBridgeHelper works like the pyspark launch gateway method:

https://github.com/apache/spark/blob/feeca63198466640ac461a2a34922493fa6162a8/python/pyspark/context.py#L319

https://github.com/apache/spark/blob/95aec091e4d8a45e648ce84d32d912f585eeb151/python/pyspark/java_gateway.py#L40

SparkConf cannot work if the JVM bridge does not exist.

I will try to fix the E2E test failure. I hope I can figure out why the JVMBridgeHelper does not work with SparkFixture in the unit tests.

I also agree that the helper should try to find the Spark home if the SPARK_HOME environment variable is not defined.
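One possible fallback, sketched under assumptions: use SPARK_HOME when it is set, otherwise look for the spark-submit script on PATH and treat the parent of its bin directory as the Spark home. SparkHomeSketch and FindSparkHome are hypothetical names, and this is not necessarily how pyspark or this PR resolves the location.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

// Illustrative only: a hypothetical helper, not part of the PR.
public static class SparkHomeSketch
{
    // Returns a Spark home directory, or null if none could be found.
    public static string FindSparkHome()
    {
        // 1. Prefer the explicit environment variable.
        string fromEnv = Environment.GetEnvironmentVariable("SPARK_HOME");
        if (!string.IsNullOrEmpty(fromEnv))
        {
            return fromEnv;
        }

        // 2. Otherwise search PATH for the spark-submit script.
        string script = RuntimeInformation.IsOSPlatform(OSPlatform.Windows)
            ? "spark-submit.cmd"
            : "spark-submit";
        string path = Environment.GetEnvironmentVariable("PATH") ?? string.Empty;

        foreach (string dir in path.Split(
            new[] { Path.PathSeparator }, StringSplitOptions.RemoveEmptyEntries))
        {
            if (File.Exists(Path.Combine(dir, script)))
            {
                // spark-submit normally lives in <spark home>/bin,
                // so the parent of that directory is the candidate home.
                return new DirectoryInfo(dir).Parent?.FullName;
            }
        }

        return null;
    }
}
```

This does not resolve symbolic links, so a spark-submit that is linked into a directory such as /usr/bin would yield the wrong directory. A real helper would need to handle that case.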

I hope dotnet spark can be listed at https://spark.apache.org/docs/latest/sql-getting-started.html#starting-point-sparksession, so that we have the same experience as with other Spark tools.

laneser avatar Jun 22 '20 22:06 laneser

Although the checks have passed, I have no idea why it was not working before... 😿

laneser avatar Jun 23 '20 01:06 laneser