
raydp.init_spark: support a sparkSession parameter

Open melin opened this issue 5 months ago • 3 comments

The raydp.init_spark method returns a SparkSession. Could we initialize the SparkSession first and then pass it to init_spark?

melin avatar Jul 17 '25 06:07 melin

Hi @melin -- we need to use init_spark to make sure Spark is using Ray as the scheduler. What is your use case?
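
For reference, a minimal sketch of the flow described here, where RayDP creates the SparkSession itself instead of receiving one. The resource values and the helper names `start_spark_on_ray` and `stop_spark_on_ray` are illustrative assumptions, not taken from this thread; it also assumes `ray` and `raydp` are installed.

```python
# Illustrative resource settings (assumptions); tune them for your cluster.
SPARK_ARGS = {
    "app_name": "raydp-example",
    "num_executors": 2,
    "executor_cores": 2,
    "executor_memory": "2GB",
}


def start_spark_on_ray(**overrides):
    """Start Ray, then let RayDP create the SparkSession on top of it."""
    # Imported lazily so this module loads even where ray/raydp are absent.
    import ray
    import raydp

    # Connect to (or start) a Ray cluster before asking RayDP for Spark.
    ray.init(ignore_reinit_error=True)

    # init_spark builds and returns the session; keyword overrides win.
    return raydp.init_spark(**{**SPARK_ARGS, **overrides})


def stop_spark_on_ray():
    """Tear down the Spark session created by RayDP, then Ray itself."""
    import ray
    import raydp

    raydp.stop_spark()
    ray.shutdown()
```

Calling `spark = start_spark_on_ray(num_executors=4)` would return a regular SparkSession object; the point of the sketch is only that the session comes out of `init_spark` rather than going into it.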

pang-wu avatar Jul 20 '25 20:07 pang-wu

In the spark-jobserver project, the Java code initializes the SparkSession first and then executes the Python script. The Python script uses py4j to obtain the SparkSession that the Java code created.

I hope to integrate raydp into this project.

java: https://github.com/melin/spark-jobserver/blob/fa9670ae066af85369f77abf261d1dd4a02bf5c4/jobserver-driver/src/main/java/io/github/melin/spark/jobserver/driver/task/SparkPythonTask.java#L53

python: https://github.com/melin/spark-jobserver/blob/fa9670ae066af85369f77abf261d1dd4a02bf5c4/jobserver-driver/src/main/resources/pythonJobTemplate.py#L71
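
The pattern described above can be sketched roughly as follows. This is not the code from the linked files: the entry-point accessors `getJavaSparkContext()` and `getSparkSession()` and the helper name `session_from_gateway` are hypothetical, and the sketch assumes the Java side exposes a py4j gateway on a known port with an auth token.

```python
def session_from_gateway(port, auth_token, address="127.0.0.1"):
    """Wrap a JVM-created SparkSession in a PySpark SparkSession via py4j."""
    # Imported lazily so this module loads even where py4j/pyspark are absent.
    from py4j.java_gateway import GatewayParameters, JavaGateway, java_import
    from pyspark import SparkConf, SparkContext
    from pyspark.sql import SparkSession

    # Connect to the gateway server that the Java driver already started.
    gateway = JavaGateway(
        gateway_parameters=GatewayParameters(
            address=address, port=port, auth_token=auth_token, auto_convert=True
        )
    )

    # PySpark expects these JVM classes to be importable from the gateway view
    # (a subset of what PySpark's own launcher imports).
    for name in (
        "org.apache.spark.SparkConf",
        "org.apache.spark.api.java.*",
        "org.apache.spark.api.python.*",
        "org.apache.spark.sql.*",
        "scala.Tuple2",
    ):
        java_import(gateway.jvm, name)

    # Hypothetical accessors on the Java entry point (assumed, not verified).
    entry = gateway.entry_point
    jsc = entry.getJavaSparkContext()
    jsession = entry.getSparkSession()

    # Reuse the JVM objects instead of launching a new Spark application.
    conf = SparkConf(_jvm=gateway.jvm, _jconf=jsc.getConf())
    sc = SparkContext(jsc=jsc, gateway=gateway, conf=conf)
    return SparkSession(sc, jsession)
```

A session obtained this way already exists before any RayDP call, which is why the request is for init_spark to accept it as a parameter.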

melin avatar Jul 22 '25 07:07 melin

> Hi @melin -- we need to use init_spark to make sure Spark is using Ray as the scheduler. What is your use case?

Can Ray support this?

melin avatar Oct 23 '25 07:10 melin