raydp.init_spark: support a sparkSession parameter
The raydp.init_spark method returns a SparkSession. Could we initialize the SparkSession first and then pass it to init_spark?
Hi @melin -- we need to use init_spark to make sure Spark is using Ray as the scheduler. What is your use case?
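For reference, here is a minimal sketch of the current flow, in which init_spark creates the session itself instead of receiving one. The settings values and the helper names start_spark_on_ray and stop_spark_on_ray are illustrative assumptions, not part of RayDP; only ray.init, raydp.init_spark, raydp.stop_spark and ray.shutdown are library calls.

```python
# Illustrative settings for raydp.init_spark; the values are assumptions.
SPARK_ON_RAY_SETTINGS = {
    "app_name": "raydp-example",
    "num_executors": 2,
    "executor_cores": 1,
    "executor_memory": "1GB",
}


def start_spark_on_ray(settings=SPARK_ON_RAY_SETTINGS):
    """Hypothetical helper: connect to Ray and let RayDP create the SparkSession."""
    # Imported lazily so this module can be loaded without ray/raydp installed.
    import ray
    import raydp

    # Start or attach to a Ray runtime before asking RayDP for a session.
    ray.init(ignore_reinit_error=True)
    # init_spark builds the session so that executors are launched through Ray.
    return raydp.init_spark(**settings)


def stop_spark_on_ray():
    """Hypothetical helper: release the Spark executors, then shut down Ray."""
    import ray
    import raydp

    raydp.stop_spark()
    ray.shutdown()
```

Calling start_spark_on_ray() returns the SparkSession; there is no argument here for passing in a session that already exists.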
In the spark-jobserver project, the Java code first initializes the SparkSession and then executes the Python script. The Python script uses py4j to obtain the SparkSession that the Java code created.
I would like to integrate RayDP into this project.
java: https://github.com/melin/spark-jobserver/blob/fa9670ae066af85369f77abf261d1dd4a02bf5c4/jobserver-driver/src/main/java/io/github/melin/spark/jobserver/driver/task/SparkPythonTask.java#L53
python: https://github.com/melin/spark-jobserver/blob/fa9670ae066af85369f77abf261d1dd4a02bf5c4/jobserver-driver/src/main/resources/pythonJobTemplate.py#L71
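A rough sketch of that py4j pattern follows; it is an illustration and not the code from the linked template. The function name session_from_gateway and the entry-point method getSparkSession() are assumptions; the port and auth token would come from whatever the Java side exposes. A session obtained this way uses the master configured on the Java side, so it does not go through init_spark.

```python
def session_from_gateway(port, auth_token):
    """Hypothetical helper: wrap a JVM-created SparkSession for use from Python."""
    # Imported lazily so this module can be loaded without py4j/pyspark installed.
    from py4j.java_gateway import GatewayParameters, JavaGateway, java_import
    from pyspark import SparkConf, SparkContext
    from pyspark.sql import SparkSession

    # Connect to the py4j gateway server that the Java driver started.
    gateway = JavaGateway(
        gateway_parameters=GatewayParameters(
            port=port, auth_token=auth_token, auto_convert=True
        )
    )

    # PySpark refers to some JVM classes by short name, so import them into
    # the gateway's JVM view, as PySpark's own gateway launcher does.
    java_import(gateway.jvm, "org.apache.spark.SparkConf")
    java_import(gateway.jvm, "org.apache.spark.api.java.*")
    java_import(gateway.jvm, "org.apache.spark.api.python.*")
    java_import(gateway.jvm, "org.apache.spark.sql.*")
    java_import(gateway.jvm, "scala.Tuple2")

    # ASSUMPTION: the Java entry point exposes the session under this name.
    jsession = gateway.entry_point.getSparkSession()

    # Wrap the JVM objects in their Python counterparts.
    jsc = gateway.jvm.org.apache.spark.api.java.JavaSparkContext(
        jsession.sparkContext()
    )
    conf = SparkConf(_jvm=gateway.jvm, _jconf=jsc.getConf())
    sc = SparkContext(gateway=gateway, jsc=jsc, conf=conf)
    return SparkSession(sc, jsession)
```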
Can Ray support this use case?