Declare sparkSession and sparkContext as transient variables

Open. kiranchitturi opened this issue 5 years ago • 1 comment

The Spark magic %%spark --start does not declare the Spark context as a transient variable. As a result, any sort of RDD manipulation throws the error below:

Caused by: java.io.NotSerializableException: org.apache.spark.SparkContext
Serialization stack:
	- object not serializable (class: org.apache.spark.SparkContext, value: org.apache.spark.SparkContext@71836471)
	- field (class: $iw, name: sc, type: class org.apache.spark.SparkContext)

The fix is to declare the variables defined at https://github.com/twosigma/beakerx/blob/d6233eb3baa8a8528e415401985da9e2e2d4ccfe/kernel/sparkex/src/main/java/com/twosigma/beakerx/widget/SparkEngineBase.java#L160 as transient:

                    "@transient val %s = SparkVariable.getSparkSession()\n" +
                    "@transient val %s = %s.sparkContext\n" +
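For anyone wondering why the transient marker should help, here is a minimal sketch in plain Java with no Spark dependency. It is only an illustration of the mechanism, not the BeakerX code: FakeContext is a hypothetical stand-in for org.apache.spark.SparkContext, and PlainWrapper / TransientWrapper are hypothetical stand-ins for the REPL wrapper class ($iw in the stack trace). Scala's @transient annotation compiles to a transient field, which Java serialization skips.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class TransientDemo {

    // Hypothetical stand-in for SparkContext: deliberately NOT Serializable.
    static class FakeContext {}

    // Stand-in for a REPL wrapper that keeps the context in an ordinary field.
    static class PlainWrapper implements Serializable {
        private static final long serialVersionUID = 1L;
        final FakeContext sc = new FakeContext();
    }

    // Same wrapper, but the field is transient, so serialization skips it.
    static class TransientWrapper implements Serializable {
        private static final long serialVersionUID = 1L;
        final transient FakeContext sc = new FakeContext();
    }

    // Returns true if the object can be written with Java serialization,
    // false if a non-serializable field is encountered.
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("plain field:     " + canSerialize(new PlainWrapper()));     // false
        System.out.println("transient field: " + canSerialize(new TransientWrapper())); // true
    }
}
```

The plain wrapper fails with the same kind of NotSerializableException as in the stack trace above, while the transient one serializes. Note that a transient field is not restored on deserialization, so it comes back as null on the receiving side.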

Please let me know if there is any other way to work around this in the latest version.

kiranchitturi, Nov 01 '19 15:11

Hi @kiranchitturi, could you provide a simple example (ipynb) that will allow us to reproduce the problem?

jaroslawmalekcodete, Jan 02 '20 14:01