
Is it possible to use provided spark?

Open saint1991 opened this issue 6 years ago • 4 comments

Is there any way to use a provided Spark installation instead of downloading one in the notebook? In my case, I am installing Jupyter on Dataproc, where a Spark package is already provided.

It seems to be possible if SPARK_HOME can be configured.
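For instance, discovering the jars of a provided distribution from `SPARK_HOME` might look like the sketch below. The `SparkDist` object and its method names are illustrative, not part of almond; the only assumption is the standard distribution layout (`SPARK_HOME/jars`).

```scala
import java.io.File

// Sketch: locate the jars of a provided Spark distribution via SPARK_HOME.
// Assumes the standard layout SPARK_HOME/jars; names are illustrative.
object SparkDist {
  def distributionJars(sparkHome: String): Seq[File] = {
    val jarsDir = new File(sparkHome, "jars")
    if (jarsDir.isDirectory)
      jarsDir.listFiles.filter(_.getName.endsWith(".jar")).toSeq
    else
      Seq.empty
  }

  // Read SPARK_HOME from the environment, if set.
  def fromEnv(): Seq[File] =
    sys.env.get("SPARK_HOME").map(distributionJars).getOrElse(Seq.empty)
}
```

A kernel could then add these jars to the session's classpath instead of fetching Spark as a dependency.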

saint1991 avatar Jul 20 '18 07:07 saint1991

I'd say this is the most common deployment type (i.e. Spark being provided by the container) for businesses.

aishfenton avatar Oct 25 '18 15:10 aishfenton

@aishfenton I agree… Yet this poses a number of challenges.

When running Spark calculations from the kernel, the kernel acts as the driver. Its classpath is that of almond, plus any user-added dependencies. If one relies on a Spark distribution, the classpath of the executors corresponds to the jars in the Spark distribution (plus those passed via spark.jars, I think).

That means the classpath on the driver (almond) and the executors (spark distrib) don't necessarily match.

I ran into numerous issues even with (very) minor differences between the driver and executor classpaths (like two versions of the scala-library JAR landing on the executor classpath, something like 2.11.2 and 2.11.7 IIRC, making List deserialization fail).
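One way to reduce such mismatches is to hand the driver's own jars to the executors via spark.jars. A minimal sketch of collecting them follows; the object name is illustrative, and the commented-out SparkSession line is an assumption about how it would be wired into a session, not almond's actual mechanism.

```scala
import java.io.File

// Sketch: collect the jars on the driver's classpath so they can be passed
// to executors via spark.jars, keeping the two classpaths in sync.
object DriverClasspath {
  def jars(): Seq[String] =
    sys.props("java.class.path")
      .split(File.pathSeparator)
      .filter(_.endsWith(".jar"))
      .toSeq

  def main(args: Array[String]): Unit =
    // In a kernel, this list would feed something like:
    //   SparkSession.builder().config("spark.jars", jars().mkString(","))
    jars().foreach(println)
}
```

Of course this only helps when the driver's jars are the ones you want on the executors; it does not resolve conflicts with jars the distribution itself already ships.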

In the past, I circumvented that by using a vendored Spark version as a Maven dependency from almond (rather than via a Spark distribution), and only using the Spark configuration files from the distribution.

Yet @dynofu seems to have successfully used a spark distribution via ammonite-spark. I don't know how far he went though…

alexarchambault avatar Oct 25 '18 16:10 alexarchambault

You can take a look at my scripts built on top of ammonite-spark: https://github.com/dyno/ammonite_with_spark_on_emr. spark.jars is set by ammonite-spark to whatever is on the EMR cluster: https://github.com/dyno/ammonite_with_spark_on_emr/blob/master/emr.sc#L33.
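In outline, an ammonite-spark session looks like the following sketch. It runs inside an Ammonite (or almond) session rather than as a standalone program; `<version>` is a placeholder, and the `master` setting is an assumption about a YARN-backed cluster such as EMR.

```scala
// Ammonite script sketch, based on ammonite-spark's documented usage.
// Not standalone Scala: the $ivy import magic only works inside Ammonite.
import $ivy.`sh.almond::ammonite-spark:<version>`
import org.apache.spark.sql._

val spark = AmmoniteSparkSession.builder()
  .master("yarn")  // assumed: use the cluster-provided Spark via YARN
  .getOrCreate()
```

`AmmoniteSparkSession.builder()` takes care of pushing the session's classpath to the executors, which is what makes the provided-distribution setup workable.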

dynofu avatar Oct 25 '18 20:10 dynofu

If one were to get a Spark distribution working via ammonite-spark, what more would be needed to surface the same functionality within an almond kernel?

mpacer avatar Oct 31 '18 00:10 mpacer