
Make it possible to hide SPARK_HOME

MarkusRothSBB opened this issue 5 years ago · 0 comments

I am building a Docker image for a Jupyter notebook that should support Spark shells with both PySpark and Almond. PySpark should use the local Spark installation (found with the SPARK_HOME environment variable), while Almond should use its own Spark version downloaded from Coursier.

Ammonite, used by Almond, checks if the SPARK_HOME environment variable is set. If it is, it takes the jars from the local Spark installation. If it is not set, it takes the Spark jars from Coursier.

With Almond's --env install argument, I can set SPARK_HOME to an empty string, or to an existing path containing no files. However, Ammonite then still assumes that Spark is locally available, just without any jars. It therefore fails to load any Spark jars, which makes the executors fail in YARN deploy mode. Unsetting the SPARK_HOME environment variable entirely is not possible with --env.
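For reference, the env entry that the installer writes into kernel.json would then look something like this (the argv and display_name values are illustrative, not copied from an actual install):

```json
{
  "display_name": "Scala (Almond)",
  "language": "scala",
  "argv": ["java", "-jar", "launcher.jar", "--connection-file", "{connection_file}"],
  "env": { "SPARK_HOME": "" }
}
```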

I would ask that the check at AmmoniteSparkSessionBuilder.scala line 232 test whether sys.env.get("SPARK_HOME") is None or empty, rather than only whether it is None. If it is empty, Ammonite should proceed as if the variable were not set at all. This would make it possible to hide a local Spark installation by setting SPARK_HOME to the empty string with the Almond installer (which writes an env entry into kernel.json).
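A minimal sketch of the requested behavior (the object and method names here are mine for illustration, not the actual AmmoniteSparkSessionBuilder code): an empty or whitespace-only SPARK_HOME is treated the same as an unset one.

```scala
// Sketch: resolve SPARK_HOME so that "" behaves like "not set".
object SparkHomeCheck {
  // Returns Some(path) only when SPARK_HOME is set to a non-empty value.
  def sparkHomeOpt(env: Map[String, String]): Option[String] =
    env.get("SPARK_HOME").filter(_.trim.nonEmpty)

  def main(args: Array[String]): Unit = {
    // Unset: fall back to downloading Spark jars (e.g. via Coursier).
    assert(sparkHomeOpt(Map.empty).isEmpty)
    // Empty string: should also fall back, which is the requested change.
    assert(sparkHomeOpt(Map("SPARK_HOME" -> "")).isEmpty)
    // Normal case: use the local installation.
    assert(sparkHomeOpt(Map("SPARK_HOME" -> "/opt/spark")).contains("/opt/spark"))
    println("ok")
  }
}
```

With this check in place, installing the kernel with SPARK_HOME set to the empty string would make Ammonite fetch the Spark jars from Coursier instead of looking for a local installation.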

Let me take this chance to thank you for your work!

MarkusRothSBB · Oct 05 '20 07:10