Ammonite
spark hive support
Adapting the spark2 example found in the repository to include spark-hive, I unfortunately run into the following error:
import $ivy.{
  `org.apache.spark::spark-core:2.3.0`,
  `org.apache.spark::spark-sql:2.3.0`,
  `org.apache.spark::spark-hive:2.3.0`
}
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder.master("local[*]").appName("test").enableHiveSupport.getOrCreate
java.lang.NumberFormatException: For input string: "${pom"
  java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
  java.lang.Integer.parseInt(Integer.java:569)
  java.lang.Integer.parseInt(Integer.java:615)
  org.apache.hadoop.hive.shims.ShimLoader.getMajorVersion(ShimLoader.java:168)
  org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:139)
  org.apache.hadoop.hive.shims.ShimLoader.getHadoopShims(ShimLoader.java:100)
  org.apache.hadoop.hive.conf.HiveConf$ConfVars.<clinit>(HiveConf.java:368)
  org.apache.hadoop.hive.conf.HiveConf.<clinit>(HiveConf.java:105)
  java.lang.Class.forName0(Native Method)
  java.lang.Class.forName(Class.java:348)
  org.apache.spark.util.Utils$.classForName(Utils.scala:235)
  org.apache.spark.sql.SparkSession$.hiveClassesArePresent(SparkSession.scala:1074)
  org.apache.spark.sql.SparkSession$Builder.enableHiveSupport(SparkSession.scala:862)
  ammonite.$sess.cmd3$.<init>(cmd3.sc:1)
  ammonite.$sess.cmd3$.<clinit>(cmd3.sc)
Note that nothing was executed; the error appears when instantiating Spark with Hive support. (The same error appears for Spark version 2.1.0.)
Versions: Ammonite REPL 1.0.5 (Scala 2.11.12, Java 1.8.0_161), CentOS 7
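(The "${pom" in the exception above is the Hadoop version string that ShimLoader fails to parse. A quick sketch of how to check what version string the session actually sees; the "healthy" output mentioned in the comment is an assumption about a working setup:)
import org.apache.hadoop.util.VersionInfo
// On a healthy classpath this prints something like "2.6.5"; when a broken
// common-version-info.properties is picked up it prints "${pom.version}",
// which ShimLoader then fails to parse as a number.
VersionInfo.getVersion()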
Just to chime in, I've also had no luck getting Hive to work with Ammonite.
spark-hive now works fine with ammonite-spark for me.
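(For anyone wanting to try the same route, a minimal sketch of an ammonite-spark session with Hive support; the ammonite-spark coordinates and all versions below are assumptions, so check the ammonite-spark README for the current ones:)
import $ivy.`sh.almond::ammonite-spark:0.1.3`   // assumed coordinates/version
import $ivy.`org.apache.spark::spark-sql:2.3.0`
import $ivy.`org.apache.spark::spark-hive:2.3.0`
import org.apache.spark.sql._

val spark = AmmoniteSparkSession.builder
  .master("local[*]")
  .enableHiveSupport()
  .getOrCreate()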
@alexarchambault I am using ammonite-spark, but I still hit the same issue as @schlichtanders. This is on an AWS EMR cluster; spark-shell works fine.
{
  import org.apache.spark.sql._
  val spark = AmmoniteSparkSession.builder
    .progressBars()
    .master("yarn")
    .config("spark.logConf", "true")
    .config("spark.executor.instances", "4")
    .config("spark.executor.memory", "2g")
    .enableHiveSupport()
    .getOrCreate()
}
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
java.lang.NumberFormatException: For input string: "${pom"
  java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
  java.lang.Integer.parseInt(Integer.java:569)
  java.lang.Integer.parseInt(Integer.java:615)
  org.apache.hadoop.hive.shims.ShimLoader.getMajorVersion(ShimLoader.java:168)
  org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:139)
  org.apache.hadoop.hive.shims.ShimLoader.getHadoopShims(ShimLoader.java:100)
  org.apache.hadoop.hive.conf.HiveConf$ConfVars.<clinit>(HiveConf.java:368)
  org.apache.hadoop.hive.conf.HiveConf.<clinit>(HiveConf.java:105)
  java.lang.Class.forName0(Native Method)
  java.lang.Class.forName(Class.java:348)
  org.apache.spark.util.Utils$.classForName(Utils.scala:238)
  org.apache.spark.sql.SparkSession$.hiveClassesArePresent(SparkSession.scala:1078)
  org.apache.spark.sql.SparkSession$Builder.enableHiveSupport(SparkSession.scala:865)
  ammonite.$sess.cmd0$.<init>(cmd0.sc:8)
  ammonite.$sess.cmd0$.<clinit>(cmd0.sc)
It looks like a Hadoop problem.
hadoop-amm@ import org.apache.hadoop.util.VersionInfo
import org.apache.hadoop.util.VersionInfo
hadoop-amm@ var ver = VersionInfo.getVersion()
ver: String = "${pom.version}"
ls.rec! Path("/usr/lib/spark/jars") |? { _.segments.last.endsWith(".jar") } |! { interp.load.cp(_) }
import org.apache.hadoop.util.VersionInfo
val ver = VersionInfo.getVersion()
ver: String = "2.8.3-amzn-1"
but there are still other problems...
I was able to reproduce that on EMR… It seems to originate from Ammonite adding both main JARs and source JARs to the classpath. This results in two common-version-info.properties resources landing on the classpath, one each from the main and source JARs of org.apache.hadoop:hadoop-common:2.6.5. The one from the sources JAR seems to be a kind of template, hence the "${pom" value.
For whatever reason, I ran into that at $work too, but I must have changed something in the setup there since, so that the right common-version-info.properties now gets picked up by chance…
Ideally, Ammonite shouldn't blindly add source JARs to the classpath this way…
As a quick workaround though, I guess adding source JARs this way could be put behind a flag, so that it could be disabled if necessary.
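(One way to confirm that diagnosis from inside the session is to ask the classloader for every copy of that resource; just a sketch, but if the explanation above is right, entries from both hadoop-common-2.6.5.jar and hadoop-common-2.6.5-sources.jar should show up:)
import scala.collection.JavaConverters._
// List every common-version-info.properties visible to the session's classloader.
getClass.getClassLoader
  .getResources("common-version-info.properties")
  .asScala
  .toList
  .foreach(println)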
Is there any hope for this on EMR? What about import $ivy-ing everything? I started down this path and kept trading one exception for another, then thought maybe I was wasting my time and someone had tried all of this already. The last one was: java.lang.NoSuchMethodError: org.apache.hadoop.io.retry.RetryUtils.getDefaultRetryPolicy.
How can I help?
I actually made it work on our EMR cluster. Instead of a gist, I just published my setup and hope it can be of some help to anybody who is interested.
https://github.com/dyno/ammonite_with_spark_on_emr
@alexarchambault
In ammonite/runtime/tools/IvyThing.resolveArtifact, is there a reason you included .addClassifiers(Classifier.sources)?
Would removal of this line fix the problem?
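(For context, a rough stand-alone illustration with the public coursier API of what that classifier does; this is not the actual IvyThing code, just a sketch showing that requesting the sources classifier is what makes the -sources.jar files come back from the fetch:)
import coursier._
// Fetch hadoop-common while asking for the "sources" classifier; the resulting
// file list is where the -sources.jar artifacts come from.
val sourceJars = Fetch()
  .addDependencies(dep"org.apache.hadoop:hadoop-common:2.6.5")
  .addClassifiers(Classifier.sources)
  .run()
  .filter(_.getName.endsWith("-sources.jar"))
sourceJars.foreach(println)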
I've also been hit by this. My workaround is to replace the offending -sources.jar with an empty zip file:
touch empty
zip empty.zip empty
find $HOME/.cache/coursier/v1/https/repo1.maven.org/maven2/org/apache/hadoop/ -iname "*-sources.jar" -exec cp empty.zip "{}" \;
This needs to be re-run if new sources are downloaded to the cache, so it's not perfect, but it is very easy to do.
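(If you'd rather do that from inside the REPL, here is a sketch of the same idea with ammonite-ops and java.util.zip; the cache path is an assumption, so adjust it to wherever your coursier cache lives:)
import java.io.FileOutputStream
import java.util.zip.{ZipEntry, ZipOutputStream}
// Assumed cache location; adapt as needed.
val cache = Path(sys.props("user.home")) / ".cache" / "coursier" / "v1"
// Restrict to the hadoop tree, like the find command above.
val sourceJars = ls.rec! cache |? { p =>
  p.toString.contains("/org/apache/hadoop/") && p.segments.last.endsWith("-sources.jar")
}
sourceJars.foreach { jar =>
  // Write a valid zip with a single dummy entry, mirroring `touch empty; zip empty.zip empty`.
  val out = new ZipOutputStream(new FileOutputStream(jar.toString))
  out.putNextEntry(new ZipEntry("empty"))
  out.closeEntry()
  out.close()
}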
On second thought, removing that line might be a bad idea... who knows what sort of side effects it would have, and it doesn't appear there's a way to request a sources classifier in import $ivy lines anyway.
I'm turning my attention to amm/interp/src/main/scala/ammonite/interp/Interpreter.scala. It appears that all of the artifacts fetched by coursier are blindly added to the classpath there.
Somewhere in there, source JARs should be excluded from the classpath.
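(Something along these lines, perhaps; just a sketch of the idea, not an actual Interpreter.scala patch, and the helper name is made up:)
// Hypothetical helper: drop source JARs from whatever coursier returned before
// the files are added to the session classpath.
def classpathJars(fetched: Seq[java.io.File]): Seq[java.io.File] =
  fetched.filterNot(_.getName.endsWith("-sources.jar"))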
FYI, with the current nightlies (and in the upcoming releases), it's possible to stop import $ivy from bringing in sources, with code like this:
interp.resolutionHooks += { fetch =>
  import scala.collection.JavaConverters._
  fetch.withClassifiers(fetch.getClassifiers.asScala.filter(_ != "sources").asJava)
}
(needs to be run in a cell prior to the one doing import $ivy).
Thanks for this workaround :) And for everybody: don't forget to add this section before an added @ line, so that Ammonite's resolution behavior is changed for the imports that come after the @:
interp.resolutionHooks += { fetch =>
  // This is mandatory with drools >= 7.0.46, because the drools sources artifacts also bring in kie.conf (creating a resource conflict),
  // and because, by default, Ammonite also loads sources artifacts.
  import scala.jdk.CollectionConverters._
  fetch.withClassifiers(fetch.getClassifiers.asScala.filter(_ != "sources").asJava)
}
@
import $ivy.`fr.janalyse::drools-scripting:1.0.14-SNAPSHOT`, $ivy.`org.scalatest::scalatest:3.2.3`
@lihaoyi it would be great to have an alternative syntax to more easily exclude sources resolution.