spark-dynamodb
Update Spark Version 2.2
What’s the issue you’re seeing when trying to use this with Spark 2.2? The Spark dependency is “provided”, meaning you should be able to use this jar with Spark 2.2.
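As a side note, here is a minimal build.sbt sketch of what “provided” implies on the application side (the coordinates and versions are illustrative, not taken from the project's pom): Spark is supplied by the cluster at runtime, so only spark-dynamodb itself gets packaged with the job.

```scala
// Illustrative build.sbt fragment: Spark is "provided" by the cluster,
// while spark-dynamodb ships inside the application jar.
libraryDependencies ++= Seq(
  "org.apache.spark"          %% "spark-core"     % "2.2.1" % "provided",
  "com.github.traviscrawford" %% "spark-dynamodb" % "0.0.13"
)
```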
com.google.common.util.concurrent.RateLimiter.acquire(I)D
at com.github.traviscrawford.spark.dynamodb.DynamoDBRelation$$anonfun$scan$1$$anonfun$apply$2$$anonfun$apply$1.apply$mcDI$sp(DynamoDBRelation.scala:137)
at com.github.traviscrawford.spark.dynamodb.DynamoDBRelation$$anonfun$scan$1$$anonfun$apply$2$$anonfun$apply$1.apply(DynamoDBRelation.scala:136)
at com.github.traviscrawford.spark.dynamodb.DynamoDBRelation$$anonfun$scan$1$$anonfun$apply$2$$anonfun$apply$1.apply(DynamoDBRelation.scala:136)
at scala.Option.foreach(Option.scala:257)
at com.github.traviscrawford.spark.dynamodb.DynamoDBRelation$$anonfun$scan$1$$anonfun$apply$2.apply(DynamoDBRelation.scala:136)
at com.github.traviscrawford.spark.dynamodb.DynamoDBRelation$$anonfun$scan$1$$anonfun$apply$2.apply(DynamoDBRelation.scala:130)
at scala.Option.foreach(Option.scala:257)
at com.github.traviscrawford.spark.dynamodb.DynamoDBRelation$$anonfun$scan$1.apply(DynamoDBRelation.scala:130)
at com.github.traviscrawford.spark.dynamodb.DynamoDBRelation$$anonfun$scan$1.apply(DynamoDBRelation.scala:114)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
Looking at https://github.com/traviscrawford/spark-dynamodb/blob/master/pom.xml we see there's no explicit Guava dependency, so whatever version Spark brings is what's used.
Does your application override the Guava version, either explicitly or transitively from some other dependency? What version of Guava is on your application's classpath, and what's pulling it in?
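One quick way to check, as a sketch you could paste into spark-shell on the cluster (plain reflection, not part of the library): print which jar the loaded Guava RateLimiter class comes from.

```scala
// Shows which jar provides the Guava RateLimiter class on this JVM's classpath,
// which in turn tells you the effective Guava version.
import com.google.common.util.concurrent.RateLimiter
val source = classOf[RateLimiter].getProtectionDomain.getCodeSource
println(if (source != null) source.getLocation else "no code source (bootstrap classloader)")
```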
The application does not override the Guava version. But somehow EMR Spark versions 2.2.1 and 2.2.0 give this issue.
It looks like it's not limited to EMR. The same issue appears on Databricks.
Here is my code:
import com.github.traviscrawford.spark.dynamodb.DynamoScanner
val rdd = DynamoScanner(sc, "table", 8, 1000, None, Option(50), Option("us-east-1"))
rdd.collect()
And the output:
Caused by: java.lang.NoSuchMethodError: com.google.common.util.concurrent.RateLimiter.acquire(I)D
at com.github.traviscrawford.spark.dynamodb.DynamoScanner$$anonfun$com$github$traviscrawford$spark$dynamodb$DynamoScanner$$scan$1$$anonfun$apply$3$$anonfun$apply$1.apply$mcDI$sp(DynamoScanner.scala:67)
at com.github.traviscrawford.spark.dynamodb.DynamoScanner$$anonfun$com$github$traviscrawford$spark$dynamodb$DynamoScanner$$scan$1$$anonfun$apply$3$$anonfun$apply$1.apply(DynamoScanner.scala:66)
at com.github.traviscrawford.spark.dynamodb.DynamoScanner$$anonfun$com$github$traviscrawford$spark$dynamodb$DynamoScanner$$scan$1$$anonfun$apply$3$$anonfun$apply$1.apply(DynamoScanner.scala:66)
at scala.Option.foreach(Option.scala:257)
at com.github.traviscrawford.spark.dynamodb.DynamoScanner$$anonfun$com$github$traviscrawford$spark$dynamodb$DynamoScanner$$scan$1$$anonfun$apply$3.apply(DynamoScanner.scala:66)
at com.github.traviscrawford.spark.dynamodb.DynamoScanner$$anonfun$com$github$traviscrawford$spark$dynamodb$DynamoScanner$$scan$1$$anonfun$apply$3.apply(DynamoScanner.scala:60)
at scala.Option.foreach(Option.scala:257)
at com.github.traviscrawford.spark.dynamodb.DynamoScanner$$anonfun$com$github$traviscrawford$spark$dynamodb$DynamoScanner$$scan$1.apply(DynamoScanner.scala:60)
at com.github.traviscrawford.spark.dynamodb.DynamoScanner$$anonfun$com$github$traviscrawford$spark$dynamodb$DynamoScanner$$scan$1.apply(DynamoScanner.scala:58)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:438)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
at scala.collection.AbstractIterator.to(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1336)
at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:947)
at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:947)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2159)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2159)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:111)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:354)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
I'm running it on a Databricks cluster using their 3.5 LTS runtime (includes Apache Spark 2.2.1, Scala 2.11).
The only dependency attached to the cluster is spark-dynamodb-0.0.13.
Can I ask what kind of cluster you are running those Spark jobs on, @traviscrawford? EMR, Databricks, or a self-maintained cluster?
EDIT: It's worth noting that this only occurs when you pass a rate limit other than None.
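Based on that observation, a possible workaround (an untested sketch that just drops the rate limit from the call shown above):

```scala
// Same scan as above, but with no rate limit (None), so RateLimiter.acquire is never called.
// Caveat: without a rate limit the scan may consume all of the table's provisioned read capacity.
import com.github.traviscrawford.spark.dynamodb.DynamoScanner
val rdd = DynamoScanner(sc, "table", 8, 1000, None, None, Option("us-east-1"))
```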
Here is how I fixed it. I added spark-dynamodb-0.0.13 to my application along with guava:16, and then shaded (relocated) guava:16 so it doesn't conflict with the version Databricks provides. This issue occurs because the acquire method used to return void before Guava 16, and it returns double since Guava 16.
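For reference, here is roughly what that shading setup could look like with sbt-assembly (a sketch assuming an sbt build; the rename prefix and versions are illustrative):

```scala
// build.sbt (sketch): bundle Guava 16 and relocate its packages so the cluster-provided
// Guava never shadows the version spark-dynamodb was compiled against.
libraryDependencies ++= Seq(
  "com.github.traviscrawford" %% "spark-dynamodb" % "0.0.13",
  "com.google.guava"           % "guava"          % "16.0.1"
)

assemblyShadeRules in assembly := Seq(
  // .inAll rewrites references in every jar in the assembly, including spark-dynamodb's bytecode.
  ShadeRule.rename("com.google.common.**" -> "my_shaded.com.google.common.@1").inAll
)
```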
I think there is something funky with the Scala compilation. Indeed, this stack trace means the library code is trying to call an acquire method that takes an Int as an argument and returns a Double, which is not possible if you're using a stock version of Spark (Guava is stuck at 15 because of Hadoop, if I'm not mistaken). Hence, shading in Guava 16 solves the issue.
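If you want to see which acquire signature is actually on the classpath at runtime, a small reflection check (again just a sketch, not part of the library) makes the mismatch visible:

```scala
// Prints "void" on Guava 15 and earlier, "double" on Guava 16 and later.
// spark-dynamodb was compiled against the (I)D variant, hence the NoSuchMethodError on older Guava.
import com.google.common.util.concurrent.RateLimiter
val acquire = classOf[RateLimiter].getMethod("acquire", classOf[Int])
println(acquire.getReturnType)
```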