
Update Spark Version 2.2

Open DataWanderer opened this issue 6 years ago • 6 comments

DataWanderer avatar Mar 30 '18 18:03 DataWanderer

What’s the issue you’re seeing trying to use this with Spark 2.2? The Spark dependency is “provided”, meaning you should be able to use this jar with Spark 2.2.


traviscrawford avatar Mar 30 '18 23:03 traviscrawford

com.google.common.util.concurrent.RateLimiter.acquire(I)D
	at com.github.traviscrawford.spark.dynamodb.DynamoDBRelation$$anonfun$scan$1$$anonfun$apply$2$$anonfun$apply$1.apply$mcDI$sp(DynamoDBRelation.scala:137)
	at com.github.traviscrawford.spark.dynamodb.DynamoDBRelation$$anonfun$scan$1$$anonfun$apply$2$$anonfun$apply$1.apply(DynamoDBRelation.scala:136)
	at com.github.traviscrawford.spark.dynamodb.DynamoDBRelation$$anonfun$scan$1$$anonfun$apply$2$$anonfun$apply$1.apply(DynamoDBRelation.scala:136)
	at scala.Option.foreach(Option.scala:257)
	at com.github.traviscrawford.spark.dynamodb.DynamoDBRelation$$anonfun$scan$1$$anonfun$apply$2.apply(DynamoDBRelation.scala:136)
	at com.github.traviscrawford.spark.dynamodb.DynamoDBRelation$$anonfun$scan$1$$anonfun$apply$2.apply(DynamoDBRelation.scala:130)
	at scala.Option.foreach(Option.scala:257)
	at com.github.traviscrawford.spark.dynamodb.DynamoDBRelation$$anonfun$scan$1.apply(DynamoDBRelation.scala:130)
	at com.github.traviscrawford.spark.dynamodb.DynamoDBRelation$$anonfun$scan$1.apply(DynamoDBRelation.scala:114)
	at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
	at


DataWanderer avatar Mar 31 '18 03:03 DataWanderer

Looking at https://github.com/traviscrawford/spark-dynamodb/blob/master/pom.xml we see there's no explicit Guava dependency, so whatever version Spark brings is what's used.

Does your application override the Guava version, either explicitly or transitively from some other dependency? What version of Guava is on your application's classpath, and what's pulling it in?
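
If it helps, one quick way to check is to ask, from a spark-shell on the cluster, which jar Guava's RateLimiter class is actually loaded from (just a reflection sketch, nothing specific to this library; the jar name usually reveals the Guava version):

import com.google.common.util.concurrent.RateLimiter

// Location of the jar the running JVM loaded RateLimiter from,
// e.g. .../jars/guava-14.0.1.jar; a null CodeSource means it came from the boot classpath.
val source = classOf[RateLimiter].getProtectionDomain.getCodeSource
println(if (source != null) source.getLocation else "boot classpath")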

traviscrawford avatar Mar 31 '18 20:03 traviscrawford

The application does not override the Guava version, but somehow EMR Spark versions 2.2.1 and 2.2.0 give this issue.

DataWanderer avatar Apr 03 '18 23:04 DataWanderer

It looks like it's not limited to EMR. The same issue appears on Databricks.

Here is my code:

import com.github.traviscrawford.spark.dynamodb.DynamoScanner

// Scan "table" with a rate limit set (the Option(50) argument), which exercises Guava's RateLimiter.
val rdd = DynamoScanner(sc, "table", 8, 1000, None, Option(50), Option("us-east-1"))
rdd.collect()

And the output:

Caused by: java.lang.NoSuchMethodError: com.google.common.util.concurrent.RateLimiter.acquire(I)D
	at com.github.traviscrawford.spark.dynamodb.DynamoScanner$$anonfun$com$github$traviscrawford$spark$dynamodb$DynamoScanner$$scan$1$$anonfun$apply$3$$anonfun$apply$1.apply$mcDI$sp(DynamoScanner.scala:67)
	at com.github.traviscrawford.spark.dynamodb.DynamoScanner$$anonfun$com$github$traviscrawford$spark$dynamodb$DynamoScanner$$scan$1$$anonfun$apply$3$$anonfun$apply$1.apply(DynamoScanner.scala:66)
	at com.github.traviscrawford.spark.dynamodb.DynamoScanner$$anonfun$com$github$traviscrawford$spark$dynamodb$DynamoScanner$$scan$1$$anonfun$apply$3$$anonfun$apply$1.apply(DynamoScanner.scala:66)
	at scala.Option.foreach(Option.scala:257)
	at com.github.traviscrawford.spark.dynamodb.DynamoScanner$$anonfun$com$github$traviscrawford$spark$dynamodb$DynamoScanner$$scan$1$$anonfun$apply$3.apply(DynamoScanner.scala:66)
	at com.github.traviscrawford.spark.dynamodb.DynamoScanner$$anonfun$com$github$traviscrawford$spark$dynamodb$DynamoScanner$$scan$1$$anonfun$apply$3.apply(DynamoScanner.scala:60)
	at scala.Option.foreach(Option.scala:257)
	at com.github.traviscrawford.spark.dynamodb.DynamoScanner$$anonfun$com$github$traviscrawford$spark$dynamodb$DynamoScanner$$scan$1.apply(DynamoScanner.scala:60)
	at com.github.traviscrawford.spark.dynamodb.DynamoScanner$$anonfun$com$github$traviscrawford$spark$dynamodb$DynamoScanner$$scan$1.apply(DynamoScanner.scala:58)
	at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:438)
	at scala.collection.Iterator$class.foreach(Iterator.scala:893)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
	at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
	at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
	at scala.collection.AbstractIterator.to(Iterator.scala:1336)
	at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
	at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1336)
	at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
	at scala.collection.AbstractIterator.toArray(Iterator.scala:1336)
	at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:947)
	at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:947)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2159)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2159)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	at org.apache.spark.scheduler.Task.run(Task.scala:111)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:354)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

I'm running it on a Databricks cluster using their 3.5 LTS runtime (Apache Spark 2.2.1, Scala 2.11). The only dependency attached to the cluster is spark-dynamodb-0.0.13.

Can I ask what kind of cluster you run these Spark jobs on, @traviscrawford? EMR, Databricks, or a self-maintained cluster?

EDIT: It's worth noting that this only happens when you pass a rate limit that isn't None.
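
(For anyone hitting this before a proper fix: keeping the same argument positions as the snippet above but passing None for the rate limit avoids the RateLimiter code path entirely, at the cost of an unthrottled scan.)

import com.github.traviscrawford.spark.dynamodb.DynamoScanner

// Same scan as above, but with None instead of Option(50) for the rate limit,
// so the Guava RateLimiter (and its failing acquire call) is never used.
val rdd = DynamoScanner(sc, "table", 8, 1000, None, None, Option("us-east-1"))
rdd.collect()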

klamouri avatar Oct 16 '18 20:10 klamouri

Here is how I fixed it.

I added spark-dynamodb-0.0.13 to my application along with guava:16, then shaded (relocated) Guava 16 so it doesn't clash with the Databricks-provided version, roughly as sketched below. The issue occurs because RateLimiter.acquire used to return void before Guava 16 and returns a double from Guava 16 onward.
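
Roughly what that looks like with sbt-assembly shade rules (just a sketch; the artifact coordinates and relocation prefix are placeholders, and the Maven Shade or Gradle Shadow plugins can do the same relocation):

// build.sbt (sketch; coordinates are illustrative placeholders)
libraryDependencies ++= Seq(
  "com.github.traviscrawford" %% "spark-dynamodb" % "0.0.13",
  "com.google.guava" % "guava" % "16.0.1"
)

// Relocate Guava inside the assembly jar so it cannot clash with the
// Guava version that Spark/Databricks already provides on the cluster classpath.
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("com.google.common.**" -> "shaded.guava.@1").inAll
)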

I think there is something funky with the Scala compilation. The stack trace shows the library code calling an acquire method that takes an Int and returns a Double, which is not possible on a stock version of Spark (Guava is stuck at 15 because of Hadoop, if I'm not mistaken). Hence, shading Guava 16 solves the issue.
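
You can confirm which acquire signature is actually on the classpath with plain reflection (a sketch; run it in a spark-shell or notebook on the cluster):

import com.google.common.util.concurrent.RateLimiter

// The error's descriptor acquire(I)D means "takes an int, returns a double".
// Guava >= 16 prints "double" here; older Guava prints "void", which is why
// code compiled against the newer signature fails with NoSuchMethodError.
val returnType = classOf[RateLimiter].getMethod("acquire", classOf[Int]).getReturnType
println(returnType)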

klamouri avatar Oct 17 '18 00:10 klamouri