Chapter 13 - Advanced RDD example of Custom partitioner may need correction

Open izayarniy opened this issue 6 years ago • 1 comments

I'm studying the Spark advanced RDD API and got a little confused by one example.

```scala
// in Scala
import org.apache.spark.Partitioner

class DomainPartitioner extends Partitioner {
  def numPartitions = 3
  def getPartition(key: Any): Int = {
    val customerId = key.asInstanceOf[Double].toInt
    if (customerId == 17850.0 || customerId == 12583.0) {
      return 0
    } else {
      return new java.util.Random().nextInt(2) + 1
    }
  }
}
```

As far as I can see in the code documentation, a partitioner must return the same partition ID for the same partition key. That is not true for the example above: every key other than the two special customer IDs gets a freshly drawn random partition on each call. Doesn't returning a random ID for a key break the `Partitioner` contract?

izayarniy avatar Jun 23 '19 23:06 izayarniy

Hi there,

`java.util.Random.nextInt(2)` returns either 0 or 1 (the bound 2 is exclusive), so `nextInt(2) + 1` yields 1 or 2. I assume the idea is that the code singles out the two customer IDs in the first `if` block and sends them to partition 0, while the rest of the customer ID data is mapped to partitions 1 and 2.
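If the goal is to keep that layout while also returning the same partition for the same key, one possible approach is to derive the partition from the key itself instead of from a random draw. The sketch below is only an illustration, not the book's code: the class name `DeterministicDomainPartitioner` is hypothetical, and splitting the remaining customers by parity is an assumption chosen for simplicity (any stable function of the key would do).

```scala
import org.apache.spark.Partitioner

// Hypothetical name; a deterministic variant of the book's DomainPartitioner.
class DeterministicDomainPartitioner extends Partitioner {
  override def numPartitions: Int = 3

  override def getPartition(key: Any): Int = {
    val customerId = key.asInstanceOf[Double].toInt
    if (customerId == 17850 || customerId == 12583) {
      // The two singled-out customers always land in partition 0.
      0
    } else {
      // Assumption: split the remaining customers by parity of the ID.
      // Math.floorMod stays non-negative even for negative IDs,
      // so the result is always 1 or 2.
      Math.floorMod(customerId, 2) + 1
    }
  }
}
```

With this version, repeated calls with the same customer ID always give the same partition, at the cost of the split between partitions 1 and 2 depending on how the IDs are distributed rather than being random.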

subhmita avatar Sep 26 '19 14:09 subhmita