Spark-The-Definitive-Guide
Chapter 13 - Advanced RDDs: custom Partitioner example may need correction
I'm studying the Spark advanced RDD API and got a little confused by one example:

```scala
// in Scala
import org.apache.spark.Partitioner

class DomainPartitioner extends Partitioner {
  def numPartitions = 3
  def getPartition(key: Any): Int = {
    val customerId = key.asInstanceOf[Double].toInt
    if (customerId == 17850.0 || customerId == 12583.0) {
      return 0
    } else {
      return new java.util.Random().nextInt(2) + 1
    }
  }
}
```

As far as I can see in the code documentation, a partitioner must return the same partition id given the same partition key. That is not true for the example above. Doesn't returning a "random" id for a key break the `Partitioner` interface?
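To make the concern concrete, here is a rough sketch of what I would expect a contract-respecting version to look like, where the partition is derived from the key itself instead of a random draw. This is my own illustration (the class name and the modulo scheme are not from the book):

```scala
import org.apache.spark.Partitioner

// Hypothetical deterministic variant: the same key always maps to the same partition.
class DeterministicDomainPartitioner extends Partitioner {
  def numPartitions = 3
  def getPartition(key: Any): Int = {
    val customerId = key.asInstanceOf[Double].toInt
    if (customerId == 17850 || customerId == 12583) 0
    // Spread all other customer IDs over partitions 1 and 2, deterministically.
    else 1 + math.abs(customerId.hashCode) % (numPartitions - 1)
  }
}
```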
Hi there,

`java.util.Random.nextInt(2)` returns a number between 0 and 1 (2 is excluded), so adding 1 yields 1 or 2. I assume the idea is that the code singles out the two customer IDs in the first `if` block (sending them to partition 0), while the rest of the customer IDs get scattered across partitions 1 and 2.
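A quick check of that claim (just a standalone snippet, not from the book), showing that `nextInt(2) + 1` only ever produces 1 or 2:

```scala
// nextInt(2) yields 0 or 1, so adding 1 maps every non-special key to partition 1 or 2.
val rnd = new java.util.Random()
val samples = (1 to 10).map(_ => rnd.nextInt(2) + 1)
println(samples)                                 // e.g. Vector(2, 1, 1, 2, ...)
println(samples.forall(p => p == 1 || p == 2))   // true
```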