Question about partitioners

Open leo-987 opened this issue 8 years ago • 0 comments

I came across the following passage in Learning Spark:

Finally, for binary operations, which partitioner is set on the output depends on the parent RDDs’ partitioners. By default, it is a hash partitioner, with the number of partitions set to the level of parallelism of the operation. However, if one of the parents has a partitioner set, it will be that partitioner; and if both parents have a partitioner set, it will be the partitioner of the first parent.

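So the child RDD's partitioner should be determined by the partitioners of its parent RDDs. But in chapter 2 of SparkInternals, the parent and child RDDs all have different partitioners. Why is that? If one of the two parent RDDs has a hash partitioner, shouldn't the child RDD have a hash partitioner as well?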
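
To make my expectation concrete, here is a minimal sketch of the rule exactly as the quoted passage states it. It is plain Python, not Spark code: `Partitioner`, `RDD`, and `output_partitioner` are hypothetical stand-ins I made up for illustration, and the `parallelism` argument stands in for "the level of parallelism of the operation".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Partitioner:
    """Hypothetical stand-in for a partitioner (not Spark's class)."""
    kind: str             # e.g. "hash" or "range"
    num_partitions: int


@dataclass(frozen=True)
class RDD:
    """Hypothetical stand-in for an RDD that may carry a partitioner."""
    name: str
    partitioner: Optional[Partitioner] = None


def output_partitioner(first: RDD, second: RDD, parallelism: int) -> Partitioner:
    """Apply the rule as worded in the quoted passage (an assumption,
    not a copy of Spark's implementation)."""
    # Both parents set, or only the first one: the first parent's partitioner.
    if first.partitioner is not None:
        return first.partitioner
    # Only the second parent has one: use that partitioner.
    if second.partitioner is not None:
        return second.partitioner
    # Neither parent has one: default to a hash partitioner.
    return Partitioner("hash", parallelism)


hash4 = Partitioner("hash", 4)
range3 = Partitioner("range", 3)

neither = output_partitioner(RDD("a"), RDD("b"), parallelism=8)
one = output_partitioner(RDD("a"), RDD("b", hash4), parallelism=8)
both = output_partitioner(RDD("a", range3), RDD("b", hash4), parallelism=8)
```

Under this reading, `one` comes out as the hash partitioner of the second parent, which is why I expected the child RDD in the chapter 2 figures to be hash-partitioned too. The sketch only models the quoted sentence; the behaviour of a particular Spark version is decided by its own source and may differ.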

leo-987, Jun 23 '16 01:06