SparkInternals

Narrow dependencies: the FullDependency (N : N) figure in Chapter 2, Section 2

Open • feitang0 opened this issue 9 years ago • 5 comments

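Narrow dependencies: "each partition of the parent RDD is used by at most one partition of the child RDD." In the FullDependency: N : N figure in Chapter 2, Section 2, one partition of the parent RDD is depended on by two partitions of the child RDD, so it can't be called a narrow dependency, can it? Why is FullDependency described as a NarrowDependency?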
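To make the quoted rule concrete, here is a toy Python model that represents a dependency as a map from each child partition index to the parent partition indices it reads. The names `children_per_parent` and `at_most_one_child` are hypothetical, not Spark's API. The N:N shape is an assumption based on the description above: two child partitions that each read both parent partitions.

```python
# Toy model (hypothetical helpers, not Spark's API): a dependency maps each
# child partition index to the list of parent partition indices it reads.
from collections import Counter
from typing import Dict, List

Deps = Dict[int, List[int]]


def children_per_parent(deps: Deps) -> Counter:
    """Count how many child partitions read each parent partition."""
    counts: Counter = Counter()
    for parents in deps.values():
        counts.update(set(parents))  # count each parent once per child
    return counts


def at_most_one_child(deps: Deps) -> bool:
    """The definition quoted in the question: every parent partition is
    used by at most one child partition."""
    return all(c <= 1 for c in children_per_parent(deps).values())


# 1:1 dependency (e.g. map): child i reads parent i only.
one_to_one: Deps = {i: [i] for i in range(3)}

# Assumed N:N "full" shape: 2 child partitions, each reading both parents.
full_n_to_n: Deps = {0: [0, 1], 1: [0, 1]}

print(at_most_one_child(one_to_one))   # True
print(at_most_one_child(full_n_to_n))  # False
```

Under this strict reading the N:N case fails, which is exactly the point of the question.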

feitang0 • Nov 06 '15

+1

zzl0 • Nov 18 '15

+1

pzz2011 • Mar 29 '16

+1

wojiaohgl • Apr 18 '16

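Here "narrow" means full dependency: the data in each partition of the parent RDD does not need to be partitioned again before it is sent to the child RDD. The cartesian(otherRDD) example below shows an N:N narrow dependency; the whole computation needs no shuffle.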
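The cartesian case can be sketched the same way. The index arithmetic below (child partition k pairs rdd1 partition k // n2 with rdd2 partition k % n2) follows my understanding of Spark's `CartesianRDD`; the helper names and toy data are hypothetical. Each parent partition is read by several children, but always as a whole, so nothing has to be repartitioned.

```python
# Sketch of the partition arithmetic for rdd1.cartesian(rdd2), assuming
# Spark's CartesianRDD scheme: child k pairs rdd1 partition k // n2 with
# rdd2 partition k % n2. Helper names are hypothetical.
from itertools import product
from typing import List, Tuple


def cartesian_parents(k: int, n2: int) -> Tuple[int, int]:
    """Parent partitions (one per input RDD) read by child partition k."""
    return k // n2, k % n2


def cartesian_partitions(parts1: List[list], parts2: List[list]) -> List[list]:
    """Build every child partition from WHOLE parent partitions only:
    no record is re-partitioned, so no shuffle is needed."""
    n2 = len(parts2)
    result = []
    for k in range(len(parts1) * n2):
        i, j = cartesian_parents(k, n2)
        result.append(list(product(parts1[i], parts2[j])))
    return result


a = [[1, 2], [3]]          # toy rdd1 with 2 partitions
b = [["x"], ["y", "z"]]    # toy rdd2 with 2 partitions
child = cartesian_partitions(a, b)

print(len(child))  # 4
print(child[1])    # [(1, 'y'), (1, 'z'), (2, 'y'), (2, 'z')]
# Children that read rdd1 partition 0 (N:N: one parent, several children):
print([k for k in range(len(child)) if cartesian_parents(k, 2)[0] == 0])  # [0, 1]
```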

JerryLead • May 02 '16

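@JerryLead Personally, I think the narrow vs. wide definition here is not very clear. It seems the author's original intent was to separate the deterministic from the random: if there is a shuffle in between, the dependency is wide; otherwise it is narrow. So the real distinction is deterministic vs. non-deterministic (i.e., given a child partition, its parent partitions are fully determined), not full vs. partial. In particular, I suggest changing "essentially" to "typically"; otherwise the wording is somewhat self-contradictory.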
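One way to make this distinction concrete, under simplifying assumptions (integer keys, and `key % n_out` standing in for a hash partitioner; the helper names are hypothetical): under a narrow dependency the parent partitions a child reads are fixed by the child index alone, while under a shuffle what a child reads depends on the keys in the data and cuts across every parent partition.

```python
# Toy contrast (hypothetical helpers, not Spark's API) between what a child
# partition reads under a narrow dependency and under a hash shuffle.
from typing import List, Tuple

Record = Tuple[int, str]  # (key, value); int keys keep the "hash" deterministic


def narrow_input(parents: List[List[Record]], parent_ids: List[int]) -> List[Record]:
    """Narrow: which parent partitions to read is fixed by the child index
    (like getParents(k)); each listed partition is read in full."""
    return [r for p in parent_ids for r in parents[p]]


def shuffle_input(parents: List[List[Record]], k: int, n_out: int) -> List[Record]:
    """Shuffle: child k receives the records whose key % n_out == k, which
    depends on the data and can be a slice of every parent partition."""
    return [r for part in parents for r in part if r[0] % n_out == k]


parents = [[(1, "a"), (2, "b")], [(3, "c"), (4, "d")]]

print(narrow_input(parents, [0]))    # [(1, 'a'), (2, 'b')]
print(shuffle_input(parents, 0, 2))  # [(2, 'b'), (4, 'd')]
```

Here child 0 of the shuffle takes one record from each parent partition, while the narrow child takes partition 0 whole.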

liqul • Feb 27 '17