data-engineering-zoomcamp
Timecodes for "DE Zoomcamp 5.4.3 - Joins in Spark"
Youtube video: https://www.youtube.com/watch?v=lu7TrqAWuH4
0:00:00 - Spark internals, group by, reshuffling, joins
0:01:55 - Join: green and yellow, outer join
0:03:39 - Joining yellow and green datasets
0:05:24 - Complex record creation and reshuffling
0:07:22 - Reshuffling for join using merge sort
0:09:21 - Materializing results for efficient processing
0:11:12 - Joining large tables, small tables
0:13:00 - DataFrame join, drop, save, execution plan
0:14:47 - Small zones, broadcast join, fast
Updated, thanks!