CoolplaySpark

About the join operation in Spark Streaming

Open ddc496601562 opened this issue 8 years ago • 2 comments

I came across this description of join in the official Spark Streaming documentation:

Here, in each batch interval, the RDD generated by stream1 will be joined with the RDD generated by stream2. You can also do leftOuterJoin, rightOuterJoin, fullOuterJoin. Furthermore, it is often very useful to do joins over windows of the streams. That is pretty easy as well.

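In terms of implementation, does this mean the join only combines multiple streams within the same batch, and that joining across batches is not possible yet? If Spark Streaming cannot do cross-batch joins for now, what is the usual approach to implementing one ourselves?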

ddc496601562 avatar Dec 21 '16 10:12 ddc496601562

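@ddc496601562 One idea is to implement your own custom receiver that only sends the relevant data along at the moment it is needed for the join. By the way, how did you end up doing it?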

AntikaSmith avatar Mar 29 '17 10:03 AntikaSmith

For cross-batch joins, you can define a window; everything that falls inside the window can then be joined. We join over a one-hour window.
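
To illustrate the difference, here is a minimal plain-Python sketch (not Spark code) that simulates a per-batch join and a windowed join over two streams of (key, value) batches. The function names, the `orders`/`payments` data, and the window size are all hypothetical and chosen only for illustration. In Spark Streaming itself, the equivalent is roughly to call `window(...)` on each DStream before calling `join`, as the official guide describes.

```python
from collections import defaultdict


def join_pairs(left, right):
    """Inner-join two lists of (key, value) pairs on key."""
    index = defaultdict(list)
    for key, value in right:
        index[key].append(value)
    return [(key, (lv, rv)) for key, lv in left for rv in index.get(key, [])]


def per_batch_join(batches1, batches2):
    """Join batch i of stream 1 only with batch i of stream 2."""
    return [join_pairs(b1, b2) for b1, b2 in zip(batches1, batches2)]


def windowed_join(batches1, batches2, window):
    """At each batch, join the last `window` batches of both streams."""
    results = []
    for i in range(len(batches1)):
        start = max(0, i - window + 1)
        w1 = [pair for batch in batches1[start:i + 1] for pair in batch]
        w2 = [pair for batch in batches2[start:i + 1] for pair in batch]
        results.append(join_pairs(w1, w2))
    return results


# Hypothetical data: the record for key "a" arrives in batch 0 on one stream,
# but its counterpart only arrives in batch 1 on the other stream.
orders = [[("a", "order-a")], [("b", "order-b")]]
payments = [[], [("a", "pay-a"), ("b", "pay-b")]]

per_batch = per_batch_join(orders, payments)
windowed = windowed_join(orders, payments, window=2)

print(per_batch)  # key "a" is never matched
print(windowed)   # key "a" is matched once both records are inside the window
```

Note that with a sliding window, a pair that stays inside the window for several batches is emitted again on each evaluation, so downstream logic may need to deduplicate.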

351zyf avatar Oct 12 '17 07:10 351zyf