[aggregation] Two Stack + Hops + Union Sort + Hybrid

Open cenhao opened this issue 1 year ago • 0 comments

Summary

Implement two stack lite + hop aggregate + union sort for temporal events, with an option to turn it on for testing. Unlike the implementation proposed by Stripe (cogroupSorted, which is only available in Spark 3.4 and later), this solution uses repartitionAndSortWithinPartitions on RDDs to enable external sorting.
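For readers unfamiliar with the two-stack idea, here is a minimal Python sketch of the classic two-stacks sliding-window aggregator. It is only an illustration: it is not the "lite" variant or the Scala/Spark code in this PR, and the names `TwoStackAggregator` and `windowed_aggregate` are hypothetical. It assumes events are already sorted by timestamp (the role the sort step plays in the pipeline) and that the combine function is associative.

```python
from collections import deque
from typing import Callable, Deque, Generic, Iterable, List, Optional, Tuple, TypeVar

T = TypeVar("T")


class TwoStackAggregator(Generic[T]):
    """FIFO window with amortized O(1) push, pop and query for an associative op."""

    def __init__(self, combine: Callable[[T, T], T]) -> None:
        self.combine = combine
        # Entries are (value, running_aggregate).
        # back: newest element on top; aggregate covers everything in `back` so far.
        self.back: List[Tuple[T, T]] = []
        # front: oldest element on top; aggregate covers this element and all below it.
        self.front: List[Tuple[T, T]] = []

    def push(self, value: T) -> None:
        agg = value if not self.back else self.combine(self.back[-1][1], value)
        self.back.append((value, agg))

    def pop(self) -> T:
        if not self.front:
            # Flip: move everything from back to front, rebuilding aggregates
            # so that the oldest element ends up on top of `front`.
            while self.back:
                value, _ = self.back.pop()
                agg = value if not self.front else self.combine(value, self.front[-1][1])
                self.front.append((value, agg))
        if not self.front:
            raise IndexError("pop from empty window")
        return self.front.pop()[0]

    def query(self) -> Optional[T]:
        # Older elements live in `front`, newer ones in `back`; combine in that order.
        if self.front and self.back:
            return self.combine(self.front[-1][1], self.back[-1][1])
        if self.front:
            return self.front[-1][1]
        if self.back:
            return self.back[-1][1]
        return None


def windowed_aggregate(
    events: Iterable[Tuple[int, T]], window: int, combine: Callable[[T, T], T]
) -> List[Optional[T]]:
    """For each (ts, value), aggregate all values with timestamps in (ts - window, ts]."""
    agg: TwoStackAggregator[T] = TwoStackAggregator(combine)
    timestamps: Deque[int] = deque()
    out: List[Optional[T]] = []
    for ts, value in events:
        agg.push(value)
        timestamps.append(ts)
        # Evict events that have fallen out of the window.
        while timestamps[0] <= ts - window:
            timestamps.popleft()
            agg.pop()
        out.append(agg.query())
    return out
```

The appeal of this structure is that it does not require an inverse operation, so it works for aggregations such as max or min where subtracting an evicted value is not possible. For example, `windowed_aggregate([(1, 5), (2, 3), (3, 4), (6, 1), (7, 2)], 3, max)` computes a trailing max over a window of 3 time units.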

Previous benchmarking can be found at: https://github.com/airbnb/chronon/pull/485 https://github.com/airbnb/chronon/pull/464

Detailed analysis: https://docs.google.com/document/d/1hlG69A9ih4SJBToTBoGDMmlbRtMmpNoKbFMu-tFUH6w/edit

Why / Goal

We want to test the performance of the aforementioned solution and potentially use it in production.

Test Plan

  • [x] Added Unit Tests
  • [ ] Covered by existing CI
  • [ ] Integration tested

Checklist

  • [ ] Documentation update

Reviewers

@nikhilsimha @hzding621 @vamseeyarla

cc @camweston-stripe

cenhao, Jul 06 '23 16:07