chronon
[aggregation] Two Stack + Hops + Union Sort + Hybrid
Summary
Implements Two-Stacks Lite + hop aggregation + union sort for temporal events, behind an option that can be turned on for testing.
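To make the combination concrete, here is a minimal single-key sketch in Python, not Chronon's Scala implementation. It assumes a `max` aggregation, numeric timestamps, a trailing window `[q - window, q)` whose tail is aligned to hop boundaries, and `window >= hop`; the names `TwoStacksLite` and `windowed_max` are hypothetical. Events and query timestamps are union-sorted into one stream, events are folded into hops, completed hops are pushed into a Two-Stacks Lite queue, and hops that fall out of the window are evicted from its front.

```python
from collections import deque

NEG_INF = float("-inf")


class TwoStacksLite:
    """FIFO sliding-window aggregator for an associative `combine` (Two-Stacks Lite).

    items[:front_len] hold suffix aggregates of the front; the remaining items
    hold raw values whose running aggregate is kept in back_agg.
    """

    def __init__(self, combine, identity):
        self.combine = combine
        self.identity = identity
        self.items = deque()  # (timestamp, raw value or suffix aggregate)
        self.front_len = 0
        self.back_agg = identity

    def __len__(self):
        return len(self.items)

    def oldest_ts(self):
        return self.items[0][0]

    def insert(self, ts, value):
        self.items.append((ts, value))
        self.back_agg = self.combine(self.back_agg, value)

    def evict(self):
        if self.front_len == 0:
            self._flip()
        self.items.popleft()
        self.front_len -= 1

    def query(self):
        front = self.items[0][1] if self.front_len > 0 else self.identity
        return self.combine(front, self.back_agg)

    def _flip(self):
        # Front is empty: turn every buffered raw value into a suffix aggregate,
        # scanning right to left. Each value is flipped at most once.
        acc, flipped = self.identity, []
        for ts, value in reversed(self.items):
            acc = self.combine(value, acc)
            flipped.append((ts, acc))
        flipped.reverse()
        self.items = deque(flipped)
        self.front_len = len(self.items)
        self.back_agg = self.identity


def windowed_max(events, queries, window, hop):
    """events: [(ts, value)], queries: [ts] -> {query_ts: max or None} (hypothetical API)."""
    # Union sort: at equal timestamps queries (kind 0) sort before events (kind 1),
    # so a query only sees events strictly earlier than itself.
    stream = sorted([(ts, 1, v) for ts, v in events] + [(q, 0, None) for q in queries])
    stack = TwoStacksLite(max, NEG_INF)
    open_start, open_agg = None, NEG_INF  # hop currently being filled
    results = {}
    for ts, kind, value in stream:
        # Seal the open hop once the sweep has moved past its end.
        if open_start is not None and ts >= open_start + hop:
            stack.insert(open_start, open_agg)
            open_start, open_agg = None, NEG_INF
        if kind == 1:
            if open_start is None:
                open_start = ts - ts % hop
            open_agg = max(open_agg, value)
        else:
            # Evict whole hops that start before the window tail.
            tail = ts - window
            while len(stack) and stack.oldest_ts() < tail:
                stack.evict()
            agg = max(stack.query(), open_agg)
            results[ts] = None if agg == NEG_INF else agg
    return results


events = [(1, 5), (3, 2), (12, 7), (25, 1)]
print(windowed_max(events, [5, 12, 13, 30, 50], window=20, hop=10))
```

A two-stack structure is used here rather than a running total because `max` is not invertible: an expired hop cannot be subtracted out, yet each hop is still inserted, flipped, and evicted at most once, giving amortized O(1) work per hop. Pre-aggregating into hops shrinks the number of elements the queue has to hold.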
Unlike the implementation proposed by Stripe (`cogroupSorted`, which is only available in Spark 3.4+), this solution uses `repartitionAndSortWithinPartitions` on RDDs, which supports external sorting (spilling to disk).
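The `repartitionAndSortWithinPartitions` approach follows the classic secondary-sort pattern: partition on the entity key only, but sort on a composite key such as `(key, ts, kind)`, so that each key's events and queries land in a single partition, contiguous and in timestamp order, ready for one sweep. The sketch below only simulates that contract locally in plain Python; `partition_and_sort`, `per_key_streams`, and the record layout are hypothetical, and in Spark the sort is done by the shuffle (which can spill to disk) rather than by `sorted`.

```python
from itertools import groupby


def partition_and_sort(records, num_partitions):
    """Simulate repartitionAndSortWithinPartitions on ((key, ts, kind), payload) pairs.

    Partitioning uses only the entity key; sorting uses the full composite key.
    """
    partitions = [[] for _ in range(num_partitions)]
    for composite, payload in records:
        key = composite[0]
        partitions[hash(key) % num_partitions].append((composite, payload))
    # Within each partition, order by (key, ts, kind): queries (kind 0) precede
    # events (kind 1) at the same timestamp. Payloads are never compared.
    return [sorted(p, key=lambda r: r[0]) for p in partitions]


def per_key_streams(partitions):
    """Yield (key, [(ts, kind, payload), ...]) runs, one per key, already time-ordered."""
    for part in partitions:
        for key, run in groupby(part, key=lambda r: r[0][0]):
            yield key, [(c[1], c[2], payload) for c, payload in run]


records = [
    (("a", 5, 1), 3),
    (("b", 2, 0), None),
    (("a", 5, 0), None),
    (("a", 1, 1), 9),
    (("b", 1, 1), 4),
]
streams = dict(per_key_streams(partition_and_sort(records, num_partitions=4)))
print(streams["a"])  # [(1, 1, 9), (5, 0, None), (5, 1, 3)]
```

In Spark, the same shape comes from keying an RDD by the composite key, passing a custom `Partitioner` that hashes only the entity key, and relying on an implicit `Ordering` for the composite key when calling `repartitionAndSortWithinPartitions`.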
Previous benchmarking can be found in https://github.com/airbnb/chronon/pull/485 and https://github.com/airbnb/chronon/pull/464
Detailed analysis: https://docs.google.com/document/d/1hlG69A9ih4SJBToTBoGDMmlbRtMmpNoKbFMu-tFUH6w/edit
Why / Goal
We want to test the performance of the aforementioned solution and potentially use it in production.
Test Plan
- [x] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
Checklist
- [ ] Documentation update
Reviewers
@nikhilsimha @hzding621 @vamseeyarla
cc @camweston-stripe