chronon
Create and traverse avro schemas once per task (#296)
Summary
This adds a SchemaTraverser to the chronon row creation logic, which allows callers of Row.to to specify how the output schema should be constructed.
At its core, this change removes the AvroConversions.fromChrononSchema call from the lambda passed from fromChrononRow.
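As a rough illustration of the pattern (not the actual Chronon code), the sketch below contrasts converting a schema inside a per-row lambda with converting it once and letting the lambda close over the result. Every name in it (`SchemaHoistingSketch`, `convertSchema`, `encodePerRow`, `encodeOncePerTask`, `Field`, `Schema`) is hypothetical and stands in for the real conversion and row-encoding logic; the counter exists only to make the difference observable.

```scala
// Hypothetical stand-ins; none of these are Chronon's real types or signatures.
object SchemaHoistingSketch {
  final case class Field(name: String, tpe: String)
  final case class Schema(fields: List[Field])

  // Counts conversions so the difference between the two variants is observable.
  var conversions = 0

  // Stand-in for an expensive schema conversion such as fromChrononSchema.
  def convertSchema(fields: List[(String, String)]): Schema = {
    conversions += 1
    Schema(fields.map { case (n, t) => Field(n, t) })
  }

  // Before: the conversion runs inside the per-row lambda, once per row.
  def encodePerRow(rows: Seq[Array[Any]],
                   fields: List[(String, String)]): Seq[Map[String, Any]] =
    rows.map { row =>
      val schema = convertSchema(fields)
      schema.fields.map(_.name).zip(row).toMap
    }

  // After: the conversion runs once, and the lambda closes over the result.
  def encodeOncePerTask(rows: Seq[Array[Any]],
                        fields: List[(String, String)]): Seq[Map[String, Any]] = {
    val schema = convertSchema(fields)
    val names = schema.fields.map(_.name)
    rows.map(row => names.zip(row).toMap)
  }
}
```

Both variants produce the same rows; with N input rows the first performs N conversions and the second performs one, which is the kind of saving this PR targets in the reduce phase.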
Why / Goal
The proximate goal is to allow faster Avro Schema construction, avoiding repeated calls to fromChrononSchema in a tight loop. This has resulted in a 90+% reduction in CPU usage during the reduce phase of the GroupByUpload aggregation.
At Stripe, several of our largest GroupByUpload apps are now using 50-80% fewer vcore-seconds per run.
Test Plan
- [ ] Added Unit Tests
- [x] Covered by existing CI
- [x] Integration tested
- [x] Live on Stripe's internal fork.
Checklist
- [ ] Documentation update