
Create and traverse avro schemas once per task (#296)

Open smcnamara2-stripe opened this issue 5 months ago • 0 comments

Summary

This adds a SchemaTraverser to the Chronon row-creation logic, which lets callers of Row.to specify how the output schema is constructed.

The core of the change is removing the AvroConversions.fromChrononSchema call from the lambda passed by fromChrononRow.
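
For readers unfamiliar with the pattern, the idea of building a schema once instead of once per row can be sketched as below. This is only an illustration in Python, not Chronon's code: `to_output_schema`, `convert_rows_naive`, and `make_row_converter` are hypothetical names, and the dictionary-based schema is a stand-in for a real Avro schema.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def to_output_schema(source_schema: Dict[str, str]) -> Dict[str, Any]:
    """Hypothetical stand-in for an expensive schema conversion.

    The call counter exists only so the example can show how often
    the conversion runs.
    """
    to_output_schema.calls += 1
    fields = [{"name": n, "type": t} for n, t in source_schema.items()]
    return {"type": "record", "fields": fields}


to_output_schema.calls = 0


def convert_rows_naive(rows: List[Row], source_schema: Dict[str, str]) -> List[Row]:
    """Rebuilds the output schema for every row (the pattern being removed)."""
    out = []
    for row in rows:
        schema = to_output_schema(source_schema)  # repeated inside the loop
        out.append({f["name"]: row[f["name"]] for f in schema["fields"]})
    return out


def make_row_converter(source_schema: Dict[str, str]) -> Callable[[Row], Row]:
    """Builds the output schema once and returns a per-row converter."""
    schema = to_output_schema(source_schema)  # runs once, outside the loop
    names = [f["name"] for f in schema["fields"]]

    def convert(row: Row) -> Row:
        # Only per-row work remains here; the schema is captured by the closure.
        return {name: row[name] for name in names}

    return convert
```

With three input rows, the naive version performs the conversion three times, while the converter returned by `make_row_converter` performs it once and yields the same rows.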

Why / Goal

The proximate goal is to speed up Avro schema construction by avoiding repeated calls to fromChrononSchema in a tight loop. This has resulted in a 90+% reduction in CPU usage during the reduce phase of the GroupByUpload aggregation.

At Stripe, several of our largest GroupByUpload apps are now using 50-80% fewer vcore-seconds per run.

Test Plan

  • [ ] Added Unit Tests
  • [x] Covered by existing CI
  • [x] Integration tested
  • [x] Live on Stripe's internal fork.

Checklist

  • [ ] Documentation update

Reviewers

smcnamara2-stripe, Sep 19 '24 18:09